



# Arm® Compiler

Version 6.6

## armasm User Guide

### Non-Confidential

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates).  
All rights reserved.

### Issue

DUI0801\_L\_en

# Arm® Compiler armasm User Guide

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

## Release information

### Document history

| Issue | Date             | Confidentiality  | Change                      |
|-------|------------------|------------------|-----------------------------|
| A     | 14 March 2014    | Non-Confidential | Arm Compiler v6.00 Release  |
| B     | 15 December 2014 | Non-Confidential | Arm Compiler v6.01 Release  |
| C     | 30 June 2015     | Non-Confidential | Arm Compiler v6.02 Release  |
| D     | 18 November 2015 | Non-Confidential | Arm Compiler v6.3 Release   |
| E     | 24 February 2016 | Non-Confidential | Arm Compiler v6.4 Release   |
| F     | 29 June 2016     | Non-Confidential | Arm Compiler v6.5 Release   |
| G     | 4 November 2016  | Non-Confidential | Arm Compiler v6.6 Release   |
| H     | 8 May 2017       | Non-Confidential | Arm Compiler v6.6.1 Release |
| I     | 29 November 2017 | Non-Confidential | Arm Compiler v6.6.2 Release |
| J     | 28 August 2019   | Non-Confidential | Arm Compiler v6.6.3 Release |
| K     | 26 August 2020   | Non-Confidential | Arm Compiler v6.6.4 Release |
| L     | 31 January 2023  | Non-Confidential | Arm Compiler v6.6.5 Release |

## Proprietary Notice

This document is protected by copyright and other related rights and the practice or implementation of the information contained in this document may be protected by one or more patents or pending patent applications. No part of this document may be reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED "AS IS". ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.

This document may include technical inaccuracies or typographical errors.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word "partner" in reference to Arm's customers is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document at any time and without notice.

This document may be translated into other languages for convenience, and you agree that if there is any conflict between the English version of this document and any translation, the terms of the English version of the Agreement shall prevail.

The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its affiliates) in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective owners. Please follow Arm's trademark usage guidelines at <https://www.arm.com/company/policies/trademarks>.

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Arm Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

(LES-PRE-20349|version 21.0)

## Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.

Unrestricted Access is an Arm internal classification.

## Product Status

The information in this document is Final, that is for a developed product.

## Feedback

Arm® welcomes feedback on this product and its documentation. To provide feedback on the product, create a ticket on <https://support.developer.arm.com>

To provide feedback on the document, fill the following survey: <https://developer.arm.com/documentation-feedback-survey>.

## Inclusive language commitment

Arm values inclusive communities. Arm recognizes that we and our industry have used language that can be offensive. Arm strives to lead the industry and create change.

We believe that this document contains no offensive language. To report offensive language in this document, email [terms@arm.com](mailto:terms@arm.com).

# Contents

|                             |           |
|-----------------------------|-----------|
| <b>List of Figures.....</b> | <b>48</b> |
| <b>List of Tables.....</b>  | <b>49</b> |

|                                                                              |           |
|------------------------------------------------------------------------------|-----------|
| <b>1. Introduction.....</b>                                                  | <b>59</b> |
| 1.1 Conventions.....                                                         | 59        |
| 1.2 Other information.....                                                   | 60        |
| <b>2. Overview of the Assembler.....</b>                                     | <b>61</b> |
| 2.1 About the Arm Compiler toolchain assemblers.....                         | 61        |
| 2.2 Key features of the armasm assembler.....                                | 62        |
| 2.3 How the assembler works.....                                             | 62        |
| 2.4 Directives that can be omitted in pass 2 of the assembler.....           | 64        |
| 2.5 Support level definitions.....                                           | 65        |
| <b>3. Overview of the Armv8 Architecture.....</b>                            | <b>71</b> |
| 3.1 About the Arm architecture.....                                          | 71        |
| 3.2 A32 and T32 instruction sets.....                                        | 71        |
| 3.3 A64 instruction set.....                                                 | 72        |
| 3.4 Changing between AArch64 and AArch32 states.....                         | 73        |
| 3.5 Advanced SIMD.....                                                       | 73        |
| 3.6 Floating-point hardware.....                                             | 74        |
| <b>4. Overview of AArch32 state.....</b>                                     | <b>76</b> |
| 4.1 Changing between A32 and T32 instruction set states.....                 | 76        |
| 4.2 Processor modes, and privileged and unprivileged software execution..... | 76        |
| 4.3 Processor modes in Armv6-M, Armv7-M, and Armv8-M.....                    | 77        |
| 4.4 Registers in AArch32 state.....                                          | 78        |
| 4.5 General-purpose registers in AArch32 state.....                          | 80        |
| 4.6 Register accesses in AArch32 state.....                                  | 80        |
| 4.7 Predeclared core register names in AArch32 state.....                    | 81        |
| 4.8 Predeclared extension register names in AArch32 state.....               | 82        |
| 4.9 Program Counter in AArch32 state.....                                    | 82        |
| 4.10 The Q flag in AArch32 state.....                                        | 83        |
| 4.11 Application Program Status Register.....                                | 83        |
| 4.12 Current Program Status Register in AArch32 state.....                   | 84        |
| 4.13 Saved Program Status Registers in AArch32 state.....                    | 85        |
| 4.14 A32 and T32 instruction set overview.....                               | 85        |
| 4.15 Access to the inline barrel shifter in AArch32 state.....               | 86        |
| <b>5. Overview of AArch64 state.....</b>                                     | <b>88</b> |

|                                                                        |            |
|------------------------------------------------------------------------|------------|
| 5.1 Registers in AArch64 state.....                                    | 88         |
| 5.2 Exception levels.....                                              | 89         |
| 5.3 Link registers.....                                                | 90         |
| 5.4 Stack Pointer register.....                                        | 90         |
| 5.5 Predeclared core register names in AArch64 state.....              | 91         |
| 5.6 Predeclared extension register names in AArch64 state.....         | 92         |
| 5.7 Program Counter in AArch64 state.....                              | 92         |
| 5.8 Conditional execution in AArch64 state.....                        | 93         |
| 5.9 The Q flag in AArch64 state.....                                   | 93         |
| 5.10 Process State.....                                                | 94         |
| 5.11 Saved Program Status Registers in AArch64 state.....              | 95         |
| 5.12 A64 instruction set overview.....                                 | 95         |
| <b>6. Structure of Assembly Language Modules.....</b>                  | <b>97</b>  |
| 6.1 Syntax of source lines in assembly language.....                   | 97         |
| 6.2 Literals.....                                                      | 98         |
| 6.3 ELF sections and the AREA directive.....                           | 99         |
| 6.4 An example armasm syntax assembly language module.....             | 100        |
| <b>7. Writing A32/T32 Assembly Language.....</b>                       | <b>103</b> |
| 7.1 About the Unified Assembler Language.....                          | 103        |
| 7.2 Syntax differences between UAL and A64 assembly language.....      | 104        |
| 7.3 Register usage in subroutine calls.....                            | 105        |
| 7.4 Load immediate values.....                                         | 106        |
| 7.5 Load immediate values using MOV and MVN.....                       | 106        |
| 7.6 Load immediate values using MOV32.....                             | 109        |
| 7.7 Load immediate values using LDR Rd, =const.....                    | 109        |
| 7.8 Literal pools.....                                                 | 110        |
| 7.9 Load addresses into registers.....                                 | 111        |
| 7.10 Load addresses to a register using ADR.....                       | 112        |
| 7.11 Load addresses to a register using ADRL.....                      | 114        |
| 7.12 Load addresses to a register using LDR Rd, =label.....            | 114        |
| 7.13 Other ways to load and store registers.....                       | 116        |
| 7.14 Load and store multiple register instructions.....                | 117        |
| 7.15 Load and store multiple register instructions in A32 and T32..... | 117        |
| 7.16 Stack implementation using LDM and STM.....                       | 119        |
| 7.17 Stack operations for nested subroutines.....                      | 120        |

|                                                                                        |            |
|----------------------------------------------------------------------------------------|------------|
| 7.18 Block copy with LDM and STM.....                                                  | 121        |
| 7.19 Memory accesses.....                                                              | 122        |
| 7.20 The Read-Modify-Write operation.....                                              | 123        |
| 7.21 Optional hash with immediate constants.....                                       | 124        |
| 7.22 Use of macros.....                                                                | 124        |
| 7.23 Test-and-branch macro example.....                                                | 125        |
| 7.24 Unsigned integer division macro example.....                                      | 126        |
| 7.25 Instruction and directive relocations.....                                        | 127        |
| 7.26 Symbol versions.....                                                              | 129        |
| 7.27 Frame directives.....                                                             | 130        |
| 7.28 Exception tables and Unwind tables.....                                           | 130        |
| <b>8. Condition Codes.....</b>                                                         | <b>132</b> |
| 8.1 Conditional instructions.....                                                      | 132        |
| 8.2 Conditional execution in A32 code.....                                             | 132        |
| 8.3 Conditional execution in T32 code.....                                             | 133        |
| 8.4 Conditional execution in A64 code.....                                             | 134        |
| 8.5 Condition flags.....                                                               | 134        |
| 8.6 Updates to the condition flags in A32/T32 code.....                                | 135        |
| 8.7 Updates to the condition flags in A64 code.....                                    | 136        |
| 8.8 Floating-point instructions that update the condition flags.....                   | 136        |
| 8.9 Carry flag.....                                                                    | 137        |
| 8.10 Overflow flag.....                                                                | 138        |
| 8.11 Condition code suffixes.....                                                      | 138        |
| 8.12 Condition code suffixes and related flags.....                                    | 139        |
| 8.13 Comparison of condition code meanings in integer and floating-point code.....     | 141        |
| 8.14 Benefits of using conditional execution in A32 and T32 code.....                  | 142        |
| 8.15 Example showing the benefits of conditional instructions in A32 and T32 code..... | 142        |
| 8.16 Optimization for execution speed.....                                             | 145        |
| <b>9. Using armasm.....</b>                                                            | <b>146</b> |
| 9.1 armasm command-line syntax.....                                                    | 146        |
| 9.2 Specify command-line options with an environment variable.....                     | 146        |
| 9.3 Using stdin to input source code to the assembler.....                             | 147        |
| 9.4 Built-in variables and constants.....                                              | 148        |
| 9.5 Identifying versions of armasm in source code.....                                 | 152        |
| 9.6 Diagnostic messages.....                                                           | 152        |

|                                                                                |            |
|--------------------------------------------------------------------------------|------------|
| 9.7 Interlocks diagnostics.....                                                | 153        |
| 9.8 Automatic IT block generation in T32 code.....                             | 153        |
| 9.9 T32 branch target alignment.....                                           | 154        |
| 9.10 T32 code size diagnostics.....                                            | 154        |
| 9.11 A32 and T32 instruction portability diagnostics.....                      | 154        |
| 9.12 T32 instruction width diagnostics.....                                    | 155        |
| 9.13 Two pass assembler diagnostics.....                                       | 155        |
| 9.14 Using the C preprocessor.....                                             | 156        |
| 9.15 Address alignment in A32/T32 code.....                                    | 157        |
| 9.16 Address alignment in A64 code.....                                        | 158        |
| 9.17 Instruction width selection in T32 code.....                              | 158        |
| <b>10. Advanced SIMD Programming.....</b>                                      | <b>160</b> |
| 10.1 Architecture support for Advanced SIMD.....                               | 160        |
| 10.2 Extension register bank mapping for Advanced SIMD in AArch32 state.....   | 160        |
| 10.3 Extension register bank mapping for Advanced SIMD in AArch64 state.....   | 162        |
| 10.4 Views of the Advanced SIMD register bank in AArch32 state.....            | 164        |
| 10.5 Views of the Advanced SIMD register bank in AArch64 state.....            | 164        |
| 10.6 Differences between A32/T32 and A64 Advanced SIMD instruction syntax..... | 165        |
| 10.7 Load values to Advanced SIMD registers.....                               | 166        |
| 10.8 Conditional execution of A32/T32 Advanced SIMD instructions.....          | 166        |
| 10.9 Floating-point exceptions for Advanced SIMD in A32/T32 instructions.....  | 167        |
| 10.10 Advanced SIMD data types in A32/T32 instructions.....                    | 168        |
| 10.11 Polynomial arithmetic over {0,1}.....                                    | 168        |
| 10.12 Advanced SIMD vectors.....                                               | 169        |
| 10.13 Normal, long, wide, and narrow Advanced SIMD instructions.....           | 170        |
| 10.14 Saturating Advanced SIMD instructions.....                               | 171        |
| 10.15 Advanced SIMD scalars.....                                               | 171        |
| 10.16 Extended notation extension for Advanced SIMD in A32/T32 code.....       | 172        |
| 10.17 Advanced SIMD system registers in AArch32 state.....                     | 173        |
| 10.18 Flush-to-zero mode in Advanced SIMD.....                                 | 173        |
| 10.19 When to use flush-to-zero mode in Advanced SIMD.....                     | 174        |
| 10.20 The effects of using flush-to-zero mode in Advanced SIMD.....            | 174        |
| 10.21 Advanced SIMD operations not affected by flush-to-zero mode.....         | 175        |
| <b>11. Floating-point Programming.....</b>                                     | <b>176</b> |
| 11.1 Architecture support for floating-point.....                              | 176        |

|                                                                                 |            |
|---------------------------------------------------------------------------------|------------|
| 11.2 Extension register bank mapping for floating-point in AArch32 state.....   | 176        |
| 11.3 Extension register bank mapping in AArch64 state.....                      | 178        |
| 11.4 Views of the floating-point extension register bank in AArch32 state.....  | 179        |
| 11.5 Views of the floating-point extension register bank in AArch64 state.....  | 179        |
| 11.6 Differences between A32/T32 and A64 floating-point instruction syntax..... | 180        |
| 11.7 Load values to floating-point registers.....                               | 180        |
| 11.8 Conditional execution of A32/T32 floating-point instructions.....          | 181        |
| 11.9 Floating-point exceptions for floating-point in A32/T32 instructions.....  | 181        |
| 11.10 Floating-point data types in A32/T32 instructions.....                    | 182        |
| 11.11 Extended notation extension for floating-point in A32/T32 code.....       | 183        |
| 11.12 Floating-point system registers in AArch32 state.....                     | 184        |
| 11.13 Flush-to-zero mode in floating-point.....                                 | 184        |
| 11.14 When to use flush-to-zero mode in floating-point.....                     | 185        |
| 11.15 The effects of using flush-to-zero mode in floating-point.....            | 185        |
| 11.16 Floating-point operations not affected by flush-to-zero mode.....         | 186        |
| <b>12. armasm Command-line Options.....</b>                                     | <b>187</b> |
| 12.1 --16.....                                                                  | 187        |
| 12.2 --32.....                                                                  | 187        |
| 12.3 --apcs=qualifier..qualifier.....                                           | 187        |
| 12.4 --arm.....                                                                 | 189        |
| 12.5 --arm_only.....                                                            | 190        |
| 12.6 --bi.....                                                                  | 190        |
| 12.7 --bigend.....                                                              | 190        |
| 12.8 --brief_diagnostics, --no_brief_diagnostics.....                           | 190        |
| 12.9 --checkreglist.....                                                        | 191        |
| 12.10 --cpreproc.....                                                           | 191        |
| 12.11 --cpreproc_opts=option[,option,...].....                                  | 192        |
| 12.12 --cpu=list.....                                                           | 193        |
| 12.13 --cpu=name.....                                                           | 193        |
| 12.14 --debug.....                                                              | 196        |
| 12.15 --depend=dependfile.....                                                  | 196        |
| 12.16 --depend_format=string.....                                               | 196        |
| 12.17 --diag_error=tag[,tag,...].....                                           | 197        |
| 12.18 --diag_remark=tag[,tag,...].....                                          | 198        |
| 12.19 --diag_style=arm ide gnu.....                                             | 198        |

|                                                   |     |
|---------------------------------------------------|-----|
| 12.20 --diag_suppress=tag[,tag,...]               | 199 |
| 12.21 --diag_warning=tag[,tag,...]                | 200 |
| 12.22 --dllexport_all                             | 201 |
| 12.23 --dwarf2                                    | 201 |
| 12.24 --dwarf3                                    | 201 |
| 12.25 --errors=errorfile                          | 202 |
| 12.26 --exceptions, --no_exceptions               | 202 |
| 12.27 --exceptions_unwind, --no_exceptions_unwind | 202 |
| 12.28 --execstack, --no_execstack                 | 203 |
| 12.29 --execute_only                              | 203 |
| 12.30 --fpemode=model                             | 204 |
| 12.31 --fpu=list                                  | 205 |
| 12.32 --fpu=name                                  | 205 |
| 12.33 -g                                          | 206 |
| 12.34 --help                                      | 206 |
| 12.35 -idir[dir, ...]                             | 206 |
| 12.36 --keep                                      | 207 |
| 12.37 --length=n                                  | 207 |
| 12.38 --li                                        | 207 |
| 12.39 --library_type=lib                          | 207 |
| 12.40 --list=file                                 | 208 |
| 12.41 --list=                                     | 208 |
| 12.42 --littleend                                 | 209 |
| 12.43 -m                                          | 209 |
| 12.44 --maxcache=n                                | 209 |
| 12.45 --md                                        | 209 |
| 12.46 --no_code_gen                               | 209 |
| 12.47 --no_esc                                    | 210 |
| 12.48 --no_hide_all                               | 210 |
| 12.49 --no_regs                                   | 210 |
| 12.50 --no_terse                                  | 210 |
| 12.51 --no_warn                                   | 211 |
| 12.52 -o filename                                 | 211 |
| 12.53 --pd                                        | 211 |
| 12.54 --predefine "directive"                     | 211 |
| 12.55 --reduce_paths, --no_reduce_paths           | 212 |

|                                                      |     |
|------------------------------------------------------|-----|
| 12.56 --regnames.....                                | 213 |
| 12.57 --report-if-not-wysiwyg.....                   | 213 |
| 12.58 --show_cmdline.....                            | 214 |
| 12.59 --thumb.....                                   | 214 |
| 12.60 --unaligned_access, --no_unaligned_access..... | 215 |
| 12.61 --unsafe.....                                  | 215 |
| 12.62 --untyped_local_labels.....                    | 215 |
| 12.63 --version_number.....                          | 216 |
| 12.64 --via=filename.....                            | 216 |
| 12.65 --vsn.....                                     | 217 |
| 12.66 --width=n.....                                 | 217 |
| 12.67 --xref.....                                    | 217 |

|                                                               |            |
|---------------------------------------------------------------|------------|
| <b>13. Symbols, Literals, Expressions, and Operators.....</b> | <b>219</b> |
| 13.1 Symbol naming rules.....                                 | 219        |
| 13.2 Variables.....                                           | 220        |
| 13.3 Numeric constants.....                                   | 221        |
| 13.4 Assembly time substitution of variables.....             | 221        |
| 13.5 Register-relative and PC-relative expressions.....       | 222        |
| 13.6 Labels.....                                              | 223        |
| 13.7 Labels for PC-relative addresses.....                    | 223        |
| 13.8 Labels for register-relative addresses.....              | 224        |
| 13.9 Labels for absolute addresses.....                       | 225        |
| 13.10 Numeric local labels.....                               | 225        |
| 13.11 Syntax of numeric local labels.....                     | 226        |
| 13.12 String expressions.....                                 | 227        |
| 13.13 String literals.....                                    | 228        |
| 13.14 Numeric expressions.....                                | 228        |
| 13.15 Syntax of numeric literals.....                         | 229        |
| 13.16 Syntax of floating-point literals.....                  | 230        |
| 13.17 Logical expressions.....                                | 231        |
| 13.18 Logical literals.....                                   | 232        |
| 13.19 Unary operators.....                                    | 232        |
| 13.20 Binary operators.....                                   | 233        |
| 13.21 Multiplicative operators.....                           | 234        |
| 13.22 String manipulation operators.....                      | 234        |

|                                                                              |            |
|------------------------------------------------------------------------------|------------|
| 13.23 Shift operators.....                                                   | 235        |
| 13.24 Addition, subtraction, and logical operators.....                      | 235        |
| 13.25 Relational operators.....                                              | 236        |
| 13.26 Boolean operators.....                                                 | 236        |
| 13.27 Operator precedence.....                                               | 237        |
| 13.28 Difference between operator precedence in assembly language and C..... | 238        |
| <b>14. A32 and T32 Instructions.....</b>                                     | <b>240</b> |
| 14.1 A32 and T32 instruction summary.....                                    | 240        |
| 14.2 Instruction width specifiers.....                                       | 244        |
| 14.3 Flexible second operand (Operand2).....                                 | 244        |
| 14.4 Syntax of Operand2 as a constant.....                                   | 245        |
| 14.5 Syntax of Operand2 as a register with optional shift.....               | 246        |
| 14.6 Shift operations.....                                                   | 247        |
| 14.7 Saturating instructions.....                                            | 251        |
| 14.8 ADC (A32).....                                                          | 252        |
| 14.9 ADD (A32).....                                                          | 253        |
| 14.10 ADR (PC-relative) (A32).....                                           | 256        |
| 14.11 ADR (register-relative) (A32).....                                     | 258        |
| 14.12 ADRL pseudo-instruction (A32).....                                     | 259        |
| 14.13 AND (A32).....                                                         | 260        |
| 14.14 ASR (A32).....                                                         | 262        |
| 14.15 B (A32).....                                                           | 264        |
| 14.16 BFC (A32).....                                                         | 265        |
| 14.17 BFI (A32).....                                                         | 266        |
| 14.18 BIC (A32).....                                                         | 267        |
| 14.19 BKPT (A32).....                                                        | 269        |
| 14.20 BL (A32).....                                                          | 270        |
| 14.21 BLX, BLXNS (A32).....                                                  | 271        |
| 14.22 BX, BXNS (A32).....                                                    | 273        |
| 14.23 BXJ (A32).....                                                         | 274        |
| 14.24 CBZ and CBNZ (A32).....                                                | 275        |
| 14.25 CDP and CDP2 (A32).....                                                | 277        |
| 14.26 CLREX (A32).....                                                       | 278        |
| 14.27 CLZ (A32).....                                                         | 278        |
| 14.28 CMP and CMN (A32).....                                                 | 280        |

|                                           |     |
|-------------------------------------------|-----|
| 14.29 CPS (A32).....                      | 281 |
| 14.30 CPY pseudo-instruction.....         | 283 |
| 14.31 CRC32 (A32).....                    | 284 |
| 14.32 CRC32C (A32).....                   | 285 |
| 14.33 DBG (A32).....                      | 286 |
| 14.34 DCPS1 (T32 instruction).....        | 287 |
| 14.35 DCPS2 (T32 instruction).....        | 288 |
| 14.36 DCPS3 (T32 instruction).....        | 288 |
| 14.37 DMB (A32).....                      | 289 |
| 14.38 DSB (A32).....                      | 291 |
| 14.39 EOR (A32).....                      | 294 |
| 14.40 ERET (A32).....                     | 295 |
| 14.41 ESB (A32).....                      | 296 |
| 14.42 HLT (A32).....                      | 297 |
| 14.43 HVC (A32).....                      | 298 |
| 14.44 ISB (A32).....                      | 299 |
| 14.45 IT (A32).....                       | 300 |
| 14.46 LDA (A32).....                      | 303 |
| 14.47 LDAEX (A32).....                    | 304 |
| 14.48 LDC and LDC2 (A32).....             | 306 |
| 14.49 LDM (A32).....                      | 307 |
| 14.50 LDR (immediate offset) (A32).....   | 310 |
| 14.51 LDR (PC-relative) (A32).....        | 312 |
| 14.52 LDR (register offset) (A32).....    | 315 |
| 14.53 LDR (register-relative) (A32).....  | 317 |
| 14.54 LDR pseudo-instruction (A32).....   | 319 |
| 14.55 LDR, unprivileged (A32).....        | 322 |
| 14.56 LDREX (A32).....                    | 323 |
| 14.57 LSL (A32).....                      | 325 |
| 14.58 LSR (A32).....                      | 327 |
| 14.59 MCR and MCR2 (A32).....             | 329 |
| 14.60 MCRR and MCRR2 (A32).....           | 330 |
| 14.61 MLA (A32).....                      | 331 |
| 14.62 MLS (A32).....                      | 332 |
| 14.63 MOV (A32).....                      | 333 |
| 14.64 MOV32 pseudo-instruction (A32)..... | 335 |

|                                                                                |     |
|--------------------------------------------------------------------------------|-----|
| 14.65 MOVT (A32).....                                                          | 337 |
| 14.66 MRC and MRC2 (A32).....                                                  | 338 |
| 14.67 MRRC and MRRC2 (A32).....                                                | 339 |
| 14.68 MRS (PSR to general-purpose register) (A32).....                         | 340 |
| 14.69 MRS (system coprocessor register to general-purpose register) (A32)..... | 342 |
| 14.70 MSR (general-purpose register to system coprocessor register) (A32)..... | 343 |
| 14.71 MSR (general-purpose register to PSR) (A32).....                         | 344 |
| 14.72 MUL (A32).....                                                           | 346 |
| 14.73 MVN (A32).....                                                           | 347 |
| 14.74 NEG pseudo-instruction (A32).....                                        | 349 |
| 14.75 NOP (A32).....                                                           | 350 |
| 14.76 ORN (T32 only).....                                                      | 351 |
| 14.77 ORR (A32).....                                                           | 352 |
| 14.78 PKHBT and PKHTB (A32).....                                               | 354 |
| 14.79 PLD, PLDW, and PLI (A32).....                                            | 355 |
| 14.80 POP (A32).....                                                           | 357 |
| 14.81 PUSH (A32).....                                                          | 359 |
| 14.82 QADD (A32).....                                                          | 360 |
| 14.83 QADD8 (A32).....                                                         | 361 |
| 14.84 QADD16 (A32).....                                                        | 362 |
| 14.85 QASX (A32).....                                                          | 363 |
| 14.86 QDADD (A32).....                                                         | 364 |
| 14.87 QDSUB (A32).....                                                         | 365 |
| 14.88 QSAX (A32).....                                                          | 366 |
| 14.89 QSUB (A32).....                                                          | 367 |
| 14.90 QSUB8 (A32).....                                                         | 368 |
| 14.91 QSUB16 (A32).....                                                        | 369 |
| 14.92 RBIT (A32).....                                                          | 370 |
| 14.93 REV (A32).....                                                           | 371 |
| 14.94 REV16 (A32).....                                                         | 372 |
| 14.95 REVSH (A32).....                                                         | 373 |
| 14.96 RFE (A32).....                                                           | 374 |
| 14.97 ROR (A32).....                                                           | 376 |
| 14.98 RRX (A32).....                                                           | 378 |
| 14.99 RSB (A32).....                                                           | 379 |
| 14.100 RSC (A32).....                                                          | 380 |

|                           |     |
|---------------------------|-----|
| 14.101 SADD8 (A32).....   | 382 |
| 14.102 SADD16 (A32).....  | 383 |
| 14.103 SASX (A32).....    | 384 |
| 14.104 SBC (A32).....     | 386 |
| 14.105 SBFX (A32).....    | 387 |
| 14.106 SDIV (A32).....    | 388 |
| 14.107 SEL (A32).....     | 389 |
| 14.108 SETEND (A32).....  | 391 |
| 14.109 SETPAN (A32).....  | 392 |
| 14.110 SEV (A32).....     | 392 |
| 14.111 SEVL (A32).....    | 393 |
| 14.112 SG (A32).....      | 394 |
| 14.113 SHADD8 (A32).....  | 394 |
| 14.114 SHADD16 (A32)..... | 395 |
| 14.115 SHASX (A32).....   | 396 |
| 14.116 SHSAX (A32).....   | 397 |
| 14.117 SHSUB8 (A32).....  | 398 |
| 14.118 SHSUB16 (A32)..... | 399 |
| 14.119 SMC (A32).....     | 400 |
| 14.120 SMLAx.....         | 400 |
| 14.121 SMLAD (A32).....   | 402 |
| 14.122 SMLAL (A32).....   | 403 |
| 14.123 SMLALD (A32).....  | 404 |
| 14.124 SMLALxy.....       | 405 |
| 14.125 SMLAWy.....        | 406 |
| 14.126 SMLSD (A32).....   | 407 |
| 14.127 SMLS LD (A32)..... | 409 |
| 14.128 SMMLA (A32).....   | 410 |
| 14.129 SMMLS (A32).....   | 411 |
| 14.130 SMMUL (A32).....   | 412 |
| 14.131 SMUAD (A32).....   | 413 |
| 14.132 SMULxy.....        | 414 |
| 14.133 SMULL (A32).....   | 415 |
| 14.134 SMULWy.....        | 416 |
| 14.135 SMUSD (A32).....   | 417 |
| 14.136 SRS (A32).....     | 418 |

|                                          |     |
|------------------------------------------|-----|
| 14.137 SSAT (A32).....                   | 420 |
| 14.138 SSAT16 (A32).....                 | 421 |
| 14.139 SSAX (A32).....                   | 422 |
| 14.140 SSUB8 (A32).....                  | 424 |
| 14.141 SSUB16 (A32).....                 | 425 |
| 14.142 STC and STC2 (A32).....           | 426 |
| 14.143 STL (A32).....                    | 428 |
| 14.144 STLEX (A32).....                  | 429 |
| 14.145 STM (A32).....                    | 431 |
| 14.146 STR (immediate offset) (A32)..... | 434 |
| 14.147 STR (register offset) (A32).....  | 436 |
| 14.148 STR, unprivileged (A32).....      | 438 |
| 14.149 STREX (A32).....                  | 440 |
| 14.150 SUB (A32).....                    | 442 |
| 14.151 SUBS pc, lr (A32).....            | 444 |
| 14.152 SVC (A32).....                    | 446 |
| 14.153 SWP and SWPB (A32).....           | 447 |
| 14.154 SXTAB (A32).....                  | 448 |
| 14.155 SXTAB16 (A32).....                | 449 |
| 14.156 SXTAH (A32).....                  | 451 |
| 14.157 SXTB (A32).....                   | 452 |
| 14.158 SXTB16 (A32).....                 | 453 |
| 14.159 SXTH (A32).....                   | 454 |
| 14.160 SYS (A32).....                    | 456 |
| 14.161 TBB and TBH (A32).....            | 457 |
| 14.162 TEQ (A32).....                    | 458 |
| 14.163 TST (A32).....                    | 459 |
| 14.164 TT, TTT, TTA, TTAT (A32).....     | 460 |
| 14.165 UADD8 (A32).....                  | 463 |
| 14.166 UADD16 (A32).....                 | 464 |
| 14.167 UASX (A32).....                   | 466 |
| 14.168 UBFX (A32).....                   | 467 |
| 14.169 UDF (A32).....                    | 468 |
| 14.170 UDIV (A32).....                   | 469 |
| 14.171 UHADD8 (A32).....                 | 470 |
| 14.172 UHADD16 (A32).....                | 471 |

|                                                                                      |            |
|--------------------------------------------------------------------------------------|------------|
| 14.173 UHASX (A32).....                                                              | 472        |
| 14.174 UHSAX (A32).....                                                              | 473        |
| 14.175 UHSUB8 (A32).....                                                             | 474        |
| 14.176 UHSUB16 (A32).....                                                            | 475        |
| 14.177 UMAAL (A32).....                                                              | 476        |
| 14.178 UMLAL (A32).....                                                              | 477        |
| 14.179 UMULL (A32).....                                                              | 478        |
| 14.180 UND pseudo-instruction (A32).....                                             | 479        |
| 14.181 UQADD8 (A32).....                                                             | 480        |
| 14.182 UQADD16 (A32).....                                                            | 481        |
| 14.183 UQASX (A32).....                                                              | 482        |
| 14.184 UQSAX (A32).....                                                              | 483        |
| 14.185 UQSUB8 (A32).....                                                             | 484        |
| 14.186 UQSUB16 (A32).....                                                            | 485        |
| 14.187 USAD8 (A32).....                                                              | 486        |
| 14.188 USADA8 (A32).....                                                             | 487        |
| 14.189 USAT (A32).....                                                               | 488        |
| 14.190 USAT16 (A32).....                                                             | 489        |
| 14.191 USAX (A32).....                                                               | 490        |
| 14.192 USUB8 (A32).....                                                              | 492        |
| 14.193 USUB16 (A32).....                                                             | 493        |
| 14.194 UXTAB (A32).....                                                              | 494        |
| 14.195 UXTAB16 (A32).....                                                            | 495        |
| 14.196 UXTAH (A32).....                                                              | 497        |
| 14.197 UXTB (A32).....                                                               | 498        |
| 14.198 UXTB16 (A32).....                                                             | 499        |
| 14.199 UXTH (A32).....                                                               | 501        |
| 14.200 WFE (A32).....                                                                | 502        |
| 14.201 WFI (A32).....                                                                | 503        |
| 14.202 YIELD (A32).....                                                              | 504        |
| <b>15. Advanced SIMD Instructions (32-bit).....</b>                                  | <b>505</b> |
| 15.1 Summary of Advanced SIMD instructions.....                                      | 505        |
| 15.2 Summary of shared Advanced SIMD and floating-point instructions.....            | 507        |
| 15.3 Cryptographic instructions.....                                                 | 508        |
| 15.4 Interleaving provided by load and store element and structure instructions..... | 509        |

|                                                                                       |     |
|---------------------------------------------------------------------------------------|-----|
| 15.5 Alignment restrictions in load and store element and structure instructions..... | 509 |
| 15.6 FLDMDBX, FLDMIAX (A32).....                                                      | 510 |
| 15.7 FSTMDBX, FSTMIAX (A32).....                                                      | 511 |
| 15.8 VABA and VABAL (A32).....                                                        | 512 |
| 15.9 VABD and VABDL (A32).....                                                        | 513 |
| 15.10 VABS (A32).....                                                                 | 514 |
| 15.11 VACLE, VACLT, VACGE and VACGT (A32).....                                        | 514 |
| 15.12 VADD (A32).....                                                                 | 515 |
| 15.13 VADDHN (A32).....                                                               | 516 |
| 15.14 VADDL and VADDW (A32).....                                                      | 517 |
| 15.15 VAND (immediate) (A32).....                                                     | 518 |
| 15.16 VAND (register) (A32).....                                                      | 519 |
| 15.17 VBIC (immediate) (A32).....                                                     | 519 |
| 15.18 VBIC (register) (A32).....                                                      | 520 |
| 15.19 VBIF (A32).....                                                                 | 521 |
| 15.20 VBIT (A32).....                                                                 | 522 |
| 15.21 VBSL (A32).....                                                                 | 522 |
| 15.22 VCADD (A32).....                                                                | 523 |
| 15.23 VCEQ (immediate #0) (A32).....                                                  | 524 |
| 15.24 VCEQ (register) (A32).....                                                      | 525 |
| 15.25 VCGE (immediate #0) (A32).....                                                  | 526 |
| 15.26 VCGE (register) (A32).....                                                      | 527 |
| 15.27 VCGT (immediate #0) (A32).....                                                  | 528 |
| 15.28 VCGT (register) (A32).....                                                      | 528 |
| 15.29 VCLE (immediate #0) (A32).....                                                  | 529 |
| 15.30 VCLS (A32).....                                                                 | 530 |
| 15.31 VCLE (register) (A32).....                                                      | 531 |
| 15.32 VCLT (immediate #0) (A32).....                                                  | 532 |
| 15.33 VCLT (register) (A32).....                                                      | 533 |
| 15.34 VCLZ (A32).....                                                                 | 534 |
| 15.35 VCMLA (A32).....                                                                | 534 |
| 15.36 VCMLA (by element) (A32).....                                                   | 535 |
| 15.37 VCNT (A32).....                                                                 | 536 |
| 15.38 VCVT (between fixed-point or integer, and floating-point) (A32).....            | 537 |
| 15.39 VCVT (between half-precision and single-precision floating-point) (A32).....    | 538 |
| 15.40 VCVT (from floating-point to integer with directed rounding modes) (A32).....   | 539 |

|                                                                                              |     |
|----------------------------------------------------------------------------------------------|-----|
| 15.41 VCVTB, VCVTT (between half-precision and double-precision) (A32).....                  | 540 |
| 15.42 VDUP (A32).....                                                                        | 541 |
| 15.43 VEOR (A32).....                                                                        | 542 |
| 15.44 VEXT (A32).....                                                                        | 543 |
| 15.45 VFMA, VFMS (A32).....                                                                  | 544 |
| 15.46 VHADD (A32).....                                                                       | 545 |
| 15.47 VHSUB (A32).....                                                                       | 545 |
| 15.48 VLDn (single n-element structure to one lane) (A32).....                               | 546 |
| 15.49 VLDn (single n-element structure to all lanes) (A32).....                              | 548 |
| 15.50 VLDn (multiple n-element structures) (A32).....                                        | 549 |
| 15.51 VLDM (A32).....                                                                        | 551 |
| 15.52 VLDR (A32).....                                                                        | 552 |
| 15.53 VLDR (post-increment and pre-decrement) (A32).....                                     | 553 |
| 15.54 VLDR pseudo-instruction (A32).....                                                     | 554 |
| 15.55 VMAX and VMIN (A32).....                                                               | 555 |
| 15.56 VMAXNM, VMINNM (A32).....                                                              | 556 |
| 15.57 VMLA (A32).....                                                                        | 557 |
| 15.58 VMLA (by scalar) (A32).....                                                            | 557 |
| 15.59 VMLAL (by scalar) (A32).....                                                           | 558 |
| 15.60 VMLAL (A32).....                                                                       | 559 |
| 15.61 VMLS (by scalar) (A32).....                                                            | 559 |
| 15.62 VMLS (A32).....                                                                        | 560 |
| 15.63 VMLSL (A32).....                                                                       | 561 |
| 15.64 VMLSL (by scalar) (A32).....                                                           | 561 |
| 15.65 VMOV (immediate) (A32).....                                                            | 562 |
| 15.66 VMOV (register) (A32).....                                                             | 563 |
| 15.67 VMOV (between two general-purpose registers and a 64-bit extension register) (A32).... | 563 |
| 15.68 VMOV (between a general-purpose register and an Advanced SIMD scalar) (A32).....       | 564 |
| 15.69 VMOVL (A32).....                                                                       | 565 |
| 15.70 VMOVN (A32).....                                                                       | 565 |
| 15.71 VMOV2 (A32).....                                                                       | 566 |
| 15.72 VMRS (A32).....                                                                        | 567 |
| 15.73 VMSR (A32).....                                                                        | 568 |
| 15.74 VMUL (A32).....                                                                        | 569 |
| 15.75 VMUL (by scalar) (A32).....                                                            | 569 |
| 15.76 VMULL (A32).....                                                                       | 570 |

|                                                               |     |
|---------------------------------------------------------------|-----|
| 15.77 VMULL (by scalar) (A32).....                            | 571 |
| 15.78 VMVN (register) (A32).....                              | 571 |
| 15.79 VMVN (immediate) (A32).....                             | 572 |
| 15.80 VNEG (A32).....                                         | 573 |
| 15.81 VORN (register) (A32).....                              | 573 |
| 15.82 VORN (immediate) (A32).....                             | 574 |
| 15.83 VORR (register) (A32).....                              | 575 |
| 15.84 VORR (immediate) (A32).....                             | 576 |
| 15.85 VPADAL (A32).....                                       | 577 |
| 15.86 VPADD (A32).....                                        | 578 |
| 15.87 VPADDL (A32).....                                       | 579 |
| 15.88 VPMAX and VPMIN (A32).....                              | 580 |
| 15.89 VPOP (A32).....                                         | 581 |
| 15.90 VPUSH (A32).....                                        | 581 |
| 15.91 VQABS (A32).....                                        | 582 |
| 15.92 VQADD (A32).....                                        | 583 |
| 15.93 VQDMLAL and VQDMLSL (by vector or by scalar) (A32)..... | 583 |
| 15.94 VQDMULH (by vector or by scalar) (A32).....             | 584 |
| 15.95 VQDMULL (by vector or by scalar) (A32).....             | 585 |
| 15.96 VQMOVN and VQMOVUN (A32).....                           | 586 |
| 15.97 VQNEG (A32).....                                        | 587 |
| 15.98 VQRDMULH (by vector or by scalar) (A32).....            | 588 |
| 15.99 VQRSHL (by signed variable) (A32).....                  | 589 |
| 15.100 VQRSHRN and VQRSHRUN (by immediate) (A32).....         | 589 |
| 15.101 VQSHL (by signed variable) (A32).....                  | 590 |
| 15.102 VQSHL and VQSHLU (by immediate) (A32).....             | 591 |
| 15.103 VQSHRN and VQSHRUN (by immediate) (A32).....           | 592 |
| 15.104 VQSUB (A32).....                                       | 593 |
| 15.105 VRADDHN (A32).....                                     | 594 |
| 15.106 VRECPE (A32).....                                      | 595 |
| 15.107 VRECPS (A32).....                                      | 596 |
| 15.108 VREV16, VREV32, and VREV64 (A32).....                  | 597 |
| 15.109 VRHADD (A32).....                                      | 598 |
| 15.110 VRSHL (by signed variable) (A32).....                  | 598 |
| 15.111 VRSHR (by immediate) (A32).....                        | 599 |
| 15.112 VRSHRN (by immediate) (A32).....                       | 600 |

|                                                                                       |            |
|---------------------------------------------------------------------------------------|------------|
| 15.113 VRINT (A32).....                                                               | 601        |
| 15.114 VRSQRTE (A32).....                                                             | 602        |
| 15.115 VRSQRTS (A32).....                                                             | 603        |
| 15.116 VRSRA (by immediate) (A32).....                                                | 604        |
| 15.117 VRSUBHN (A32).....                                                             | 605        |
| 15.118 VSHL (by immediate) (A32).....                                                 | 606        |
| 15.119 VSHL (by signed variable) (A32).....                                           | 607        |
| 15.120 VSHLL (by immediate) (A32).....                                                | 608        |
| 15.121 VSHR (by immediate) (A32).....                                                 | 608        |
| 15.122 VSHRN (by immediate) (A32).....                                                | 609        |
| 15.123 VS LI (A32).....                                                               | 610        |
| 15.124 VS RA (by immediate) (A32).....                                                | 611        |
| 15.125 VS RI (A32).....                                                               | 612        |
| 15.126 VSTM (A32).....                                                                | 613        |
| 15.127 VSTn (multiple n-element structures) (A32).....                                | 614        |
| 15.128 VSTn (single n-element structure to one lane) (A32).....                       | 616        |
| 15.129 VSTR (A32).....                                                                | 617        |
| 15.130 VSTR (post-increment and pre-decrement) (A32).....                             | 618        |
| 15.131 VSUB (A32).....                                                                | 619        |
| 15.132 VSUBHN (A32).....                                                              | 620        |
| 15.133 VSUBL and VSUBW (A32).....                                                     | 620        |
| 15.134 VS WP (A32).....                                                               | 621        |
| 15.135 VTBL and VT BX (A32).....                                                      | 622        |
| 15.136 VTRN (A32).....                                                                | 623        |
| 15.137 VTST (A32).....                                                                | 624        |
| 15.138 VUZP (A32).....                                                                | 624        |
| 15.139 VZIP (A32).....                                                                | 625        |
| <b>16. Floating-point Instructions (32-bit).....</b>                                  | <b>627</b> |
| 16.1 Summary of floating-point instructions.....                                      | 627        |
| 16.2 VABS (floating-point) (A32).....                                                 | 628        |
| 16.3 VADD (floating-point) (A32).....                                                 | 629        |
| 16.4 VCMP, VCMPE (A32).....                                                           | 629        |
| 16.5 VCVT (between single-precision and double-precision) (A32).....                  | 630        |
| 16.6 VCVT (between floating-point and integer) (A32).....                             | 631        |
| 16.7 VCVT (from floating-point to integer with directed rounding modes) (A32 FP)..... | 632        |

|                                                                                                                |     |
|----------------------------------------------------------------------------------------------------------------|-----|
| 16.8 VCVT (between floating-point and fixed-point) (A32).....                                                  | 633 |
| 16.9 VCVTB, VCVTT (half-precision extension) (A32).....                                                        | 634 |
| 16.10 VCVTB, VCVTT (between half-precision and double-precision) (A32 FP).....                                 | 635 |
| 16.11 VDIV (A32).....                                                                                          | 636 |
| 16.12 VFMA, VFMS, VFNMA, VFNMS (floating-point) (A32).....                                                     | 637 |
| 16.13 VJCVT (A32).....                                                                                         | 638 |
| 16.14 VLDM (floating-point) (A32).....                                                                         | 639 |
| 16.15 VLDR (floating-point) (A32).....                                                                         | 640 |
| 16.16 VLDR (post-increment and pre-decrement, floating-point) (A32).....                                       | 641 |
| 16.17 VLDR pseudo-instruction (floating-point) (A32).....                                                      | 642 |
| 16.18 VLLDM (A32).....                                                                                         | 643 |
| 16.19 VLSTM (A32).....                                                                                         | 644 |
| 16.20 VMAXNM, VMINNM (floating-point) (A32).....                                                               | 645 |
| 16.21 VMLA (floating-point) (A32).....                                                                         | 646 |
| 16.22 VMLS (floating-point) (A32).....                                                                         | 646 |
| 16.23 VMOV (floating-point) (A32).....                                                                         | 647 |
| 16.24 VMOV (between one general-purpose register and single precision floating-point register) (A32).....      | 648 |
| 16.25 VMOV (between two general-purpose registers and one or two extension registers) (A32).....               | 649 |
| 16.26 VMOV (between a general-purpose register and half a double precision floating-point register) (A32)..... | 650 |
| 16.27 VMRS (floating-point) (A32).....                                                                         | 650 |
| 16.28 VMSR (floating-point) (A32).....                                                                         | 651 |
| 16.29 VMUL (floating-point) (A32).....                                                                         | 652 |
| 16.30 VNNEG (floating-point) (A32).....                                                                        | 653 |
| 16.31 VNMLA (floating-point) (A32).....                                                                        | 653 |
| 16.32 VNMLS (floating-point) (A32).....                                                                        | 654 |
| 16.33 VNMUL (floating-point) (A32).....                                                                        | 655 |
| 16.34 VPOP (floating-point) (A32).....                                                                         | 655 |
| 16.35 VPUSH (floating-point) (A32).....                                                                        | 656 |
| 16.36 VRINT (floating-point) (A32).....                                                                        | 657 |
| 16.37 VSEL (A32).....                                                                                          | 658 |
| 16.38 VSQRT (A32).....                                                                                         | 659 |
| 16.39 VSTM (floating-point) (A32).....                                                                         | 659 |
| 16.40 VSTR (floating-point) (A32).....                                                                         | 660 |
| 16.41 VSTR (post-increment and pre-decrement, floating-point) (A32).....                                       | 661 |

|                                                            |            |
|------------------------------------------------------------|------------|
| 16.42 VSUB (floating-point) (A32).....                     | 662        |
| <b>17. A64 General Instructions.....</b>                   | <b>664</b> |
| 17.1 A64 instructions in alphabetical order.....           | 664        |
| 17.2 Register restrictions for A64 instructions.....       | 669        |
| 17.3 ADC (A64).....                                        | 669        |
| 17.4 ADCS (A64).....                                       | 670        |
| 17.5 ADD (extended register) (A64).....                    | 671        |
| 17.6 ADD (immediate) (A64).....                            | 672        |
| 17.7 ADD (shifted register) (A64).....                     | 673        |
| 17.8 ADDS (extended register) (A64).....                   | 674        |
| 17.9 ADDS (immediate) (A64).....                           | 676        |
| 17.10 ADDS (shifted register) (A64).....                   | 677        |
| 17.11 ADR (A64).....                                       | 678        |
| 17.12 ADRL pseudo-instruction (A64).....                   | 678        |
| 17.13 ADRP (A64).....                                      | 679        |
| 17.14 AND (immediate) (A64).....                           | 680        |
| 17.15 AND (shifted register) (A64).....                    | 680        |
| 17.16 ANDS (immediate) (A64).....                          | 681        |
| 17.17 ANDS (shifted register) (A64).....                   | 682        |
| 17.18 ASR (register) (A64).....                            | 683        |
| 17.19 ASR (immediate) (A64).....                           | 684        |
| 17.20 ASRV (A64).....                                      | 685        |
| 17.21 AT (A64).....                                        | 686        |
| 17.22 AUTDA, AUTDZA (A64).....                             | 687        |
| 17.23 AUTDB, AUTDZB (A64).....                             | 688        |
| 17.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ (A64)..... | 689        |
| 17.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ (A64)..... | 690        |
| 17.26 B.cond.....                                          | 691        |
| 17.27 B (A64).....                                         | 692        |
| 17.28 BFC (A64).....                                       | 692        |
| 17.29 BFI (A64).....                                       | 693        |
| 17.30 BFM (A64).....                                       | 694        |
| 17.31 BFXIL (A64).....                                     | 695        |
| 17.32 BIC (shifted register) (A64).....                    | 696        |
| 17.33 BICS (shifted register) (A64).....                   | 697        |

|                                                     |     |
|-----------------------------------------------------|-----|
| 17.34 BL (A64).....                                 | 698 |
| 17.35 BLR (A64).....                                | 699 |
| 17.36 BLRAA, BLRAAZ, BLRAB, BLRABZ (A64).....       | 699 |
| 17.37 BR (A64).....                                 | 700 |
| 17.38 BRAA, BRAAZ, BRAB, BRABZ (A64).....           | 701 |
| 17.39 BRK (A64).....                                | 702 |
| 17.40 CBNZ (A64).....                               | 702 |
| 17.41 CBZ (A64).....                                | 703 |
| 17.42 CCMN (immediate) (A64).....                   | 703 |
| 17.43 CCMN (register) (A64).....                    | 704 |
| 17.44 CCMP (immediate) (A64).....                   | 705 |
| 17.45 CCMP (register) (A64).....                    | 706 |
| 17.46 CINC (A64).....                               | 707 |
| 17.47 CINV (A64).....                               | 708 |
| 17.48 CLREX (A64).....                              | 708 |
| 17.49 CLS (A64).....                                | 709 |
| 17.50 CLZ (A64).....                                | 709 |
| 17.51 CMN (extended register) (A64).....            | 710 |
| 17.52 CMN (immediate) (A64).....                    | 712 |
| 17.53 CMN (shifted register) (A64).....             | 712 |
| 17.54 CMP (extended register) (A64).....            | 713 |
| 17.55 CMP (immediate) (A64).....                    | 715 |
| 17.56 CMP (shifted register) (A64).....             | 716 |
| 17.57 CNEG (A64).....                               | 717 |
| 17.58 CRC32B, CRC32H, CRC32W, CRC32X (A64).....     | 718 |
| 17.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX (A64)..... | 719 |
| 17.60 CSEL (A64).....                               | 720 |
| 17.61 CSET (A64).....                               | 720 |
| 17.62 CSETM (A64).....                              | 721 |
| 17.63 CSINC (A64).....                              | 722 |
| 17.64 CSINV (A64).....                              | 723 |
| 17.65 CSNEG (A64).....                              | 724 |
| 17.66 DC (A64).....                                 | 725 |
| 17.67 DCPS1 (A64).....                              | 726 |
| 17.68 DCPS2 (A64).....                              | 727 |
| 17.69 DCPS3 (A64).....                              | 728 |

|                                                |     |
|------------------------------------------------|-----|
| 17.70 DMB (A64).....                           | 729 |
| 17.71 DRPS (A64).....                          | 730 |
| 17.72 DSB (A64).....                           | 730 |
| 17.73 EON (shifted register) (A64).....        | 732 |
| 17.74 EOR (immediate) (A64).....               | 733 |
| 17.75 EOR (shifted register) (A64).....        | 734 |
| 17.76 ERET (A64).....                          | 735 |
| 17.77 ERETA, ERETAB (A64).....                 | 735 |
| 17.78 ESB (A64).....                           | 736 |
| 17.79 EXTR (A64).....                          | 736 |
| 17.80 HINT (A64).....                          | 737 |
| 17.81 HLT (A64).....                           | 738 |
| 17.82 HVC (A64).....                           | 739 |
| 17.83 IC (A64).....                            | 739 |
| 17.84 ISB (A64).....                           | 740 |
| 17.85 LSL (register) (A64).....                | 741 |
| 17.86 LSL (immediate) (A64).....               | 742 |
| 17.87 LSLV (A64).....                          | 743 |
| 17.88 LSR (register) (A64).....                | 744 |
| 17.89 LSR (immediate) (A64).....               | 745 |
| 17.90 LSRV (A64).....                          | 746 |
| 17.91 MADD (A64).....                          | 747 |
| 17.92 MNEG (A64).....                          | 748 |
| 17.93 MOV (to or from SP) (A64).....           | 748 |
| 17.94 MOV (inverted wide immediate) (A64)..... | 749 |
| 17.95 MOV (wide immediate) (A64).....          | 750 |
| 17.96 MOV (bitmask immediate) (A64).....       | 751 |
| 17.97 MOV (register) (A64).....                | 751 |
| 17.98 MOVK (A64).....                          | 752 |
| 17.99 MOVL pseudo-instruction (A64).....       | 753 |
| 17.100 MOVN (A64).....                         | 754 |
| 17.101 MOVZ (A64).....                         | 755 |
| 17.102 MRS (A64).....                          | 756 |
| 17.103 MSR (immediate) (A64).....              | 757 |
| 17.104 MSR (register) (A64).....               | 757 |
| 17.105 MSUB (A64).....                         | 758 |

|                                                             |     |
|-------------------------------------------------------------|-----|
| 17.106 MUL (A64).....                                       | 759 |
| 17.107 MVN (A64).....                                       | 760 |
| 17.108 NEG (shifted register) (A64).....                    | 761 |
| 17.109 NEGS (A64).....                                      | 762 |
| 17.110 NGC (A64).....                                       | 763 |
| 17.111 NGCS (A64).....                                      | 763 |
| 17.112 NOP (A64).....                                       | 764 |
| 17.113 ORN (shifted register) (A64).....                    | 765 |
| 17.114 ORR (immediate) (A64).....                           | 766 |
| 17.115 ORR (shifted register) (A64).....                    | 766 |
| 17.116 PACDA, PACDZA (A64).....                             | 768 |
| 17.117 PACDB, PACDZB (A64).....                             | 768 |
| 17.118 PACGA (A64).....                                     | 769 |
| 17.119 PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ (A64)..... | 770 |
| 17.120 PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ (A64)..... | 771 |
| 17.121 PSB (A64).....                                       | 772 |
| 17.122 RBIT (A64).....                                      | 772 |
| 17.123 RET (A64).....                                       | 773 |
| 17.124 RETAA, RETAB (A64).....                              | 773 |
| 17.125 REV16 (A64).....                                     | 774 |
| 17.126 REV32 (A64).....                                     | 774 |
| 17.127 REV64 (A64).....                                     | 775 |
| 17.128 REV (A64).....                                       | 776 |
| 17.129 ROR (immediate) (A64).....                           | 776 |
| 17.130 ROR (register) (A64).....                            | 777 |
| 17.131 RORV (A64).....                                      | 778 |
| 17.132 SBC (A64).....                                       | 779 |
| 17.133 SBCS (A64).....                                      | 780 |
| 17.134 SBFIZ (A64).....                                     | 781 |
| 17.135 SBFM (A64).....                                      | 782 |
| 17.136 SBFX (A64).....                                      | 783 |
| 17.137 SDIV (A64).....                                      | 784 |
| 17.138 SEV (A64).....                                       | 785 |
| 17.139 SEVL (A64).....                                      | 785 |
| 17.140 SMADDL (A64).....                                    | 786 |
| 17.141 SMC (A64).....                                       | 786 |

|                                            |     |
|--------------------------------------------|-----|
| 17.142 SMNEGL (A64).....                   | 787 |
| 17.143 SMSUBL (A64).....                   | 788 |
| 17.144 SMULH (A64).....                    | 789 |
| 17.145 SMULL (A64).....                    | 789 |
| 17.146 SUB (extended register) (A64).....  | 790 |
| 17.147 SUB (immediate) (A64).....          | 791 |
| 17.148 SUB (shifted register) (A64).....   | 792 |
| 17.149 SUBS (extended register) (A64)..... | 793 |
| 17.150 SUBS (immediate) (A64).....         | 795 |
| 17.151 SUBS (shifted register) (A64).....  | 796 |
| 17.152 SVC (A64).....                      | 797 |
| 17.153 SXTB (A64).....                     | 798 |
| 17.154 SXTH (A64).....                     | 798 |
| 17.155 SXTW (A64).....                     | 799 |
| 17.156 SYS (A64).....                      | 800 |
| 17.157 SYSL (A64).....                     | 801 |
| 17.158 TBNZ (A64).....                     | 801 |
| 17.159 TBZ (A64).....                      | 802 |
| 17.160 TLBI (A64).....                     | 803 |
| 17.161 TST (immediate) (A64).....          | 804 |
| 17.162 TST (shifted register) (A64).....   | 805 |
| 17.163 UBFIZ (A64).....                    | 806 |
| 17.164 UBFM (A64).....                     | 807 |
| 17.165 UBFX (A64).....                     | 808 |
| 17.166 UDIV (A64).....                     | 809 |
| 17.167 UMADDL (A64).....                   | 810 |
| 17.168 UMNEGL (A64).....                   | 811 |
| 17.169 UMSUBL (A64).....                   | 812 |
| 17.170 UMULH (A64).....                    | 812 |
| 17.171 UMULL (A64).....                    | 813 |
| 17.172 UXTB (A64).....                     | 814 |
| 17.173 UXTH (A64).....                     | 814 |
| 17.174 WFE (A64).....                      | 815 |
| 17.175 WFI (A64).....                      | 815 |
| 17.176 XPACD, XPACI, XPACLRI (A64).....    | 816 |
| 17.177 YIELD (A64).....                    | 816 |

|                                                                         |            |
|-------------------------------------------------------------------------|------------|
| <b>18. A64 Data Transfer Instructions.....</b>                          | <b>818</b> |
| 18.1 A64 data transfer instructions in alphabetical order.....          | 818        |
| 18.2 CASA, CASAL, CAS, CASL, CASAL, CAS, CASL (A64).....                | 822        |
| 18.3 CASAB, CASALB, CASB, CASLB (A64).....                              | 824        |
| 18.4 CASAH, CASALH, CASH, CASLH (A64).....                              | 825        |
| 18.5 CASPA, CASPAL, CASP, CASPL, CASPAL, CASP, CASPL (A64).....         | 826        |
| 18.6 LDADDA, LDADDAL, LDADD, LDADDL, LDADDAL, LDADD, LDADDL (A64).....  | 828        |
| 18.7 LDADDAB, LDADDALB, LDADDDB, LDADDLB (A64).....                     | 829        |
| 18.8 LDADDAH, LDADDALH, LDADDH, LDADDLH (A64).....                      | 830        |
| 18.9 LDAPR (A64).....                                                   | 831        |
| 18.10 LDAPRB (A64).....                                                 | 832        |
| 18.11 LDAPRH (A64).....                                                 | 833        |
| 18.12 LDAR (A64).....                                                   | 834        |
| 18.13 LDARB (A64).....                                                  | 834        |
| 18.14 LDARH (A64).....                                                  | 835        |
| 18.15 LDAXP (A64).....                                                  | 835        |
| 18.16 LDAXR (A64).....                                                  | 836        |
| 18.17 LDAXRB (A64).....                                                 | 837        |
| 18.18 LDAXRH (A64).....                                                 | 838        |
| 18.19 LDCLRA, LDCLRAL, LDCLR, LDCLRL, LDCLRAL, LDCLR, LDCLRL (A64)..... | 838        |
| 18.20 LDCLRAB, LDCLRALB, LDCLR, LDCLRLB (A64).....                      | 840        |
| 18.21 LDCLRAH, LDCLRALH, LDCLR, LDCLRLH (A64).....                      | 841        |
| 18.22 LDEORA, LDEORAL, LDEOR, LDEORL, LDEORAL, LDEOR, LDEORL (A64)..... | 842        |
| 18.23 LDEORAB, LDEORALB, LDEORB, LDEORLB (A64).....                     | 843        |
| 18.24 LDEORAH, LDEORALH, LDEORH, LDEORLH (A64).....                     | 844        |
| 18.25 LDLAR (A64).....                                                  | 845        |
| 18.26 LDLARB (A64).....                                                 | 846        |
| 18.27 LDLARH (A64).....                                                 | 846        |
| 18.28 LDNP (A64).....                                                   | 847        |
| 18.29 LDP (A64).....                                                    | 848        |
| 18.30 LDPSW (A64).....                                                  | 849        |
| 18.31 LDR (immediate).....                                              | 850        |
| 18.32 LDR (literal).....                                                | 851        |
| 18.33 LDR pseudo-instruction.....                                       | 852        |
| 18.34 LDR (register).....                                               | 854        |
| 18.35 LDRAA, LDRAB, LDRAB (A64).....                                    | 855        |

|                                                                                |     |
|--------------------------------------------------------------------------------|-----|
| 18.36 LDRB (immediate).....                                                    | 856 |
| 18.37 LDRB (register).....                                                     | 856 |
| 18.38 LDRH (immediate).....                                                    | 857 |
| 18.39 LDRH (register).....                                                     | 858 |
| 18.40 LDRSB (immediate).....                                                   | 859 |
| 18.41 LDRSB (register).....                                                    | 860 |
| 18.42 LDRSH (immediate).....                                                   | 861 |
| 18.43 LDRSH (register).....                                                    | 862 |
| 18.44 LDRSW (immediate).....                                                   | 863 |
| 18.45 LDRSW (literal).....                                                     | 864 |
| 18.46 LDRSW (register).....                                                    | 864 |
| 18.47 LDSETA, LDSETAL, LDSET, LDSETL, LDSETAL, LDSET, LDSETL (A64).....        | 865 |
| 18.48 LDSETAB, LDSETALB, LDSETB, LDSETLB (A64).....                            | 867 |
| 18.49 LDSETAH, LDSETALH, LDSETH, LDSETLH (A64).....                            | 868 |
| 18.50 LDSMAXA, LDSMAXAL, LDSMAX, LDSMAXL, LDSMAXAL, LDSMAX, LDSMAXL (A64)...   | 869 |
| 18.51 LDSMAXAB, LDSMAXALB, LDSMAXB, LDSMAXLB (A64).....                        | 870 |
| 18.52 LDSMAXAH, LDSMAXALH, LDSMAXH, LDSMAXLH (A64).....                        | 871 |
| 18.53 LDSMINA, LDSMINAL, LDSMIN, LDSMINL, LDSMINAL, LDSMIN, LDSMINL (A64)..... | 872 |
| 18.54 LDSMINAB, LDSMINALB, LDSMINB, LDSMINLB (A64).....                        | 873 |
| 18.55 LDSMINAH, LDSMINALH, LDSMINH, LDSMINLH (A64).....                        | 875 |
| 18.56 LDTR (A64).....                                                          | 876 |
| 18.57 LDTRB (A64).....                                                         | 876 |
| 18.58 LDTRH (A64).....                                                         | 877 |
| 18.59 LDTRSB (A64).....                                                        | 878 |
| 18.60 LDTRSH (A64).....                                                        | 879 |
| 18.61 LDTRSW (A64).....                                                        | 880 |
| 18.62 LDUMAXA, LDUMAXAL, LDUMAX, LDUMAXL, LDUMAXAL, LDUMAX, LDUMAXL (A64)..... | 880 |
| 18.63 LDUMAXAB, LDUMAXALB, LDUMAXB, LDUMAXLB (A64).....                        | 882 |
| 18.64 LDUMAXAH, LDUMAXALH, LDUMAXH, LDUMAXLH (A64).....                        | 883 |
| 18.65 LDUMINA, LDUMINAL, LDUMIN, LDUMINL, LDUMINAL, LDUMIN, LDUMINL (A64)....  | 884 |
| 18.66 LDUMINAB, LDUMINALB, LDUMINB, LDUMINLB (A64).....                        | 885 |
| 18.67 LDUMINAH, LDUMINALH, LDUMINH, LDUMINLH (A64).....                        | 886 |
| 18.68 LDUR (A64).....                                                          | 887 |
| 18.69 LDURB (A64).....                                                         | 888 |
| 18.70 LDURH (A64).....                                                         | 889 |

|                                          |     |
|------------------------------------------|-----|
| 18.71 LDURSB (A64).....                  | 889 |
| 18.72 LDURSH (A64).....                  | 890 |
| 18.73 LDURSW (A64).....                  | 891 |
| 18.74 LDXP (A64).....                    | 891 |
| 18.75 LDXR (A64).....                    | 892 |
| 18.76 LDXRB (A64).....                   | 893 |
| 18.77 LDXRH (A64).....                   | 893 |
| 18.78 PRFM (immediate) (A64).....        | 894 |
| 18.79 PRFM (literal) (A64).....          | 895 |
| 18.80 PRFM (register) (A64).....         | 897 |
| 18.81 PRFUM (unscaled offset) (A64)..... | 898 |
| 18.82 STADD, STADDL, STADDL (A64).....   | 900 |
| 18.83 STADDB, STADDLB (A64).....         | 901 |
| 18.84 STADDH, STADDLH (A64).....         | 901 |
| 18.85 STCLR, STCLRL, STCLRL (A64).....   | 902 |
| 18.86 STCLRB, STCLRLB (A64).....         | 903 |
| 18.87 STCLRH, STCLRLH (A64).....         | 904 |
| 18.88 STEOR, STEORL, STEORL (A64).....   | 905 |
| 18.89 STEORB, STEORLB (A64).....         | 906 |
| 18.90 STEORH, STEORLH (A64).....         | 906 |
| 18.91 STLLR (A64).....                   | 907 |
| 18.92 STLLRB (A64).....                  | 908 |
| 18.93 STLLRH (A64).....                  | 908 |
| 18.94 STL (A64).....                     | 909 |
| 18.95 STL (A64).....                     | 910 |
| 18.96 STL (A64).....                     | 910 |
| 18.97 STLXP (A64).....                   | 911 |
| 18.98 STLXR (A64).....                   | 912 |
| 18.99 STLXRB (A64).....                  | 914 |
| 18.100 STLXRH (A64).....                 | 915 |
| 18.101 STNP (A64).....                   | 916 |
| 18.102 STP (A64).....                    | 917 |
| 18.103 STR (immediate) (A64).....        | 918 |
| 18.104 STR (register) (A64).....         | 919 |
| 18.105 STRB (immediate) (A64).....       | 920 |
| 18.106 STRB (register) (A64).....        | 921 |

|                                                                 |            |
|-----------------------------------------------------------------|------------|
| 18.107 STRH (immediate) (A64).....                              | 922        |
| 18.108 STRH (register) (A64).....                               | 923        |
| 18.109 STSET, STSETL, STSETL (A64).....                         | 924        |
| 18.110 STSETB, STSETLB (A64).....                               | 925        |
| 18.111 STSETH, STSETLH (A64).....                               | 925        |
| 18.112 STSMAX, STSMAXL, STSMAXL (A64).....                      | 926        |
| 18.113 STSMAXB, STSMAXLB (A64).....                             | 927        |
| 18.114 STSMAXH, STSMAXLH (A64).....                             | 928        |
| 18.115 STSMIN, STSMINL, STSMINL (A64).....                      | 929        |
| 18.116 STSMINB, STSMINLB (A64).....                             | 930        |
| 18.117 STSMINH, STSMINLH (A64).....                             | 930        |
| 18.118 STTR (A64).....                                          | 931        |
| 18.119 STTRB (A64).....                                         | 932        |
| 18.120 STTRH (A64).....                                         | 933        |
| 18.121 STUMAX, STUMAXL, STUMAXL (A64).....                      | 933        |
| 18.122 STUMAXB, STUMAXLB (A64).....                             | 934        |
| 18.123 STUMAXH, STUMAXLH (A64).....                             | 935        |
| 18.124 STUMIN, STUMINL, STUMINL (A64).....                      | 936        |
| 18.125 STUMINB, STUMINLB (A64).....                             | 937        |
| 18.126 STUMINH, STUMINLH (A64).....                             | 938        |
| 18.127 STUR (A64).....                                          | 938        |
| 18.128 STURB (A64).....                                         | 939        |
| 18.129 STURH (A64).....                                         | 940        |
| 18.130 STXP (A64).....                                          | 940        |
| 18.131 STXR (A64).....                                          | 942        |
| 18.132 STXRB (A64).....                                         | 943        |
| 18.133 STXRH (A64).....                                         | 944        |
| 18.134 SWPA, SWPAL, SWP, SWPL, SWPAL, SWP, SWPL (A64).....      | 945        |
| 18.135 SWPAB, SWPALB, SWPB, SWPLB (A64).....                    | 947        |
| 18.136 SWPAH, SWPALH, SWPH, SWPLH (A64).....                    | 948        |
| <b>19. A64 Floating-point Instructions.....</b>                 | <b>950</b> |
| 19.1 A64 floating-point instructions in alphabetical order..... | 950        |
| 19.2 FABS (scalar) (A64 FP).....                                | 952        |
| 19.3 FADD (scalar) (A64).....                                   | 953        |
| 19.4 FCCMP (A64).....                                           | 954        |

|                                               |     |
|-----------------------------------------------|-----|
| 19.5 FCCMPE (A64).....                        | 955 |
| 19.6 FCMP (A64).....                          | 957 |
| 19.7 FCMPE (A64).....                         | 958 |
| 19.8 FCSEL (A64).....                         | 960 |
| 19.9 FCVT (A64).....                          | 961 |
| 19.10 FCVTAS (scalar) (A64).....              | 962 |
| 19.11 FCVTAU (scalar) (A64).....              | 963 |
| 19.12 FCVTMS (scalar) (A64).....              | 964 |
| 19.13 FCVTMU (scalar) (A64).....              | 965 |
| 19.14 FCVTNS (scalar) (A64).....              | 967 |
| 19.15 FCVTNU (scalar) (A64).....              | 968 |
| 19.16 FCVTPS (scalar) (A64).....              | 969 |
| 19.17 FCVTPU (scalar) (A64).....              | 970 |
| 19.18 FCVTZS (scalar, fixed-point) (A64)..... | 971 |
| 19.19 FCVTZS (scalar, integer) (A64).....     | 972 |
| 19.20 FCVTZU (scalar, fixed-point) (A64)..... | 973 |
| 19.21 FCVTZU (scalar, integer) (A64).....     | 975 |
| 19.22 FDIV (scalar) (A64).....                | 976 |
| 19.23 FJCVTZS (A64).....                      | 977 |
| 19.24 FMADD (A64).....                        | 978 |
| 19.25 FMAX (scalar) (A64).....                | 979 |
| 19.26 FMAXNM (scalar) (A64).....              | 980 |
| 19.27 FMIN (scalar) (A64).....                | 981 |
| 19.28 FMINNM (scalar) (A64).....              | 983 |
| 19.29 FMOV (register) (A64).....              | 984 |
| 19.30 FMOV (general) (A64).....               | 985 |
| 19.31 FMOV (scalar, immediate) (A64).....     | 986 |
| 19.32 FMSUB (A64).....                        | 987 |
| 19.33 FMUL (scalar) (A64).....                | 988 |
| 19.34 FNNEG (scalar) (A64).....               | 990 |
| 19.35 FNMMADD (A64).....                      | 991 |
| 19.36 FNMSUB (A64).....                       | 992 |
| 19.37 FNML (scalar) (A64).....                | 993 |
| 19.38 FRINTA (scalar) (A64).....              | 994 |
| 19.39 FRINTI (scalar) (A64).....              | 995 |
| 19.40 FRINTM (scalar) (A64).....              | 997 |

|                                                              |             |
|--------------------------------------------------------------|-------------|
| 19.41 FRINTN (scalar) (A64).....                             | 998         |
| 19.42 FRINTP (scalar) (A64).....                             | 999         |
| 19.43 FRINTX (scalar) (A64).....                             | 1000        |
| 19.44 FRINTZ (scalar) (A64).....                             | 1001        |
| 19.45 FSQRT (scalar) (A64).....                              | 1002        |
| 19.46 FSUB (scalar) (A64).....                               | 1003        |
| 19.47 LDNP (SIMD and FP) (A64).....                          | 1004        |
| 19.48 LDP (SIMD and FP) (A64).....                           | 1006        |
| 19.49 LDR (immediate, SIMD and FP) (A64).....                | 1007        |
| 19.50 LDR (literal, SIMD and FP) (A64).....                  | 1009        |
| 19.51 LDR (register, SIMD and FP) (A64).....                 | 1010        |
| 19.52 LDUR (SIMD and FP) (A64).....                          | 1011        |
| 19.53 SCVTF (scalar, fixed-point) (A64).....                 | 1012        |
| 19.54 SCVTF (scalar, integer) (A64).....                     | 1013        |
| 19.55 STNP (SIMD and FP) (A64).....                          | 1014        |
| 19.56 STP (SIMD and FP) (A64).....                           | 1016        |
| 19.57 STR (immediate, SIMD and FP) (A64).....                | 1017        |
| 19.58 STR (register, SIMD and FP) (A64).....                 | 1019        |
| 19.59 STUR (SIMD and FP) (A64).....                          | 1020        |
| 19.60 UCVTF (scalar, fixed-point) (A64).....                 | 1021        |
| 19.61 UCVTF (scalar, integer) (A64).....                     | 1022        |
| <b>20. A64 SIMD Scalar Instructions.....</b>                 | <b>1024</b> |
| 20.1 A64 SIMD scalar instructions in alphabetical order..... | 1024        |
| 20.2 ABS (scalar) (A64 SIMD).....                            | 1028        |
| 20.3 ADD (scalar) (A64 SIMD).....                            | 1029        |
| 20.4 ADDP (scalar) (A64 SIMD).....                           | 1029        |
| 20.5 CMEQ (scalar, register) (A64 SIMD).....                 | 1030        |
| 20.6 CMEQ (scalar, zero) (A64 SIMD).....                     | 1031        |
| 20.7 CMGE (scalar, register) (A64 SIMD).....                 | 1031        |
| 20.8 CMGE (scalar, zero) (A64 SIMD).....                     | 1032        |
| 20.9 CMGT (scalar, register) (A64 SIMD).....                 | 1033        |
| 20.10 CMGT (scalar, zero) (A64 SIMD).....                    | 1034        |
| 20.11 CMHI (scalar, register) (A64 SIMD).....                | 1034        |
| 20.12 CMHS (scalar, register) (A64 SIMD).....                | 1035        |
| 20.13 CMLE (scalar, zero) (A64 SIMD).....                    | 1036        |

|                                                    |      |
|----------------------------------------------------|------|
| 20.14 CMLT (scalar, zero) (A64 SIMD).....          | 1036 |
| 20.15 CMTST (scalar) (A64 SIMD).....               | 1037 |
| 20.16 DUP (scalar, element) (A64 SIMD).....        | 1038 |
| 20.17 FABD (scalar) (A64 SIMD).....                | 1039 |
| 20.18 FACGE (scalar) (A64 SIMD).....               | 1040 |
| 20.19 FACGT (scalar) (A64 SIMD).....               | 1041 |
| 20.20 FADDP (scalar) (A64 SIMD).....               | 1042 |
| 20.21 FCMEQ (scalar, register) (A64 SIMD).....     | 1043 |
| 20.22 FCMEQ (scalar, zero) (A64 SIMD).....         | 1044 |
| 20.23 FCMGE (scalar, register) (A64 SIMD).....     | 1045 |
| 20.24 FCMGE (scalar, zero) (A64 SIMD).....         | 1046 |
| 20.25 FCMGT (scalar, register) (A64 SIMD).....     | 1047 |
| 20.26 FCMGT (scalar, zero) (A64 SIMD).....         | 1048 |
| 20.27 FCMLA (scalar, by element) (A64 SIMD).....   | 1049 |
| 20.28 FCMLE (scalar, zero) (A64 SIMD).....         | 1051 |
| 20.29 FCMLT (scalar, zero) (A64 SIMD).....         | 1052 |
| 20.30 FCVTAS (scalar) (A64 SIMD).....              | 1053 |
| 20.31 FCVTAU (scalar) (A64 SIMD).....              | 1054 |
| 20.32 FCVTMS (scalar) (A64 SIMD).....              | 1055 |
| 20.33 FCVTMU (scalar) (A64 SIMD).....              | 1056 |
| 20.34 FCVTNS (scalar) (A64 SIMD).....              | 1057 |
| 20.35 FCVTNU (scalar) (A64 SIMD).....              | 1058 |
| 20.36 FCVTPS (scalar) (A64 SIMD).....              | 1059 |
| 20.37 FCVTPU (scalar) (A64 SIMD).....              | 1060 |
| 20.38 FCVTXN (scalar) (A64 SIMD).....              | 1061 |
| 20.39 FCVTZS (scalar, fixed-point) (A64 SIMD)..... | 1062 |
| 20.40 FCVTZS (scalar, integer) (A64 SIMD).....     | 1063 |
| 20.41 FCVTZU (scalar, fixed-point) (A64 SIMD)..... | 1064 |
| 20.42 FCVTZU (scalar, integer) (A64 SIMD).....     | 1065 |
| 20.43 FMAXNMP (scalar) (A64 SIMD).....             | 1066 |
| 20.44 FMAXP (scalar) (A64 SIMD).....               | 1067 |
| 20.45 FMINNMP (scalar) (A64 SIMD).....             | 1068 |
| 20.46 FMINP (scalar) (A64 SIMD).....               | 1070 |
| 20.47 FMLA (scalar, by element) (A64 SIMD).....    | 1071 |
| 20.48 FMLS (scalar, by element) (A64 SIMD).....    | 1072 |
| 20.49 FMUL (scalar, by element) (A64 SIMD).....    | 1073 |

|                                                     |      |
|-----------------------------------------------------|------|
| 20.50 FMULX (scalar, by element) (A64 SIMD).....    | 1074 |
| 20.51 FMULX (scalar) (A64 SIMD).....                | 1075 |
| 20.52 FRECPE (scalar) (A64 SIMD).....               | 1076 |
| 20.53 FRECPS (scalar) (A64 SIMD).....               | 1077 |
| 20.54 FRSQRTE (scalar) (A64 SIMD).....              | 1078 |
| 20.55 FRSQRTS (scalar) (A64 SIMD).....              | 1079 |
| 20.56 MOV (scalar) (A64 SIMD).....                  | 1080 |
| 20.57 NEG (scalar) (A64 SIMD).....                  | 1081 |
| 20.58 SCVTF (scalar, fixed-point) (A64 SIMD).....   | 1082 |
| 20.59 SCVTF (scalar, integer) (A64 SIMD).....       | 1083 |
| 20.60 SHL (scalar) (A64 SIMD).....                  | 1084 |
| 20.61 SLI (scalar) (A64 SIMD).....                  | 1085 |
| 20.62 SQABS (scalar) (A64 SIMD).....                | 1085 |
| 20.63 SQADD (scalar) (A64 SIMD).....                | 1086 |
| 20.64 SQDMLAL (scalar, by element) (A64 SIMD).....  | 1087 |
| 20.65 SQDMLAL (scalar) (A64 SIMD).....              | 1088 |
| 20.66 SQDMLSL (scalar, by element) (A64 SIMD).....  | 1089 |
| 20.67 SQDMLSL (scalar) (A64 SIMD).....              | 1090 |
| 20.68 SQDMULH (scalar, by element) (A64 SIMD).....  | 1091 |
| 20.69 SQDMULH (scalar) (A64 SIMD).....              | 1092 |
| 20.70 SQDMULL (scalar, by element) (A64 SIMD).....  | 1093 |
| 20.71 SQDMULL (scalar) (A64 SIMD).....              | 1094 |
| 20.72 SQNEG (scalar) (A64 SIMD).....                | 1095 |
| 20.73 SQRDMLAH (scalar, by element) (A64 SIMD)..... | 1096 |
| 20.74 SQRDMLAH (scalar) (A64 SIMD).....             | 1097 |
| 20.75 SQRDMLSH (scalar, by element) (A64 SIMD)..... | 1098 |
| 20.76 SQRDMLSH (scalar) (A64 SIMD).....             | 1099 |
| 20.77 SQRDMULH (scalar, by element) (A64 SIMD)..... | 1100 |
| 20.78 SQRDMULH (scalar) (A64 SIMD).....             | 1101 |
| 20.79 SQRSHL (scalar) (A64 SIMD).....               | 1102 |
| 20.80 SQRSHRN (scalar) (A64 SIMD).....              | 1103 |
| 20.81 SQRSHRUN (scalar) (A64 SIMD).....             | 1104 |
| 20.82 SQSHL (scalar, immediate) (A64 SIMD).....     | 1105 |
| 20.83 SQSHL (scalar, register) (A64 SIMD).....      | 1106 |
| 20.84 SQSHLU (scalar) (A64 SIMD).....               | 1107 |
| 20.85 SQSHRN (scalar) (A64 SIMD).....               | 1108 |

|                                                              |             |
|--------------------------------------------------------------|-------------|
| 20.86 SQSHRUN (scalar) (A64 SIMD).....                       | 1109        |
| 20.87 SQSUB (scalar) (A64 SIMD).....                         | 1110        |
| 20.88 SQXTN (scalar) (A64 SIMD).....                         | 1111        |
| 20.89 SQXTUN (scalar) (A64 SIMD).....                        | 1112        |
| 20.90 SRI (scalar) (A64 SIMD).....                           | 1113        |
| 20.91 SRSHL (scalar) (A64 SIMD).....                         | 1114        |
| 20.92 SRSHR (scalar) (A64 SIMD).....                         | 1115        |
| 20.93 SRSRA (scalar) (A64 SIMD).....                         | 1115        |
| 20.94 SSHL (scalar) (A64 SIMD).....                          | 1116        |
| 20.95 SSHR (scalar) (A64 SIMD).....                          | 1117        |
| 20.96 SSRA (scalar) (A64 SIMD).....                          | 1117        |
| 20.97 SUB (scalar) (A64 SIMD).....                           | 1118        |
| 20.98 SUQADD (scalar) (A64 SIMD).....                        | 1119        |
| 20.99 UCVTF (scalar, fixed-point) (A64 SIMD).....            | 1120        |
| 20.100 UCVTF (scalar, integer) (A64 SIMD).....               | 1121        |
| 20.101 UQADD (scalar) (A64 SIMD).....                        | 1122        |
| 20.102 UQRSHL (scalar) (A64 SIMD).....                       | 1122        |
| 20.103 UQRSHRN (scalar) (A64 SIMD).....                      | 1123        |
| 20.104 UQSHL (scalar, immediate) (A64 SIMD).....             | 1124        |
| 20.105 UQSHL (scalar, register) (A64 SIMD).....              | 1125        |
| 20.106 UQSHRN (scalar) (A64 SIMD).....                       | 1126        |
| 20.107 UQSUB (scalar) (A64 SIMD).....                        | 1127        |
| 20.108 UQXTN (scalar) (A64 SIMD).....                        | 1128        |
| 20.109 URSHL (scalar) (A64 SIMD).....                        | 1129        |
| 20.110 URSHR (scalar) (A64 SIMD).....                        | 1130        |
| 20.111 URSRA (scalar) (A64 SIMD).....                        | 1130        |
| 20.112 USHL (scalar) (A64 SIMD).....                         | 1131        |
| 20.113 USHR (scalar) (A64 SIMD).....                         | 1132        |
| 20.114 USQADD (scalar) (A64 SIMD).....                       | 1133        |
| 20.115 USRA (scalar) (A64 SIMD).....                         | 1133        |
| <b>21. A64 SIMD Vector Instructions.....</b>                 | <b>1135</b> |
| 21.1 A64 SIMD Vector instructions in alphabetical order..... | 1135        |
| 21.2 ABS (vector) (A64).....                                 | 1144        |
| 21.3 ADD (vector) (A64).....                                 | 1145        |
| 21.4 ADDHN, ADDHN2 (vector) (A64).....                       | 1145        |

|                                           |      |
|-------------------------------------------|------|
| 21.5 ADDP (vector) (A64).....             | 1147 |
| 21.6 ADDV (vector) (A64).....             | 1147 |
| 21.7 AND (vector) (A64).....              | 1148 |
| 21.8 BIC (vector, immediate) (A64).....   | 1149 |
| 21.9 BIC (vector, register) (A64).....    | 1150 |
| 21.10 BIF (vector) (A64).....             | 1151 |
| 21.11 BIT (vector) (A64).....             | 1151 |
| 21.12 BSL (vector) (A64).....             | 1152 |
| 21.13 CLS (vector) (A64).....             | 1153 |
| 21.14 CLZ (vector) (A64).....             | 1153 |
| 21.15 CMEQ (vector, register) (A64).....  | 1154 |
| 21.16 CMEQ (vector, zero) (A64).....      | 1155 |
| 21.17 CMGE (vector, register) (A64).....  | 1155 |
| 21.18 CMGE (vector, zero) (A64).....      | 1156 |
| 21.19 CMGT (vector, register) (A64).....  | 1157 |
| 21.20 CMGT (vector, zero) (A64).....      | 1157 |
| 21.21 CMHI (vector, register) (A64).....  | 1158 |
| 21.22 CMHS (vector, register) (A64).....  | 1159 |
| 21.23 CMLE (vector, zero) (A64).....      | 1160 |
| 21.24 CMLT (vector, zero) (A64).....      | 1160 |
| 21.25 CMTST (vector) (A64).....           | 1161 |
| 21.26 CNT (vector) (A64).....             | 1162 |
| 21.27 DUP (vector, element) (A64).....    | 1162 |
| 21.28 DUP (vector, general) (A64).....    | 1163 |
| 21.29 EOR (vector) (A64).....             | 1164 |
| 21.30 EXT (vector) (A64).....             | 1165 |
| 21.31 FABD (vector) (A64).....            | 1166 |
| 21.32 FABS (vector) (A64).....            | 1167 |
| 21.33 FACGE (vector) (A64).....           | 1168 |
| 21.34 FACGT (vector) (A64).....           | 1169 |
| 21.35 FADD (vector) (A64).....            | 1170 |
| 21.36 FADDP (vector) (A64).....           | 1171 |
| 21.37 FCADD (vector) (A64).....           | 1172 |
| 21.38 FCMEQ (vector, register) (A64)..... | 1173 |
| 21.39 FCMEQ (vector, zero) (A64).....     | 1174 |
| 21.40 FCMGE (vector, register) (A64)..... | 1175 |

|                                               |      |
|-----------------------------------------------|------|
| 21.41 FCMGE (vector, zero) (A64).....         | 1176 |
| 21.42 FCMGT (vector, register) (A64).....     | 1177 |
| 21.43 FCMGT (vector, zero) (A64).....         | 1178 |
| 21.44 FCMLA (vector) (A64).....               | 1179 |
| 21.45 FCMLE (vector, zero) (A64).....         | 1180 |
| 21.46 FCMLT (vector, zero) (A64).....         | 1181 |
| 21.47 FCVTAS (vector) (A64).....              | 1182 |
| 21.48 FCVTAU (vector) (A64).....              | 1183 |
| 21.49 FCVTL, FCVTL2 (vector) (A64).....       | 1184 |
| 21.50 FCVTPS (vector) (A64).....              | 1185 |
| 21.51 FCVTPU (vector) (A64).....              | 1186 |
| 21.52 FCVTN, FCVTN2 (vector) (A64).....       | 1187 |
| 21.53 FCVTNS (vector) (A64).....              | 1189 |
| 21.54 FCVTNU (vector) (A64).....              | 1190 |
| 21.55 FCVTPS (vector) (A64).....              | 1191 |
| 21.56 FCVTPU (vector) (A64).....              | 1192 |
| 21.57 FCVTXN, FCVTXN2 (vector) (A64).....     | 1193 |
| 21.58 FCVTZS (vector, fixed-point) (A64)..... | 1194 |
| 21.59 FCVTZS (vector, integer) (A64).....     | 1195 |
| 21.60 FCVTZU (vector, fixed-point) (A64)..... | 1196 |
| 21.61 FCVTZU (vector, integer) (A64).....     | 1197 |
| 21.62 FDIV (vector) (A64).....                | 1198 |
| 21.63 FMAX (vector) (A64).....                | 1199 |
| 21.64 FMAXNM (vector) (A64).....              | 1200 |
| 21.65 FMAXNMP (vector) (A64).....             | 1201 |
| 21.66 FMAXNMV (vector) (A64).....             | 1202 |
| 21.67 FMAXP (vector) (A64).....               | 1203 |
| 21.68 FMAXV (vector) (A64).....               | 1204 |
| 21.69 FMIN (vector) (A64).....                | 1206 |
| 21.70 FMINNM (vector) (A64).....              | 1207 |
| 21.71 FMINNMP (vector) (A64).....             | 1208 |
| 21.72 FMINNMV (vector) (A64).....             | 1209 |
| 21.73 FMINP (vector) (A64).....               | 1210 |
| 21.74 FMINV (vector) (A64).....               | 1211 |
| 21.75 FMLA (vector, by element) (A64).....    | 1212 |
| 21.76 FMLA (vector) (A64).....                | 1214 |

|                                                     |      |
|-----------------------------------------------------|------|
| 21.77 FMLS (vector, by element) (A64).....          | 1215 |
| 21.78 FMLS (vector) (A64).....                      | 1216 |
| 21.79 FMOV (vector, immediate) (A64).....           | 1217 |
| 21.80 FMUL (vector, by element) (A64).....          | 1219 |
| 21.81 FMUL (vector) (A64).....                      | 1220 |
| 21.82 FMULX (vector, by element) (A64).....         | 1221 |
| 21.83 FMULX (vector) (A64).....                     | 1223 |
| 21.84 FNNEG (vector) (A64).....                     | 1224 |
| 21.85 FRECPE (vector) (A64).....                    | 1224 |
| 21.86 FRECPSS (vector) (A64).....                   | 1225 |
| 21.87 FRECPX (vector) (A64).....                    | 1226 |
| 21.88 FRINTA (vector) (A64).....                    | 1227 |
| 21.89 FRINTI (vector) (A64).....                    | 1228 |
| 21.90 FRINTM (vector) (A64).....                    | 1229 |
| 21.91 FRINTN (vector) (A64).....                    | 1230 |
| 21.92 FRINTP (vector) (A64).....                    | 1231 |
| 21.93 FRINTX (vector) (A64).....                    | 1232 |
| 21.94 FRINTZ (vector) (A64).....                    | 1233 |
| 21.95 FRSQRTE (vector) (A64).....                   | 1234 |
| 21.96 FRSQRTS (vector) (A64).....                   | 1235 |
| 21.97 FSQRT (vector) (A64).....                     | 1236 |
| 21.98 FSUB (vector) (A64).....                      | 1237 |
| 21.99 INS (vector, element) (A64).....              | 1238 |
| 21.100 INS (vector, general) (A64).....             | 1239 |
| 21.101 LD1 (vector, multiple structures) (A64)..... | 1241 |
| 21.102 LD1 (vector, single structure) (A64).....    | 1243 |
| 21.103 LD1R (vector) (A64).....                     | 1245 |
| 21.104 LD2 (vector, multiple structures) (A64)..... | 1246 |
| 21.105 LD2 (vector, single structure) (A64).....    | 1247 |
| 21.106 LD2R (vector) (A64).....                     | 1248 |
| 21.107 LD3 (vector, multiple structures) (A64)..... | 1249 |
| 21.108 LD3 (vector, single structure) (A64).....    | 1250 |
| 21.109 LD3R (vector) (A64).....                     | 1251 |
| 21.110 LD4 (vector, multiple structures) (A64)..... | 1253 |
| 21.111 LD4 (vector, single structure) (A64).....    | 1254 |
| 21.112 LD4R (vector) (A64).....                     | 1255 |

|                                              |      |
|----------------------------------------------|------|
| 21.113 MLA (vector, by element) (A64).....   | 1257 |
| 21.114 MLA (vector) (A64).....               | 1258 |
| 21.115 MLS (vector, by element) (A64).....   | 1258 |
| 21.116 MLS (vector) (A64).....               | 1259 |
| 21.117 MOV (vector, element) (A64).....      | 1260 |
| 21.118 MOV (vector, from general) (A64)..... | 1261 |
| 21.119 MOV (vector) (A64).....               | 1262 |
| 21.120 MOV (vector, to general) (A64).....   | 1263 |
| 21.121 MOVI (vector) (A64).....              | 1264 |
| 21.122 MUL (vector, by element) (A64).....   | 1265 |
| 21.123 MUL (vector) (A64).....               | 1266 |
| 21.124 MVN (vector) (A64).....               | 1267 |
| 21.125 MVNI (vector) (A64).....              | 1268 |
| 21.126 NEG (vector) (A64).....               | 1269 |
| 21.127 NOT (vector) (A64).....               | 1269 |
| 21.128 ORN (vector) (A64).....               | 1270 |
| 21.129 ORR (vector, immediate) (A64).....    | 1271 |
| 21.130 ORR (vector, register) (A64).....     | 1272 |
| 21.131 PMUL (vector) (A64).....              | 1273 |
| 21.132 PMULL, PMULL2 (vector) (A64).....     | 1273 |
| 21.133 RADDHN, RADDHN2 (vector) (A64).....   | 1274 |
| 21.134 RBIT (vector) (A64).....              | 1276 |
| 21.135 REV16 (vector) (A64).....             | 1276 |
| 21.136 REV32 (vector) (A64).....             | 1277 |
| 21.137 REV64 (vector) (A64).....             | 1278 |
| 21.138 RSHRN, RSHRN2 (vector) (A64).....     | 1278 |
| 21.139 RSUBHN, RSUBHN2 (vector) (A64).....   | 1279 |
| 21.140 SABA (vector) (A64).....              | 1281 |
| 21.141 SABAL, SABAL2 (vector) (A64).....     | 1281 |
| 21.142 SABD (vector) (A64).....              | 1283 |
| 21.143 SABDL, SABDL2 (vector) (A64).....     | 1283 |
| 21.144 SADALP (vector) (A64).....            | 1285 |
| 21.145 SADDL, SADDL2 (vector) (A64).....     | 1286 |
| 21.146 SADDLP (vector) (A64).....            | 1287 |
| 21.147 SADDLV (vector) (A64).....            | 1288 |
| 21.148 SADDW, SADDW2 (vector) (A64).....     | 1289 |

|                                                          |      |
|----------------------------------------------------------|------|
| 21.149 SCVTF (vector, fixed-point) (A64).....            | 1290 |
| 21.150 SCVTF (vector, integer) (A64).....                | 1291 |
| 21.151 SHADD (vector) (A64).....                         | 1292 |
| 21.152 SHL (vector) (A64).....                           | 1293 |
| 21.153 SHLL, SHLL2 (vector) (A64).....                   | 1294 |
| 21.154 SHRN, SHRN2 (vector) (A64).....                   | 1295 |
| 21.155 SHSUB (vector) (A64).....                         | 1296 |
| 21.156 SLI (vector) (A64).....                           | 1297 |
| 21.157 SMAX (vector) (A64).....                          | 1298 |
| 21.158 SMAXP (vector) (A64).....                         | 1298 |
| 21.159 SMAXV (vector) (A64).....                         | 1299 |
| 21.160 SMIN (vector) (A64).....                          | 1300 |
| 21.161 SMINP (vector) (A64).....                         | 1301 |
| 21.162 SMINV (vector) (A64).....                         | 1301 |
| 21.163 SMLAL, SMLAL2 (vector, by element) (A64).....     | 1302 |
| 21.164 SMLAL, SMLAL2 (vector) (A64).....                 | 1304 |
| 21.165 SMLSL, SMLSL2 (vector, by element) (A64).....     | 1305 |
| 21.166 SMLSL, SMLSL2 (vector) (A64).....                 | 1306 |
| 21.167 SMOV (vector) (A64).....                          | 1307 |
| 21.168 SMULL, SMULL2 (vector, by element) (A64).....     | 1308 |
| 21.169 SMULL, SMULL2 (vector) (A64).....                 | 1310 |
| 21.170 SQABS (vector) (A64).....                         | 1311 |
| 21.171 SQADD (vector) (A64).....                         | 1312 |
| 21.172 SQDMLAL, SQDMLAL2 (vector, by element) (A64)..... | 1312 |
| 21.173 SQDMLAL, SQDMLAL2 (vector) (A64).....             | 1314 |
| 21.174 SQDMLSL, SQDMLSL2 (vector, by element) (A64)..... | 1315 |
| 21.175 SQDMLSL, SQDMLSL2 (vector) (A64).....             | 1316 |
| 21.176 SQDMULH (vector, by element) (A64).....           | 1318 |
| 21.177 SQDMULH (vector) (A64).....                       | 1319 |
| 21.178 SQDMULL, SQDMULL2 (vector, by element) (A64)..... | 1320 |
| 21.179 SQDMULL, SQDMULL2 (vector) (A64).....             | 1321 |
| 21.180 SQNEG (vector) (A64).....                         | 1322 |
| 21.181 SQRDMLAH (vector, by element) (A64).....          | 1323 |
| 21.182 SQRDMLAH (vector) (A64).....                      | 1324 |
| 21.183 SQRDMLSH (vector, by element) (A64).....          | 1325 |
| 21.184 SQRDMLSH (vector) (A64).....                      | 1326 |

|                                                     |      |
|-----------------------------------------------------|------|
| 21.185 SQRDMULH (vector, by element) (A64).....     | 1327 |
| 21.186 SQRDMULH (vector) (A64).....                 | 1328 |
| 21.187 SQRSHL (vector) (A64).....                   | 1329 |
| 21.188 SQRSHRN, SQRSHRN2 (vector) (A64).....        | 1330 |
| 21.189 SQRSHRUN, SQRSHRUN2 (vector) (A64).....      | 1331 |
| 21.190 SQSHL (vector, immediate) (A64).....         | 1332 |
| 21.191 SQSHL (vector, register) (A64).....          | 1333 |
| 21.192 SQSHLU (vector) (A64).....                   | 1334 |
| 21.193 SQSHRN, SQSHRN2 (vector) (A64).....          | 1335 |
| 21.194 SQSHRUN, SQSHRUN2 (vector) (A64).....        | 1336 |
| 21.195 SQSUB (vector) (A64).....                    | 1338 |
| 21.196 SQXTN, SQXTN2 (vector) (A64).....            | 1338 |
| 21.197 SQXTUN, SQXTUN2 (vector) (A64).....          | 1340 |
| 21.198 SRHADD (vector) (A64).....                   | 1341 |
| 21.199 SRI (vector) (A64).....                      | 1342 |
| 21.200 SRSHL (vector) (A64).....                    | 1343 |
| 21.201 SRSHR (vector) (A64).....                    | 1343 |
| 21.202 SRSRA (vector) (A64).....                    | 1344 |
| 21.203 SSHL (vector) (A64).....                     | 1345 |
| 21.204 SSHLL, SSHLL2 (vector) (A64).....            | 1346 |
| 21.205 SSHR (vector) (A64).....                     | 1347 |
| 21.206 SSRA (vector) (A64).....                     | 1348 |
| 21.207 SSUBL, SSUBL2 (vector) (A64).....            | 1349 |
| 21.208 SSUBW, SSUBW2 (vector) (A64).....            | 1351 |
| 21.209 ST1 (vector, multiple structures) (A64)..... | 1352 |
| 21.210 ST1 (vector, single structure) (A64).....    | 1355 |
| 21.211 ST2 (vector, multiple structures) (A64)..... | 1356 |
| 21.212 ST2 (vector, single structure) (A64).....    | 1357 |
| 21.213 ST3 (vector, multiple structures) (A64)..... | 1358 |
| 21.214 ST3 (vector, single structure) (A64).....    | 1359 |
| 21.215 ST4 (vector, multiple structures) (A64)..... | 1360 |
| 21.216 ST4 (vector, single structure) (A64).....    | 1361 |
| 21.217 SUB (vector) (A64).....                      | 1363 |
| 21.218 SUBHN, SUBHN2 (vector) (A64).....            | 1363 |
| 21.219 SUQADD (vector) (A64).....                   | 1365 |
| 21.220 SXTL, SXTL2 (vector) (A64).....              | 1365 |

|                                                      |      |
|------------------------------------------------------|------|
| 21.221 TBL (vector) (A64).....                       | 1367 |
| 21.222 TBX (vector) (A64).....                       | 1368 |
| 21.223 TRN1 (vector) (A64).....                      | 1369 |
| 21.224 TRN2 (vector) (A64).....                      | 1370 |
| 21.225 UABA (vector) (A64).....                      | 1371 |
| 21.226 UABAL, UABAL2 (vector) (A64).....             | 1371 |
| 21.227 UABD (vector) (A64).....                      | 1373 |
| 21.228 UABDL, UABDL2 (vector) (A64).....             | 1373 |
| 21.229 UADALP (vector) (A64).....                    | 1375 |
| 21.230 UADDL, UADDL2 (vector) (A64).....             | 1376 |
| 21.231 UADDLP (vector) (A64).....                    | 1377 |
| 21.232 UADDLV (vector) (A64).....                    | 1378 |
| 21.233 UADDW, UADDW2 (vector) (A64).....             | 1379 |
| 21.234 UCVTF (vector, fixed-point) (A64).....        | 1380 |
| 21.235 UCVTF (vector, integer) (A64).....            | 1381 |
| 21.236 UHADD (vector) (A64).....                     | 1382 |
| 21.237 UHSUB (vector) (A64).....                     | 1383 |
| 21.238 UMAX (vector) (A64).....                      | 1383 |
| 21.239 UMAXP (vector) (A64).....                     | 1384 |
| 21.240 UMAXV (vector) (A64).....                     | 1385 |
| 21.241 UMIN (vector) (A64).....                      | 1386 |
| 21.242 UMINP (vector) (A64).....                     | 1386 |
| 21.243 UMINV (vector) (A64).....                     | 1387 |
| 21.244 UMLAL, UMLAL2 (vector, by element) (A64)..... | 1388 |
| 21.245 UMLAL, UMLAL2 (vector) (A64).....             | 1389 |
| 21.246 UMLSL, UMLSL2 (vector, by element) (A64)..... | 1390 |
| 21.247 UMLSL, UMLSL2 (vector) (A64).....             | 1392 |
| 21.248 UMOV (vector) (A64).....                      | 1393 |
| 21.249 UMULL, UMULL2 (vector, by element) (A64)..... | 1394 |
| 21.250 UMULL, UMULL2 (vector) (A64).....             | 1395 |
| 21.251 UQADD (vector) (A64).....                     | 1397 |
| 21.252 UQRSHL (vector) (A64).....                    | 1397 |
| 21.253 UQRSHRN, UQRSHRN2 (vector) (A64).....         | 1398 |
| 21.254 UQSHL (vector, immediate) (A64).....          | 1399 |
| 21.255 UQSHL (vector, register) (A64).....           | 1401 |
| 21.256 UQSHRN, UQSHRN2 (vector) (A64).....           | 1401 |

|                                             |             |
|---------------------------------------------|-------------|
| 21.257 UQSUB (vector) (A64).....            | 1403        |
| 21.258 UQXTN, UQXTN2 (vector) (A64).....    | 1403        |
| 21.259 URECPE (vector) (A64).....           | 1405        |
| 21.260 URHADD (vector) (A64).....           | 1405        |
| 21.261 URSHL (vector) (A64).....            | 1406        |
| 21.262 URSHR (vector) (A64).....            | 1407        |
| 21.263 URSQRTE (vector) (A64).....          | 1408        |
| 21.264 URSRA (vector) (A64).....            | 1408        |
| 21.265 USHL (vector) (A64).....             | 1409        |
| 21.266 USHLL, USHLL2 (vector) (A64).....    | 1410        |
| 21.267 USHR (vector) (A64).....             | 1411        |
| 21.268 USQADD (vector) (A64).....           | 1412        |
| 21.269 USRA (vector) (A64).....             | 1413        |
| 21.270 USUBL, USUBL2 (vector) (A64).....    | 1414        |
| 21.271 USUBW, USUBW2 (vector) (A64).....    | 1415        |
| 21.272 UXTL, UXTL2 (vector) (A64).....      | 1416        |
| 21.273 UZP1 (vector) (A64).....             | 1418        |
| 21.274 UZP2 (vector) (A64).....             | 1418        |
| 21.275 XTN, XTN2 (vector) (A64).....        | 1419        |
| 21.276 ZIP1 (vector) (A64).....             | 1420        |
| 21.277 ZIP2 (vector) (A64).....             | 1421        |
| <b>22. Directives Reference.....</b>        | <b>1423</b> |
| 22.1 Alphabetical list of directives.....   | 1423        |
| 22.2 About assembly control directives..... | 1424        |
| 22.3 About frame directives.....            | 1424        |
| 22.4 ALIAS.....                             | 1425        |
| 22.5 ALIGN.....                             | 1426        |
| 22.6 AREA.....                              | 1428        |
| 22.7 ARM or CODE32 directive.....           | 1432        |
| 22.8 ASSERT.....                            | 1433        |
| 22.9 ATTR.....                              | 1434        |
| 22.10 CN.....                               | 1435        |
| 22.11 CODE16 directive.....                 | 1436        |
| 22.12 COMMON.....                           | 1436        |
| 22.13 CP.....                               | 1437        |

|                                      |      |
|--------------------------------------|------|
| 22.14 DATA.....                      | 1438 |
| 22.15 DCB.....                       | 1438 |
| 22.16 DCD and DCDU.....              | 1439 |
| 22.17 DCDO.....                      | 1440 |
| 22.18 DCFD and DCFDU.....            | 1441 |
| 22.19 DCFS and DCFSU.....            | 1441 |
| 22.20 DCI.....                       | 1442 |
| 22.21 DCQ and DCQU.....              | 1443 |
| 22.22 DCW and DCWU.....              | 1445 |
| 22.23 END.....                       | 1445 |
| 22.24 ENDFUNC or ENDP.....           | 1446 |
| 22.25 ENTRY.....                     | 1446 |
| 22.26 EQU.....                       | 1447 |
| 22.27 EXPORT or GLOBAL.....          | 1448 |
| 22.28 EXPORTAS.....                  | 1450 |
| 22.29 FIELD.....                     | 1451 |
| 22.30 FRAME ADDRESS.....             | 1452 |
| 22.31 FRAME POP.....                 | 1453 |
| 22.32 FRAME PUSH.....                | 1454 |
| 22.33 FRAME REGISTER.....            | 1455 |
| 22.34 FRAME RESTORE.....             | 1455 |
| 22.35 FRAME RETURN ADDRESS.....      | 1456 |
| 22.36 FRAME SAVE.....                | 1457 |
| 22.37 FRAME STATE REMEMBER.....      | 1457 |
| 22.38 FRAME STATE RESTORE.....       | 1458 |
| 22.39 FRAME UNWIND ON.....           | 1458 |
| 22.40 FRAME UNWIND OFF.....          | 1459 |
| 22.41 FUNCTION or PROC.....          | 1459 |
| 22.42 GBLA, GBL, and GBLS.....       | 1461 |
| 22.43 GET or INCLUDE.....            | 1462 |
| 22.44 IF, ELSE, ENDIF, and ELIF..... | 1463 |
| 22.45 IMPORT and EXTERN.....         | 1465 |
| 22.46 INCBIN.....                    | 1467 |
| 22.47 INFO.....                      | 1468 |
| 22.48 KEEP.....                      | 1469 |
| 22.49 LCLA, LCLL, and LCLS.....      | 1469 |

|                                             |             |
|---------------------------------------------|-------------|
| 22.50 LTORG.....                            | 1470        |
| 22.51 MACRO and MEND.....                   | 1471        |
| 22.52 MAP.....                              | 1474        |
| 22.53 MEXIT.....                            | 1475        |
| 22.54 NOFP.....                             | 1475        |
| 22.55 OPT.....                              | 1476        |
| 22.56 QN, DN, and SN.....                   | 1477        |
| 22.57 RELOC.....                            | 1479        |
| 22.58 REQUIRE.....                          | 1480        |
| 22.59 REQUIRE8 and PRESERVE8.....           | 1480        |
| 22.60 RLIST.....                            | 1481        |
| 22.61 RN.....                               | 1482        |
| 22.62 ROUT.....                             | 1483        |
| 22.63 SETA, SETL, and SETS.....             | 1483        |
| 22.64 SPACE or FILL.....                    | 1485        |
| 22.65 THUMB directive.....                  | 1486        |
| 22.66 TTL and SUBT.....                     | 1487        |
| 22.67 WHILE and WEND.....                   | 1487        |
| 22.68 WN and XN.....                        | 1488        |
| <b>23. Via File Syntax.....</b>             | <b>1490</b> |
| 23.1 Overview of via files.....             | 1490        |
| 23.2 Via file syntax rules.....             | 1490        |
| <b>24. armasm User Guide Changes.....</b>   | <b>1493</b> |
| 24.1 Changes for the armasm User Guide..... | 1493        |

# List of Figures

|                                                                                               |     |
|-----------------------------------------------------------------------------------------------|-----|
| Figure 2-1: Integration boundaries in Arm Compiler for Embedded 6.....                        | 68  |
| Figure 4-1: Organization of general-purpose registers and Program Status Registers.....       | 79  |
| Figure 10-1: Extension register bank for Advanced SIMD in AArch32 state.....                  | 161 |
| Figure 10-2: Extension register bank for Advanced SIMD in AArch64 state.....                  | 163 |
| Figure 11-1: Extension register bank for floating-point in AArch32 state.....                 | 177 |
| Figure 11-2: Extension register bank for floating-point in AArch64 state.....                 | 178 |
| Figure 14-1: ASR #3.....                                                                      | 248 |
| Figure 14-2: LSR #3.....                                                                      | 249 |
| Figure 14-3: LSL #3.....                                                                      | 249 |
| Figure 14-4: ROR #3.....                                                                      | 250 |
| Figure 14-5: RRX.....                                                                         | 250 |
| Figure 15-1: De-interleaving an array of 3-element structures.....                            | 509 |
| Figure 15-2: Operation of doubleword VEXT for imm = 3.....                                    | 543 |
| Figure 15-3: Example of operation of VPADAL (in this case for data type I16).....             | 578 |
| Figure 15-4: Example of operation of VPADD (in this case, for data type I16).....             | 578 |
| Figure 15-5: Example of operation of doubleword VPADDL (in this case, for data type S16)..... | 579 |
| Figure 15-6: Operation of quadword VSHL.64 Qd, Qm, #1.....                                    | 607 |
| Figure 15-7: Operation of quadword VSLI.64 Qd, Qm, #1.....                                    | 611 |
| Figure 15-8: Operation of doubleword VSRI.64 Dd, Dm, #2.....                                  | 613 |
| Figure 15-9: Operation of doubleword VTRN.8.....                                              | 623 |
| Figure 15-10: Operation of doubleword VTRN.32.....                                            | 623 |

# List of Tables

|                                                                                 |     |
|---------------------------------------------------------------------------------|-----|
| Table 4-1: Arm processor modes.....                                             | 77  |
| Table 4-2: Predeclared core registers in AArch32 state.....                     | 81  |
| Table 4-3: Predeclared extension registers in AArch32 state.....                | 82  |
| Table 4-4: A32 instruction groups.....                                          | 86  |
| Table 5-1: Predeclared core registers in AArch64 state.....                     | 91  |
| Table 5-2: Predeclared extension registers in AArch64 state.....                | 92  |
| Table 5-3: A64 instruction groups.....                                          | 96  |
| Table 7-1: Syntax differences between UAL and A64 assembly language.....        | 104 |
| Table 7-2: A32 state immediate values (8-bit).....                              | 107 |
| Table 7-3: A32 state immediate values in MOV instructions.....                  | 107 |
| Table 7-4: 32-bit T32 immediate values.....                                     | 108 |
| Table 7-5: 32-bit T32 immediate values in MOV instructions.....                 | 108 |
| Table 7-6: Stack-oriented suffixes and equivalent addressing mode suffixes..... | 119 |
| Table 7-7: Suffixes for load and store multiple instructions.....               | 120 |
| Table 8-1: Condition code suffixes.....                                         | 139 |
| Table 8-2: Condition code suffixes and related flags.....                       | 140 |
| Table 8-3: Condition codes.....                                                 | 141 |
| Table 8-4: Conditional branches only.....                                       | 143 |
| Table 8-5: All instructions conditional.....                                    | 144 |
| Table 9-1: Built-in variables.....                                              | 148 |
| Table 9-2: Built-in Boolean constants.....                                      | 149 |
| Table 9-3: Predefined macros.....                                               | 150 |
| Table 9-4: armclang equivalent command-line options.....                        | 157 |

|                                                                                                          |     |
|----------------------------------------------------------------------------------------------------------|-----|
| Table 10-1: Differences in syntax and mnemonics between A32/T32 and A64 Advanced SIMD instructions.....  | 165 |
| Table 10-2: Advanced SIMD data types.....                                                                | 168 |
| Table 10-3: Advanced SIMD saturation ranges.....                                                         | 171 |
| Table 11-1: Differences in syntax and mnemonics between A32/T32 and A64 floating-point instructions..... | 180 |
| Table 12-1: Supported Arm architectures.....                                                             | 193 |
| Table 12-2: Severity of diagnostic messages.....                                                         | 197 |
| Table 12-3: Specifying a command-line option and an AREA directive for GNU-stack sections.....           | 203 |
| Table 13-1: Unary operators that return strings.....                                                     | 232 |
| Table 13-2: Unary operators that return numeric or logical values.....                                   | 232 |
| Table 13-3: Multiplicative operators.....                                                                | 234 |
| Table 13-4: String manipulation operators.....                                                           | 234 |
| Table 13-5: Shift operators.....                                                                         | 235 |
| Table 13-6: Addition, subtraction, and logical operators.....                                            | 235 |
| Table 13-7: Relational operators.....                                                                    | 236 |
| Table 13-8: Boolean operators.....                                                                       | 237 |
| Table 13-9: Operator precedence in Arm assembly language.....                                            | 238 |
| Table 13-10: Operator precedence in C.....                                                               | 238 |
| Table 14-1: Summary of instructions.....                                                                 | 240 |
| Table 14-2: PC-relative offsets.....                                                                     | 257 |
| Table 14-3: Register-relative offsets.....                                                               | 258 |
| Table 14-4: B instruction availability and range.....                                                    | 264 |
| Table 14-5: BL instruction availability and range.....                                                   | 270 |
| Table 14-6: BLX instruction availability and range.....                                                  | 272 |
| Table 14-7: BX instruction availability and range.....                                                   | 274 |

|                                                                                                          |     |
|----------------------------------------------------------------------------------------------------------|-----|
| Table 14-8: BXJ instruction availability and range.....                                                  | 275 |
| Table 14-9: Permitted instructions inside an IT block.....                                               | 302 |
| Table 14-10: Offsets and architectures, LDR, word, halfword, and byte.....                               | 311 |
| Table 14-11: PC-relative offsets.....                                                                    | 314 |
| Table 14-13: Register-relative offsets.....                                                              | 318 |
| Table 14-15: Offsets and architectures, STR, word, halfword, and byte.....                               | 435 |
| Table 14-19: Range and encoding of expr.....                                                             | 479 |
| Table 15-1: Summary of Advanced SIMD instructions.....                                                   | 505 |
| Table 15-2: Summary of shared Advanced SIMD and floating-point instructions.....                         | 507 |
| Table 15-3: Patterns for immediate value in VBIC (immediate).....                                        | 520 |
| Table 15-4: Permitted combinations of parameters for VLDn (single n-element structure to one lane).....  | 547 |
| Table 15-5: Permitted combinations of parameters for VLDn (single n-element structure to all lanes)..... | 549 |
| Table 15-6: Permitted combinations of parameters for VLDn (multiple n-element structures).....           | 550 |
| Table 15-9: Patterns for immediate value in VORR (immediate).....                                        | 577 |
| Table 15-10: Available immediate ranges in VQRSHRN and VQRSHRUN (by immediate).....                      | 590 |
| Table 15-11: Available immediate ranges in VQSHL and VQSHLU (by immediate).....                          | 592 |
| Table 15-12: Available immediate ranges in VQSHRN and VQSHRUN (by immediate)....                         | 593 |
| Table 15-14: Results for out-of-range inputs in VRECPS.....                                              | 596 |
| Table 15-15: Available immediate ranges in VRSHR (by immediate).....                                     | 600 |
| Table 15-16: Available immediate ranges in VRSHRN (by immediate).....                                    | 600 |
| Table 15-18: Results for out-of-range inputs in VRSQRTS.....                                             | 604 |
| Table 15-19: Available immediate ranges in VRSRA (by immediate).....                                     | 604 |
| Table 15-20: Available immediate ranges in VSHL (by immediate).....                                      | 606 |
| Table 15-21: Available immediate ranges in VSHLL (by immediate).....                                     | 608 |

|                                                                                                          |      |
|----------------------------------------------------------------------------------------------------------|------|
| Table 15-22: Available immediate ranges in VSHR (by immediate).....                                      | 609  |
| Table 15-23: Available immediate ranges in VSHRN (by immediate).....                                     | 610  |
| Table 15-24: Available immediate ranges in VSRA (by immediate).....                                      | 612  |
| Table 15-25: Permitted combinations of parameters for VSTn (multiple n-element structures).....          | 615  |
| Table 15-26: Permitted combinations of parameters for VSTn (single n-element structure to one lane)..... | 617  |
| Table 16-1: Summary of floating-point instructions.....                                                  | 627  |
| Table 17-1: Summary of A64 general instructions.....                                                     | 664  |
| Table 17-2: ADD (64-bit general registers) specifier combinations.....                                   | 672  |
| Table 17-3: ADDS (64-bit general registers) specifier combinations.....                                  | 675  |
| Table 17-4: SYS parameter values corresponding to AT operations.....                                     | 687  |
| Table 17-5: CMN (64-bit general registers) specifier combinations.....                                   | 711  |
| Table 17-6: CMP (64-bit general registers) specifier combinations.....                                   | 714  |
| Table 17-7: SYS parameter values corresponding to DC operations.....                                     | 726  |
| Table 17-8: SYS parameter values corresponding to IC operations.....                                     | 740  |
| Table 17-9: SUB (64-bit general registers) specifier combinations.....                                   | 791  |
| Table 17-10: SUBS (64-bit general registers) specifier combinations.....                                 | 794  |
| Table 17-11: SYS parameter values corresponding to TLBI operations.....                                  | 803  |
| Table 18-1: Summary of A64 data transfer instructions.....                                               | 818  |
| Table 19-1: Summary of A64 floating-point instructions.....                                              | 950  |
| Table 20-1: Summary of A64 SIMD scalar instructions.....                                                 | 1024 |
| Table 20-2: DUP (Scalar) specifier combinations.....                                                     | 1038 |
| Table 20-3: FCMLA (Scalar) specifier combinations.....                                                   | 1051 |
| Table 20-4: FCVTZS (Scalar) specifier combinations.....                                                  | 1063 |
| Table 20-5: FCVTZU (Scalar) specifier combinations.....                                                  | 1065 |

|                                                                                               |      |
|-----------------------------------------------------------------------------------------------|------|
| Table 20-6: FMLA (Scalar, single-precision and double-precision) specifier combinations.....  | 1072 |
| Table 20-7: FMLS (Scalar, single-precision and double-precision) specifier combinations.....  | 1073 |
| Table 20-8: FMUL (Scalar, single-precision and double-precision) specifier combinations.....  | 1074 |
| Table 20-9: FMULX (Scalar, single-precision and double-precision) specifier combinations..... | 1075 |
| Table 20-10: MOV (Scalar) specifier combinations.....                                         | 1081 |
| Table 20-11: SCVTF (Scalar) specifier combinations.....                                       | 1083 |
| Table 20-12: SQDMLAL (Scalar) specifier combinations.....                                     | 1088 |
| Table 20-13: SQDMLAL (Scalar) specifier combinations.....                                     | 1089 |
| Table 20-14: SQDMLSL (Scalar) specifier combinations.....                                     | 1090 |
| Table 20-15: SQDMLSL (Scalar) specifier combinations.....                                     | 1091 |
| Table 20-16: SQDMULH (Scalar) specifier combinations.....                                     | 1092 |
| Table 20-17: SQDMULL (Scalar) specifier combinations.....                                     | 1094 |
| Table 20-18: SQDMULL (Scalar) specifier combinations.....                                     | 1095 |
| Table 20-19: SQRDMLAH (Scalar) specifier combinations.....                                    | 1097 |
| Table 20-20: SQRDMLSH (Scalar) specifier combinations.....                                    | 1099 |
| Table 20-21: SQRDMULH (Scalar) specifier combinations.....                                    | 1101 |
| Table 20-22: SQRSHRN (Scalar) specifier combinations.....                                     | 1104 |
| Table 20-23: SQRSHRUN (Scalar) specifier combinations.....                                    | 1105 |
| Table 20-24: SQSHL (Scalar) specifier combinations.....                                       | 1106 |
| Table 20-25: SQSHLU (Scalar) specifier combinations.....                                      | 1108 |
| Table 20-26: SQSHRN (Scalar) specifier combinations.....                                      | 1109 |
| Table 20-27: SQSHRUN (Scalar) specifier combinations.....                                     | 1110 |
| Table 20-28: SQXTN (Scalar) specifier combinations.....                                       | 1112 |
| Table 20-29: SQXTUN (Scalar) specifier combinations.....                                      | 1113 |

|                                                                                                |      |
|------------------------------------------------------------------------------------------------|------|
| Table 20-30: UCVTF (Scalar) specifier combinations.....                                        | 1120 |
| Table 20-31: UQRSHRN (Scalar) specifier combinations.....                                      | 1124 |
| Table 20-32: UQSHL (Scalar) specifier combinations.....                                        | 1125 |
| Table 20-33: UQSHRN (Scalar) specifier combinations.....                                       | 1127 |
| Table 20-34: UQXTN (Scalar) specifier combinations.....                                        | 1129 |
| Table 21-1: Summary of A64 SIMD Vector instructions.....                                       | 1135 |
| Table 21-2: ADDHN, ADDHN2 (Vector) specifier combinations.....                                 | 1146 |
| Table 21-3: ADDV (Vector) specifier combinations.....                                          | 1148 |
| Table 21-4: DUP (Vector) specifier combinations.....                                           | 1163 |
| Table 21-5: DUP (Vector) specifier combinations.....                                           | 1164 |
| Table 21-6: EXT (Vector) specifier combinations.....                                           | 1166 |
| Table 21-7: FCVTL, FCVTL2 (Vector) specifier combinations.....                                 | 1185 |
| Table 21-8: FCVTN, FCVTN2 (Vector) specifier combinations.....                                 | 1188 |
| Table 21-9: FCVTXN{2} (Vector) specifier combinations.....                                     | 1193 |
| Table 21-10: FCVTZS (Vector) specifier combinations.....                                       | 1194 |
| Table 21-11: FCVTZU (Vector) specifier combinations.....                                       | 1197 |
| Table 21-12: FMLA (Vector, single-precision and double-precision) specifier combinations.....  | 1213 |
| Table 21-13: FMLS (Vector, single-precision and double-precision) specifier combinations.....  | 1216 |
| Table 21-14: FMUL (Vector, single-precision and double-precision) specifier combinations.....  | 1220 |
| Table 21-15: FMULX (Vector, single-precision and double-precision) specifier combinations..... | 1222 |
| Table 21-16: INS (Vector) specifier combinations.....                                          | 1239 |
| Table 21-17: INS (Vector) specifier combinations.....                                          | 1240 |
| Table 21-18: LD1 (One register, immediate offset) specifier combinations.....                  | 1242 |
| Table 21-19: LD1 (Two registers, immediate offset) specifier combinations.....                 | 1242 |

|                                                                                  |      |
|----------------------------------------------------------------------------------|------|
| Table 21-20: LD1 (Three registers, immediate offset) specifier combinations..... | 1243 |
| Table 21-21: LD1 (Four registers, immediate offset) specifier combinations.....  | 1243 |
| Table 21-22: LD1R (Immediate offset) specifier combinations.....                 | 1245 |
| Table 21-23: LD2R (Immediate offset) specifier combinations.....                 | 1249 |
| Table 21-24: LD3R (Immediate offset) specifier combinations.....                 | 1252 |
| Table 21-25: LD4R (Immediate offset) specifier combinations.....                 | 1256 |
| Table 21-26: MLA (Vector) specifier combinations.....                            | 1257 |
| Table 21-27: MLS (Vector) specifier combinations.....                            | 1259 |
| Table 21-28: MOV (Vector) specifier combinations.....                            | 1261 |
| Table 21-29: MOV (Vector) specifier combinations.....                            | 1262 |
| Table 21-30: MUL (Vector) specifier combinations.....                            | 1266 |
| Table 21-31: PMULL, PMULL2 (Vector) specifier combinations.....                  | 1274 |
| Table 21-32: RADDHN, RADDHN2 (Vector) specifier combinations.....                | 1275 |
| Table 21-33: RSHRN, RSHRN2 (Vector) specifier combinations.....                  | 1279 |
| Table 21-34: RSUBHN, RSUBHN2 (Vector) specifier combinations.....                | 1280 |
| Table 21-35: SABAL, SABAL2 (Vector) specifier combinations.....                  | 1282 |
| Table 21-36: SABDL, SABDL2 (Vector) specifier combinations.....                  | 1284 |
| Table 21-37: SADALP (Vector) specifier combinations.....                         | 1285 |
| Table 21-38: SADDL, SADDL2 (Vector) specifier combinations.....                  | 1286 |
| Table 21-39: SADDLP (Vector) specifier combinations.....                         | 1287 |
| Table 21-40: SADDLV (Vector) specifier combinations.....                         | 1288 |
| Table 21-41: SADDW, SADDW2 (Vector) specifier combinations.....                  | 1290 |
| Table 21-42: SCVTF (Vector) specifier combinations.....                          | 1291 |
| Table 21-43: SHL (Vector) specifier combinations.....                            | 1293 |
| Table 21-44: SHLL, SHLL2 (Vector) specifier combinations.....                    | 1294 |
| Table 21-45: SHRN, SHRN2 (Vector) specifier combinations.....                    | 1296 |

|                                                                 |      |
|-----------------------------------------------------------------|------|
| Table 21-46: SLI (Vector) specifier combinations.....           | 1297 |
| Table 21-47: SMAXV (Vector) specifier combinations.....         | 1300 |
| Table 21-48: SMINV (Vector) specifier combinations.....         | 1302 |
| Table 21-49: SMLAL, SMLAL2 (Vector) specifier combinations..... | 1303 |
| Table 21-50: SMLAL, SMLAL2 (Vector) specifier combinations..... | 1304 |
| Table 21-51: SMLSL, SMLSL2 (Vector) specifier combinations..... | 1306 |
| Table 21-52: SMLSL, SMLSL2 (Vector) specifier combinations..... | 1307 |
| Table 21-53: SMOV (32-bit) specifier combinations.....          | 1308 |
| Table 21-54: SMOV (64-bit) specifier combinations.....          | 1308 |
| Table 21-55: SMULL, SMULL2 (Vector) specifier combinations..... | 1309 |
| Table 21-56: SMULL, SMULL2 (Vector) specifier combinations..... | 1311 |
| Table 21-57: SQDMLAL{2} (Vector) specifier combinations.....    | 1313 |
| Table 21-58: SQDMLAL{2} (Vector) specifier combinations.....    | 1315 |
| Table 21-59: SQDMLSL{2} (Vector) specifier combinations.....    | 1316 |
| Table 21-60: SQDMLSL{2} (Vector) specifier combinations.....    | 1317 |
| Table 21-61: SQDMULH (Vector) specifier combinations.....       | 1318 |
| Table 21-62: SQDMULL{2} (Vector) specifier combinations.....    | 1321 |
| Table 21-63: SQDMULL{2} (Vector) specifier combinations.....    | 1322 |
| Table 21-64: SQRDMLAH (Vector) specifier combinations.....      | 1324 |
| Table 21-65: SQRDMLSH (Vector) specifier combinations.....      | 1326 |
| Table 21-66: SQRDMULH (Vector) specifier combinations.....      | 1328 |
| Table 21-67: SQRSHRN{2} (Vector) specifier combinations.....    | 1331 |
| Table 21-68: SQRSHRUN{2} (Vector) specifier combinations.....   | 1332 |
| Table 21-69: SQSHL (Vector) specifier combinations.....         | 1333 |
| Table 21-70: SQSHLU (Vector) specifier combinations.....        | 1335 |
| Table 21-71: SQSHRN{2} (Vector) specifier combinations.....     | 1336 |

|                                                                                  |      |
|----------------------------------------------------------------------------------|------|
| Table 21-72: SQSHRUN{2} (Vector) specifier combinations.....                     | 1337 |
| Table 21-73: SQXTN{2} (Vector) specifier combinations.....                       | 1339 |
| Table 21-74: SQXTUN{2} (Vector) specifier combinations.....                      | 1340 |
| Table 21-75: SRI (Vector) specifier combinations.....                            | 1342 |
| Table 21-76: SRSHR (Vector) specifier combinations.....                          | 1344 |
| Table 21-77: SRSRA (Vector) specifier combinations.....                          | 1345 |
| Table 21-78: SSHLL, SSHLL2 (Vector) specifier combinations.....                  | 1347 |
| Table 21-79: SSHR (Vector) specifier combinations.....                           | 1348 |
| Table 21-80: SSRA (Vector) specifier combinations.....                           | 1349 |
| Table 21-81: SSUBL, SSUBL2 (Vector) specifier combinations.....                  | 1350 |
| Table 21-82: SSUBW, SSUBW2 (Vector) specifier combinations.....                  | 1351 |
| Table 21-83: ST1 (One register, immediate offset) specifier combinations.....    | 1353 |
| Table 21-84: ST1 (Two registers, immediate offset) specifier combinations.....   | 1354 |
| Table 21-85: ST1 (Three registers, immediate offset) specifier combinations..... | 1354 |
| Table 21-86: ST1 (Four registers, immediate offset) specifier combinations.....  | 1354 |
| Table 21-87: SUBHN, SUBHN2 (Vector) specifier combinations.....                  | 1364 |
| Table 21-88: SXTL, SXTL2 (Vector) specifier combinations.....                    | 1366 |
| Table 21-89: UABAL, UABAL2 (Vector) specifier combinations.....                  | 1372 |
| Table 21-90: UABDL, UABDL2 (Vector) specifier combinations.....                  | 1374 |
| Table 21-91: UADALP (Vector) specifier combinations.....                         | 1375 |
| Table 21-92: UADDL, UADDL2 (Vector) specifier combinations.....                  | 1376 |
| Table 21-93: UADDLP (Vector) specifier combinations.....                         | 1377 |
| Table 21-94: UADDLV (Vector) specifier combinations.....                         | 1378 |
| Table 21-95: UADDW, UADDW2 (Vector) specifier combinations.....                  | 1380 |
| Table 21-96: UCVTF (Vector) specifier combinations.....                          | 1381 |
| Table 21-97: UMAXV (Vector) specifier combinations.....                          | 1385 |

|                                                                            |      |
|----------------------------------------------------------------------------|------|
| Table 21-98: UMINV (Vector) specifier combinations.....                    | 1388 |
| Table 21-99: UMLAL, UMLAL2 (Vector) specifier combinations.....            | 1389 |
| Table 21-100: UMLAL, UMLAL2 (Vector) specifier combinations.....           | 1390 |
| Table 21-101: UMLSL, UMLSL2 (Vector) specifier combinations.....           | 1391 |
| Table 21-102: UMLSL, UMLSL2 (Vector) specifier combinations.....           | 1392 |
| Table 21-103: UMOV (32-bit) specifier combinations.....                    | 1394 |
| Table 21-104: UMULL, UMULL2 (Vector) specifier combinations.....           | 1395 |
| Table 21-105: UMULL, UMULL2 (Vector) specifier combinations.....           | 1396 |
| Table 21-106: UQRSHRN{2} (Vector) specifier combinations.....              | 1399 |
| Table 21-107: UQSHL (Vector) specifier combinations.....                   | 1400 |
| Table 21-108: UQSHRN{2} (Vector) specifier combinations.....               | 1402 |
| Table 21-109: UQXTN{2} (Vector) specifier combinations.....                | 1404 |
| Table 21-110: URSHR (Vector) specifier combinations.....                   | 1407 |
| Table 21-111: URSRA (Vector) specifier combinations.....                   | 1409 |
| Table 21-112: USHLL, USHLL2 (Vector) specifier combinations.....           | 1411 |
| Table 21-113: USHR (Vector) specifier combinations.....                    | 1412 |
| Table 21-114: USRA (Vector) specifier combinations.....                    | 1414 |
| Table 21-115: USUBL, USUBL2 (Vector) specifier combinations.....           | 1415 |
| Table 21-116: USUBW, USUBW2 (Vector) specifier combinations.....           | 1416 |
| Table 21-117: UXTL, UXTL2 (Vector) specifier combinations.....             | 1417 |
| Table 21-118: XTN, XTN2 (Vector) specifier combinations.....               | 1420 |
| Table 22-1: List of directives.....                                        | 1423 |
| Table 22-2: OPT directive settings.....                                    | 1476 |
| Table 24-1: Changes between 6.6.5 (revision L) and 6.6.4 (revision K)..... | 1493 |

# 1. Introduction

Arm® Compiler armasm User Guide. This document provides topic based documentation for using the Arm assembler (`armasm`). It contains information on command line options, assembler directives, and supports the Armv6-M, Armv7, and Armv8 architectures.

## 1.1 Conventions

The following subsections describe conventions used in Arm documents.

### Glossary

The Arm Glossary is a list of terms used in Arm documentation, together with definitions for those terms. The Arm Glossary does not contain terms that are industry standard unless the Arm meaning differs from the generally accepted meaning.

See the Arm® Glossary for more information: [developer.arm.com/glossary](https://developer.arm.com/glossary).

### Typographic conventions

Arm documentation uses typographical conventions to convey specific meaning.

| Convention                                                                                 | Use                                                                                                                                                                                                     |
|--------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <i>italic</i>                                                                              | Citations.                                                                                                                                                                                              |
| <b>bold</b>                                                                                | Interface elements, such as menu names.<br><br>Terms in descriptive lists, where appropriate.                                                                                                           |
| <code>monospace</code>                                                                     | Text that you can enter at the keyboard, such as commands, file and program names, and source code.                                                                                                     |
| <code>monospace <u>underline</u></code>                                                    | A permitted abbreviation for a command or option. You can enter the underlined text instead of the full command or option name.                                                                         |
| <and>                                                                                      | Encloses replaceable terms for assembler syntax where they appear in code or code fragments.<br><br>For example:<br><br><code>MRC p15, 0, &lt;Rd&gt;, &lt;CRn&gt;, &lt;CRm&gt;, &lt;Opcode_2&gt;</code> |
| <b>SMALL CAPITALS</b>                                                                      | Terms that have specific technical meanings as defined in the Arm® Glossary. For example, <b>IMPLEMENTATION DEFINED</b> , <b>IMPLEMENTATION SPECIFIC</b> , <b>UNKNOWN</b> , and <b>UNPREDICTABLE</b> .  |
|  Caution | Recommendations. Not following these recommendations might lead to system failure or damage.                                                                                                            |
|  Warning | Requirements for the system. Not following these requirements might result in system failure or damage.                                                                                                 |
|  Danger  | Requirements for the system. Not following these requirements will result in system failure or damage.                                                                                                  |

| Convention                                                                                   | Use                                                                                |
|----------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| <br>Note     | An important piece of information that needs your attention.                       |
| <br>Tip      | A useful tip that might make it easier, better or faster to perform a task.        |
| <br>Remember | A reminder of something important that relates to the information you are reading. |

## 1.2 Other information

See the Arm website for other relevant information.

- [Arm® Developer](#).
- [Arm® Documentation](#).
- [Technical Support](#).
- [Arm® Glossary](#).

## 2. Overview of the Assembler

Gives an overview of the assemblers provided with Arm Compiler toolchain.

### 2.1 About the Arm Compiler toolchain assemblers

The Arm® Compiler toolchain provides different assemblers.

They are:

- The freestanding legacy assembler, `armasm`. Use `armasm` to assemble existing A64, A32, and T32 assembly language code written in armasm syntax.
- The `armclang` integrated assembler. Use the integrated assembler to assemble assembly language code written in GNU syntax.
- An optimizing inline assembler built into `armclang`. Use the inline assembler to assemble assembly language code written in GNU syntax that is used inline in C or C++ source code.



This book only applies to `armasm`. For information on `armclang`, see the *armclang Reference Guide*.

Note



Be aware of the following:

- Generated code might be different between two Arm Compiler releases.
- For a feature release, there might be significant code generation differences.



The command-line option descriptions and related information in the individual Arm Compiler tools documents describe all the features that Arm Compiler supports. Any features not documented are not supported and are used at your own risk. You are responsible for making sure that any generated code using [Support level definitions](#) is operating correctly.

Note

#### Related information

[Arm Compiler armclang Reference Guide](#)

[Mixing Assembly Code with C or C++ Code](#)

[Assembling armasm and GNU syntax assembly code](#)

## 2.2 Key features of the armasm assembler

The `armasm` assembler supports instructions, directives, and user-defined macros.

It supports:

- Unified Assembly Language (UAL) for both A32 and T32 code.
- Assembly language for A64 code.
- Advanced SIMD instructions in A64, A32, and T32 code.
- Floating-point instructions in A64, A32, and T32 code.
- Directives in assembly source code.
- Processing of user-defined macros.

### Related information

[How the assembler works](#) on page 62

[About the Unified Assembler Language](#) on page 103

[Architecture support for Advanced SIMD](#) on page 160

[Use of macros](#) on page 124

[Advanced SIMD Programming](#) on page 160

[Directives Reference](#) on page 1423

## 2.3 How the assembler works

`armasm` reads the assembly language source code twice before it outputs object code. Each read of the source code is called a pass.

This is because assembly language source code often contains forward references. A forward reference occurs when a label is used as an operand, for example as a branch target, earlier in the code than the definition of the label. The assembler cannot know the address of the forward reference label until it reads the definition of the label.

During each pass, the assembler performs different functions. In the first pass, the assembler:

- Checks the syntax of the instruction or directive. It faults if there is an error in the syntax, for example if a label is specified on a directive that does not accept one.
- Determines the size of the instruction and data being assembled and reserves space.
- Determines offsets of labels within sections.
- Creates a symbol table containing label definitions and their memory addresses.

In the second pass, the assembler:

- Faults if an undefined reference is specified in an instruction operand or directive.
- Encodes the instructions using the label offsets from pass 1, where applicable.

- Generates relocations.
- Generates debug information if requested.
- Outputs the object file.

Memory addresses of labels are determined and finalized in the first pass. Therefore, the assembly code must not change during the second pass. All instructions must be seen in both passes.

Therefore you must not define a symbol after a `:DEF:` test for the symbol. The assembler faults if it sees code in pass 2 that was not seen in pass 1.

### Line not seen in pass 1

The following example shows that `num EQU 42` is not seen in pass 1 but is seen in pass 2:

```
AREA x, CODE
[ :DEF: foo
num EQU 42
]
foo DCD num
END
```

Assembling this code generates the error:

```
A1903E: Line not seen in first pass; cannot be assembled.
```

### Line not seen in pass 2

The following example shows that `MOV r1, r2` is seen in pass 1 but not in pass 2:

```
AREA x, CODE
[ :LNOT: :DEF: foo
MOV r1, r2
]
foo MOV r3, r4
END
```

Assembling this code generates the error:

```
A1909E: Line not seen in second pass; cannot be assembled.
```

### Related information

[Directives that can be omitted in pass 2 of the assembler](#) on page 63

[Two pass assembler diagnostics](#) on page 155

[Instruction and directive relocations](#) on page 127

[--diag\\_error=tag\[,tag,...\]](#) on page 197

[--debug](#) on page 196

## 2.4 Directives that can be omitted in pass 2 of the assembler

Most directives must appear in both passes of the assembly process. You can omit some directives from the second pass over the source code by the assembler. However, omitting some directives from the second pass is strongly discouraged.

Directives that can be omitted from pass 2 are:

- GBLA, GBLL, GBLS.
- LCLA, LCLL, LCLS.
- SETA, SETL, SETS.
- RN, RLIST.
- CN, CP.
- SN, DN, QN.
- EQU.
- MAP, FIELD.
- GET, INCLUDE.
- IF, ELSE, ELIF, ENDIF.
- WHILE, WEND.
- ASSERT.
- ATTR.
- COMMON.
- EXPORTAS.
- IMPORT.
- EXTERN.
- KEEP.
- MACRO, MEND, MEXIT.
- REQUIRE8.
- PRESERVE8.



Macros that appear only in pass 1 and not in pass 2 must contain only these directives.

Note

## ASSERT directive appears in pass 1 only

The code in the following example assembles without error although the `ASSERT` directive does not appear in pass 2:

```
x     AREA  ||.text||,CODE
x     EQU 42
IF :LNOT: :DEF: sym
    ASSERT x == 42
ENDIF
sym EQU 1
END
```

## Use of ELSE and ELIF directives

Directives that appear in pass 2 but do not appear in pass 1 cause an assembly error. However, this does not cause an assembly error when using the `ELSE` and `ELIF` directives if their matching `IF` directive appears in pass 1. The following example assembles without error because the `IF` directive appears in pass 1:

```
x     AREA  ||.text||,CODE
x     EQU 42
IF :DEF: sym
ELSE
    ASSERT x == 42
ENDIF
sym EQU 1
END
```

## Related information

[How the assembler works](#) on page 62

[Two pass assembler diagnostics](#) on page 155

## 2.5 Support level definitions

This describes the levels of support for various Arm® Compiler 6 features.

Arm Compiler 6 is built on Clang and LLVM technology. Therefore, it has more functionality than the set of product features described in the documentation. The following definitions clarify the levels of support and guarantees on functionality that are expected from these features.

Arm welcomes feedback regarding the use of all Arm Compiler 6 features, and intends to support users to a level that is appropriate for that feature. You can contact support at <https://developer.arm.com/support>.

### Identification in the documentation

All features that are documented in the Arm Compiler 6 documentation are product features, except where explicitly stated. The limitations of non-product features are explicitly stated.

## Product features

Product features are suitable for use in a production environment. The functionality is well tested, and is expected to be stable across feature and update releases.

- Arm intends to give advance notice of significant functionality changes to product features.
- If you have a support and maintenance contract, Arm provides full support for use of all product features.
- Arm welcomes feedback on product features.
- Any issues with product features that Arm encounters or is made aware of are considered for fixing in future versions of Arm Compiler.

In addition to fully supported product features, some product features are only alpha or beta quality.

### Beta product features

Beta product features are implementation complete, but have not been sufficiently tested to be regarded as suitable for use in production environments.

Beta product features are identified with [BETA].

- Arm endeavors to document known limitations on beta product features.
- Beta product features are expected to eventually become product features in a future release of Arm Compiler 6.
- Arm encourages the use of beta product features, and welcomes feedback on them.
- Any issues with beta product features that Arm encounters or is made aware of are considered for fixing in future versions of Arm Compiler.

### Alpha product features

Alpha product features are not implementation complete, and are subject to change in future releases, therefore the stability level is lower than in beta product features.

Alpha product features are identified with [ALPHA].

- Arm endeavors to document known limitations of alpha product features.
- Arm encourages the use of alpha product features, and welcomes feedback on them.
- Any issues with alpha product features that Arm encounters or is made aware of are considered for fixing in future versions of Arm Compiler.

## Community features

Arm Compiler 6 is built on LLVM technology and preserves the functionality of that technology where possible. This means that there are more features available in Arm Compiler that are not listed in the documentation. These extra features are known as community features. For information on these community features, see the [Clang Compiler User's Manual](#).

Where community features are referenced in the documentation, they are identified with [COMMUNITY].

- Arm makes no claims about the quality level or the degree of functionality of these features, except when explicitly stated in this documentation.
- Functionality might change significantly between feature releases.
- Arm makes no guarantees that community features remain functional across update releases, although changes are expected to be unlikely.

Some community features might become product features in the future, but Arm provides no roadmap for such features. Arm is interested in understanding your use of these features, and welcomes feedback on them. Arm supports customers using these features on a best-effort basis, unless the features are unsupported. Arm accepts defect reports on these features, but does not guarantee that these issues are to be fixed in future releases.

### Guidance on use of community features

There are several factors to consider when assessing the likelihood of a community feature being functional:

- The following figure shows the structure of the Arm Compiler 6 toolchain:

**Figure 2-1: Integration boundaries in Arm Compiler for Embedded 6.**

The dashed boxes are toolchain components, and any interaction between these components is an integration boundary. Community features that span an integration boundary might have significant limitations in functionality. The exception to such features is if the interaction is codified in one of the standards supported by Arm Compiler 6. See [Application Binary Interface \(ABI\)](#). Community features that do not span integration boundaries are more likely to work as expected.

- Features primarily used when targeting hosted environments such as Linux or BSD might have significant limitations, or might not be applicable, when targeting bare-metal environments.

- The Clang implementations of compiler features, particularly those features that have been present for a long time in other toolchains, are likely to be mature. The functionality of new features, such as support for new language features, is likely to be less mature and therefore more likely to have limited functionality.

## Deprecated features

A deprecated feature is one that Arm plans to remove from a future release of Arm Compiler. Arm does not make any guarantee regarding the testing or maintenance of deprecated features. Therefore, Arm does not recommend using a feature after it is deprecated.

For information on replacing deprecated features with supported features, see the Arm Compiler documentation and Release Notes. Where appropriate, each Arm Compiler document includes notes for features that are deprecated, and also provides entries in the changes appendix of that document.

## Unsupported features

With both the product and community feature categories, specific features and use-cases are known not to function correctly, or are not intended for use with Arm Compiler 6.

Limitations of product features are stated in the documentation. Arm cannot provide an exhaustive list of unsupported features or use-cases for community features. The known limitations on community features are listed in [Community features](#).

## List of known unsupported features

The following is an incomplete list of unsupported features, and might change over time:

- The Clang option `-stdlib=libstdc++` is not supported.
- C++ static initialization of local variables is not thread-safe when linked against the standard C++ libraries. For thread-safety, you must provide your own implementation of thread-safe functions as described in [Standard C++ library implementation definition](#).



This restriction does not apply to the [ALPHA]-supported multithreaded C++ libraries.

- 
- Use of C11 library features is unsupported.
  - Any community feature that is exclusively related to non-Arm architectures is not supported.
  - Except for Armv6-M, compilation for targets that implement architectures lower than Armv7 is not supported.
  - The `long double` data type is not supported for AArch64 state because of limitations in the current Arm C library.
  - C complex arithmetic is not supported, because of limitations in the current Arm C library.
  - Complex numbers are defined in C++ as a template, `std::complex`. Arm Compiler supports `std::complex` with the `float` and `double` types, but not the `long double` type because of limitations in the current Arm C library.



For C code that uses complex numbers, it is not sufficient to recompile with the C++ compiler to make that code work. How you can use complex numbers depends on whether you are building for Armv8-M architecture-based processors.

- You must take care when mixing translation units that are compiled with and without the [COMMUNITY] -fsigned-char option, and that share interfaces or data structures.



The Arm ABI defines `char` as an unsigned byte, and this is the interpretation used by the C libraries supplied with the Arm compilation tools.

## Alternatives to C complex numbers not being supported

If you are building for Armv8-M architecture-based processors, consider using the free and Open Source CMSIS-DSP library that includes a data type and library functions for complex number support in C. For more information about CMSIS-DSP and complex number support see the following sections of the CMSIS documentation:

- [Complex Math Functions](#)
- [Complex Matrix Multiplication](#)
- [Complex FFT Functions](#)

If you are not building for Armv8-M architecture-based processors, consider modifying the affected part of your project to use the C++ standard template library type `std::complex` instead.

# 3. Overview of the Armv8 Architecture

Gives an overview of the Arm®v8 architecture.

## 3.1 About the Arm architecture

The Arm® architecture is a load/store architecture. The addressing range depends on whether you are using the 32-bit or the 64-bit architecture.

Arm processors are typical of RISC processors in that only load and store instructions can access memory. Data processing instructions operate on register contents only.

Armv8 is the next major architectural update after Armv7. It introduces a 64-bit architecture, but maintains compatibility with existing 32-bit architectures. It uses two execution states:

### AArch32

In AArch32 state, code has access to 32-bit general purpose registers.

Code executing in AArch32 state can only use the A32 and T32 instruction sets. This state is broadly compatible with the Armv7-A architecture.

### AArch64

In AArch64 state, code has access to 64-bit general purpose registers. The AArch64 state exists only in the Armv8 architecture.

Code executing in AArch64 state can only use the A64 instruction set.

In the AArch32 execution state, there are the following instruction set states:

#### A32 state

The state that executes A32 instructions.

#### T32 state

The state that executes T32 instructions.

### Related information

[A-Profile Architectures](#)

## 3.2 A32 and T32 instruction sets

A32 instructions are 32 bits wide. T32 instructions are 32-bits wide with 16-bit instructions in some architectures.

The A32 instruction set provides a comprehensive range of operations.

Most of the functionality of the 32-bit A32 instruction set is available, but some operations require more instructions. The T32 instruction set provides better code density, at the expense of performance.

The 32-bit and 16-bit T32 instructions together provide almost the same functionality as the A32 instruction set. The T32 instruction set achieves the high performance of A32 code along with the benefits of better code density.

Arm®v6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline do not support the A32 instruction set. On these architectures, instructions must not attempt to change to A32 state. Armv7-A, Armv7-R, Armv8-A, and Armv8-R support both A32 and T32 instruction sets.



Except for Armv6-M and Armv6S-M, assembling code for architectures earlier than Armv7 is not supported in Arm Compiler 6.

---

In Armv8, the A32 and T32 instruction sets are largely unchanged from Armv7. They are only available when the processor is in AArch32 state. The main changes in Armv8 are the addition of a few new instructions and the deprecation of some behavior, including many uses of the `IT` instruction.

Armv8 also defines an optional Crypto Extension. This extension provides cryptographic and hash instructions in the A32 instruction set.



- The term A32 is an alias for the Arm instruction set.
- The term T32 is an alias for the Thumb® instruction set.

---

## Related information

[A32 and T32 instruction set overview](#) on page 85

## 3.3 A64 instruction set

A64 instructions are 32 bits wide.

Arm®v8 introduces a new set of 32-bit instructions called A64, with new encodings and assembly language. A64 is only available when the processor is in AArch64 state. It provides similar functionality to the A32 and T32 instruction sets, but gives access to a larger virtual address space, and has some other changes, including reduced conditionality.

Armv8 also defines an optional Crypto Extension. This extension provides cryptographic and hash instructions in the A64 instruction set.

## Related information

[A64 instruction set overview](#) on page 95

## 3.4 Changing between AArch64 and AArch32 states

The processor must be in the correct execution state for the instructions it is executing.

A processor that is executing A64 instructions is operating in AArch64 state. In this state, the instructions can access both the 64-bit and 32-bit registers.

A processor that is executing A32 or T32 instructions is operating in AArch32 state. In this state, the instructions can only access the 32-bit registers, and not the 64-bit registers.

A processor based on the Arm®v8 architecture can run applications built for AArch32 and AArch64 states but a change between AArch32 and AArch64 states can only happen at exception boundaries.

Arm Compiler toolchain builds images for either the AArch32 state or AArch64 state. Therefore, an image built with Arm Compiler toolchain can either contain only A32 and T32 instructions or only A64 instructions.

A processor can only execute instructions from the instruction set that matches its current execution state. A processor in AArch32 state cannot execute A64 instructions, and a processor in AArch64 state cannot execute A32 or T32 instructions. You must ensure that the processor never receives instructions from the wrong instruction set for the current execution state.

## Related information

[BLX, BLXNS \(A32\)](#) on page 271

[BX, BXNS \(A32\)](#) on page 273

[ARM or CODE32 directive](#) on page 1432

[CODE16 directive](#) on page 1436

[THUMB directive](#) on page 1486

## 3.5 Advanced SIMD

Advanced SIMD is a 64-bit and 128-bit hybrid Single Instruction Multiple Data (SIMD) technology targeted at advanced media and signal processing applications and embedded processors.

Advanced SIMD is implemented as part of an Arm®-based processor, but has its own execution pipelines and a register bank that is distinct from the general-purpose register bank.

Advanced SIMD instructions are available in both A32 and A64. The A64 Advanced SIMD instructions are based on the instructions in A32. The main differences are the following:

- Different instruction mnemonics and syntax.

- 32 128-bit vector registers, increased from 16 in A32.
- A different register packing scheme:
  - In A64, smaller registers occupy the low-order bits of larger registers. For example, S31 maps to bits[31:0] of D31.
  - In A32, smaller registers are packed into larger registers. For example, S31 maps to bits[63:32] of D15.
- A64 Advanced SIMD instructions support both single-precision and double-precision floating-point data types and arithmetic.
- A32 Advanced SIMD instructions support only single-precision floating-point data types.

## Related information

[Architecture support for Advanced SIMD](#) on page 160

[Views of the Advanced SIMD register bank in AArch32 state](#) on page 164

[Views of the Advanced SIMD register bank in AArch64 state](#) on page 164

[Advanced SIMD Programming](#) on page 160

## 3.6 Floating-point hardware

There are several floating-point architecture versions and variants.

The floating-point hardware, together with associated support code, provides single-precision and double-precision floating-point arithmetic, as defined by *IEEE Std. 754-2008 IEEE Standard for Floating-Point Arithmetic*. This document is referred to as the IEEE 754 standard.

The floating-point hardware uses a register bank that is distinct from the Arm® core register bank.



The floating-point register bank is shared with the SIMD register bank.

Note

---

In AArch32 state, floating-point support is largely unchanged from VFPv4, apart from the addition of a few instructions for compliance with the IEEE 754 standard.

The floating-point architecture in AArch64 state is also based on VFPv4. The main differences are the following:

- In AArch64 state, the number of 128-bit SIMD and floating-point registers increases from 16 to 32.
- Single-precision registers are no longer packed into double-precision registers, so register S  $x$  is D  $x$  [31:0].
- The presence of floating-point hardware is mandated, so software floating-point linkage is not supported.

- Earlier versions of the floating-point architecture, for instance VFPv2, VFPv3, and VFPv4, are not supported in AArch64 state.
- VFP vector mode is not supported in either AArch32 or AArch64 state. Use Advanced SIMD instructions for vector floating-point.
- Some new instructions have been added, including:
  - Direct conversion between half-precision and double-precision.
  - Load and store pair, replacing load and store multiple.
  - Fused multiply-add and multiply-subtract.
  - Instructions for IEEE 754-2008 compatibility.

## Related information

[Architecture support for floating-point](#) on page 176

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

[Views of the floating-point extension register bank in AArch64 state](#) on page 179

[Floating-point Programming](#) on page 176

# 4. Overview of AArch32 state

Gives an overview of the AArch32 state of Armv8.

## 4.1 Changing between A32 and T32 instruction set states

A processor that is executing A32 instructions is operating in A32 instruction set state. A processor that is executing T32 instructions is operating in T32 instruction set state. For brevity, this document refers to them as the A32 state and T32 state respectively.

A processor in A32 state cannot execute T32 instructions, and a processor in T32 state cannot execute A32 instructions. You must ensure that the processor never receives instructions of the wrong instruction set for the current state.

The initial state after reset depends on the processor being used and its configuration.

To direct `armasm` to generate A32 or T32 instruction encodings, you must set the assembler mode using an `ARM` or `THUMB` directive. Assembly code using `CODE32` and `CODE16` directives can still be assembled, but Arm recommends you use the `ARM` and `THUMB` directives for new code.

These directives do not change the instruction set state of the processor. To change the instruction set state, you must use an appropriate instruction, for example `BX` or `BLX` to change between A32 and T32 states when performing a branch.

### Related information

[BLX, BLXNS \(A32\)](#) on page 271

[BX, BXNS \(A32\)](#) on page 273

[ARM or CODE32 directive](#) on page 1432

[CODE16 directive](#) on page 1436

[THUMB directive](#) on page 1486

## 4.2 Processor modes, and privileged and unprivileged software execution

The Arm® architecture supports different levels of execution privilege. The privilege level depends on the processor mode.



Armv6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline do not support the same modes as other Arm architectures and profiles. Some of the processor modes listed here do not apply to these architectures.

**Table 4-1: Arm processor modes**

| Processor mode | Mode number |
|----------------|-------------|
| User           | 0b10000     |
| FIQ            | 0b10001     |
| IRQ            | 0b10010     |
| Supervisor     | 0b10011     |
| Monitor        | 0b10110     |
| Abort          | 0b10111     |
| Hyp            | 0b11010     |
| Undefined      | 0b11011     |
| System         | 0b11111     |

User mode is an unprivileged mode, and has restricted access to system resources. All other modes have full access to system resources in the current security state, can change mode freely, and execute software as privileged.

Applications that require task protection usually execute in User mode. Some embedded applications might run entirely in any mode other than User mode. An application that requires full access to system resources usually executes in System mode.

Modes other than User mode are entered to service exceptions, or to access privileged resources.

Code can run in either a Secure state or in a Non-secure state. Hypervisor (Hyp) mode has privileged execution in Non-secure state.

### Related information

[Processor modes in Armv6-M, Armv7-M, and Armv8-M](#) on page 77

[A-Profile Architectures](#)

## 4.3 Processor modes in Armv6-M, Armv7-M, and Armv8-M

The processor modes available in Arm®v6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline are Thread mode and Handler mode.

Thread mode is the normal mode that programs run in. Thread mode can be privileged or unprivileged software execution. Handler mode is the mode that exceptions are handled in. It is always privileged software execution.

### Related information

[Processor modes, and privileged and unprivileged software execution](#) on page 76

[A-Profile Architectures](#)

## 4.4 Registers in AArch32 state

Arm® processors provide general-purpose and special-purpose registers. Some additional registers are available in privileged execution modes.

In all Arm processors in AArch32 state, the following registers are available and accessible in any processor mode:

- 15 general-purpose registers R0-R12, the *Stack Pointer* (SP), and *Link Register* (LR).
- 1 *Program Counter* (PC).
- 1 *Application Program Status Register* (APSR).



SP and LR can be used as general-purpose registers, although Arm deprecates using SP other than as a stack pointer.

**Note**

Other registers are available in privileged software execution. Arm processors have a total of 43 registers. The registers are arranged in partially overlapping banks. There is a different register bank for each processor mode. The banked registers give rapid context switching for dealing with processor exceptions and privileged operations.

The extra registers in Arm processors are:

- 2 supervisor mode registers for banked SP and LR.
- 2 abort mode registers for banked SP and LR.
- 2 undefined mode registers for banked SP and LR.
- 2 interrupt mode registers for banked SP and LR.
- 7 FIQ mode registers for banked R8-R12, SP and LR.
- 2 monitor mode registers for banked SP and LR.
- 1 Hyp mode register for banked SP.
- 7 *Saved Program Status Register* (SPSRs), one for each exception mode.
- 1 Hyp mode register for ELR\_Hyp to store the preferred return address from Hyp mode.



In privileged software execution, CPSR is an alias for APSR and gives access to additional bits.

**Note**

The following figure shows how the registers are banked in the Arm architecture.

**Figure 4-1: Organization of general-purpose registers and Program Status Registers**

| Application level view |         | System level view |            |          |           |                      |          |          |  |
|------------------------|---------|-------------------|------------|----------|-----------|----------------------|----------|----------|--|
| User                   | System  | Hyp <sup>†</sup>  | Supervisor | Abort    | Undefined | Monitor <sup>‡</sup> | IRQ      | FIQ      |  |
| R0                     | R0_usr  |                   |            |          |           |                      |          |          |  |
| R1                     | R1_usr  |                   |            |          |           |                      |          |          |  |
| R2                     | R2_usr  |                   |            |          |           |                      |          |          |  |
| R3                     | R3_usr  |                   |            |          |           |                      |          |          |  |
| R4                     | R4_usr  |                   |            |          |           |                      |          |          |  |
| R5                     | R5_usr  |                   |            |          |           |                      |          |          |  |
| R6                     | R6_usr  |                   |            |          |           |                      |          |          |  |
| R7                     | R7_usr  |                   |            |          |           |                      |          |          |  |
| R8                     | R8_usr  |                   |            |          |           |                      | R8_fiq   |          |  |
| R9                     | R9_usr  |                   |            |          |           |                      | R9_fiq   |          |  |
| R10                    | R10_usr |                   |            |          |           |                      | R10_fiq  |          |  |
| R11                    | R11_usr |                   |            |          |           |                      | R11_fiq  |          |  |
| R12                    | R12_usr |                   |            |          |           |                      | R12_fiq  |          |  |
| SP                     | SP_usr  | SP_hyp            | SP_svc     | SP_abt   | SP_und    | SP_mon               | SP_irq   | SP_fiq   |  |
| LR                     | LR_usr  |                   | LR_svc     | LR_abt   | LR_und    | LR_mon               | LR_irq   | LR_fiq   |  |
| PC                     | PC      |                   |            |          |           |                      |          |          |  |
| APSR                   | CPSR    |                   |            |          |           |                      |          |          |  |
|                        |         | SPSR_hyp          | SPSR_svc   | SPSR_abt | SPSR_und  | SPSR_mon             | SPSR_irq | SPSR_fiq |  |
|                        |         | ELR_hyp           |            |          |           |                      |          |          |  |

<sup>‡</sup> Exists only in Secure state.<sup>†</sup> Exists only in Non-secure state.

Cells with no entry indicate that the User mode register is used.

In Armv6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline based processors, SP is an alias for the two banked stack pointer registers:

- Main stack pointer register, that is only available in privileged software execution.
- Process stack pointer register.

## Related information

[General-purpose registers in AArch32 state](#) on page 79

[Program Counter in AArch32 state](#) on page 82

[Application Program Status Register](#) on page 83

[Saved Program Status Registers in AArch32 state](#) on page 85

[Current Program Status Register in AArch32 state](#) on page 84

[Processor modes, and privileged and unprivileged software execution](#) on page 76

[A-Profile Architectures](#)

## 4.5 General-purpose registers in AArch32 state

There are restrictions on the use of SP and LR as general-purpose registers.

Except for Arm®v6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline based processors, there are 33 general-purpose 32-bit registers, including the banked SP and LR registers. 15 general-purpose registers are visible at any one time, depending on the current processor mode. These registers are R0-R12, SP, and LR. The PC (R15) is not considered a general-purpose register.

SP (or R13) is the stack pointer. The C and C++ compilers always use SP as the stack pointer. Arm deprecates most uses of SP as a general purpose register. In T32 state, SP is strictly defined as the stack pointer. The instruction descriptions in [A32 and T32 Instructions](#) describe when SP and PC can be used.

In User mode, LR (or R14) is used as a link register to store the return address when a subroutine call is made. It can also be used as a general-purpose register if the return address is stored on the stack.

In the exception handling modes, LR holds the return address for the exception, or a subroutine return address if subroutine calls are executed within an exception. LR can be used as a general-purpose register if the return address is stored on the stack.

### Related information

[Predeclared core register names in AArch32 state](#) on page 81

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Program Counter in AArch32 state](#) on page 82

[Register accesses in AArch32 state](#) on page 80

## 4.6 Register accesses in AArch32 state

16-bit T32 instructions can access only a limited set of registers. There are also some restrictions on the use of special-purpose registers by A32 and 32-bit T32 instructions.

Most 16-bit T32 instructions can only access R0 to R7. Only a few T32 instructions can access R8-R12, SP, LR, and PC. Registers R0 to R7 are called Lo registers. Registers R8-R12, SP, LR, and PC are called Hi registers.

All 32-bit T32 instructions can access R0 to R12, and LR. However, apart from a few designated stack manipulation instructions, most T32 instructions cannot use SP. Except for a few specific instructions where PC is useful, most T32 instructions cannot use PC.

In A32 state, all instructions can access R0 to R12, SP, and LR, and most instructions can also access PC (R15). However, the use of the SP in an A32 instruction, in any way that is not possible in the corresponding T32 instruction, is deprecated. Explicit use of the PC in an A32 instruction is

not usually useful, and except for specific instances that are useful, such use is deprecated. Implicit use of the PC, for example in branch instructions or load (literal) instructions, is never deprecated.

The `MRS` instructions can move the contents of a status register to a general-purpose register, where normal data processing operations can manipulate them. You can use the `MSR` instruction to move the contents of a general-purpose register to a status register.

## Related information

[General-purpose registers in AArch32 state](#) on page 79

[Program Counter in AArch32 state](#) on page 82

[Application Program Status Register](#) on page 83

[Current Program Status Register in AArch32 state](#) on page 84

[Saved Program Status Registers in AArch32 state](#) on page 85

[Predeclared core register names in AArch32 state](#) on page 81

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[The Read-Modify-Write operation](#) on page 123

## 4.7 Predeclared core register names in AArch32 state

Many of the core register names have synonyms.

The following table shows the predeclared core registers:

**Table 4-2: Predeclared core registers in AArch32 state**

| Register names    | Meaning                                                                            |
|-------------------|------------------------------------------------------------------------------------|
| r0–r15 and R0–R15 | General purpose registers.                                                         |
| a1–a4             | Argument, result, or scratch registers. These registers are synonyms for R0 to R3. |
| v1–v8             | Variable registers. These registers are synonyms for R4 to R11.                    |
| SB                | Static base register. This register is a synonym for R9.                           |
| IP                | Intra-procedure call scratch register. This is a synonym for R12.                  |
| SP                | Stack pointer. This register is a synonym for R13.                                 |
| LR                | Link register. This register is a synonym for R14.                                 |
| PC                | Program counter. This register is a synonym for R15.                               |

Except for `a1–a4` and `v1–v8`, you can write the register names either in all uppercase or all lowercase.

## Related information

[General-purpose registers in AArch32 state](#) on page 79

## 4.8 Predeclared extension register names in AArch32 state

You can write the names of Advanced SIMD and floating-point registers either in uppercase or lowercase.

The following table shows the predeclared extension register names:

**Table 4-3: Predeclared extension registers in AArch32 state**

| Register names | Meaning                                                                       |
|----------------|-------------------------------------------------------------------------------|
| Q0-Q15         | Advanced SIMD quadword registers                                              |
| D0-D31         | Advanced SIMD doubleword registers, floating-point double-precision registers |
| S0-S31         | Floating-point single-precision registers                                     |

You can write the register names either in uppercase or lowercase.

### Related information

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

## 4.9 Program Counter in AArch32 state

You can use the Program Counter explicitly, for example in some T32 data processing instructions, and implicitly, for example in branch instructions.

The Program Counter (PC) is accessed as PC (or R15). It is incremented by the size of the instruction executed, which is always four bytes in A32 state. Branch instructions load the destination address into the PC. You can also load the PC directly using data operation instructions. For example, to branch to the address in a general purpose register, use:

```
MOV PC, R0
```

During execution, the PC does not contain the address of the currently executing instruction. The address of the currently executing instruction is typically PC-8 for A32, or PC-4 for T32.



Arm recommends you use the `BX` instruction to jump to an address or to return from a function, rather than writing to the PC directly.

### Related information

[Register-relative and PC-relative expressions](#) on page 222

[B \(A32\)](#) on page 264

[BX, BXNS \(A32\)](#) on page 273

[CBZ and CBNZ \(A32\)](#) on page 275

[TBB and TBH \(A32\)](#) on page 457

## 4.10 The Q flag in AArch32 state

The Q flag indicates overflow or saturation. It is one of the program status flags held in the APSR.

The Q flag is set to 1 when saturation occurs in saturating arithmetic instructions, or when overflow occurs in certain multiply instructions.

The Q flag is a sticky flag. Although the saturating and certain multiply instructions can set the flag, they cannot clear it. You can execute a series of such instructions, and then test the flag to find out whether saturation or overflow occurred at any point in the series, without having to check the flag after each instruction.

To clear the Q flag, use an `MSR` instruction to read-modify-write the APSR:

```
MRS r5, APSR
BIC r5, r5, #(1<<27)
MSR APSR_nzcvq, r5
```

The state of the Q flag cannot be tested directly by the condition codes. To read the state of the Q flag, use an `MRS` instruction.

```
MRS r6, APSR
TST r6, #(1<<27); Z is clear if Q flag was set
```

### Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[QADD \(A32\)](#) on page 360

[SMULxy](#) on page 414

[SMULWY](#) on page 416

[The Read-Modify-Write operation](#) on page 123

## 4.11 Application Program Status Register

The Application Program Status Register (APSR) holds the program status flags that are accessible in any processor mode.

It holds copies of the N, Z, C, and V condition flags. The processor uses them to determine whether to execute conditional instructions.

The APSR also holds:

- The Q (saturation) flag.

- The APSR also holds the GE (Greater than or Equal) flags. The GE flags can be set by the parallel add and subtract instructions. They are used by the `SEL` instruction to perform byte-based selection from two registers.

These flags are accessible in all modes, using the `MSR` and `MRS` instructions.

### Related information

[Conditional instructions](#) on page 132

[Updates to the condition flags in A32/T32 code](#) on page 135

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[SEL \(A32\)](#) on page 389

## 4.12 Current Program Status Register in AArch32 state

The *Current Program Status Register* (CPSR) holds the same program status flags as the APSR, and some additional information.

It holds:

- The APSR flags.
- The processor mode.
- The interrupt disable flags.
- Either:
  - The instruction set state for the Arm®v8 architecture (A32 or T32).
  - The instruction set state for the Armv7 architecture (A32 or T32).
- The endianness state.
- The execution state bits for the IT block.

The execution state bits control conditional execution in the IT block.

Only the APSR flags are accessible in all modes. Arm deprecates using an `MSR` instruction to change the endianness bit (E) of the CPSR, in any mode. Each exception level can have its own endianness, but mixed endianness within an exception level is deprecated.

The `SETEND` instruction is deprecated in A32 and T32 and has no equivalent in A64.

The execution state bits for the IT block (IT[1:0]) and the T32 bit (T) can be accessed by `MRS` only in Debug state.

### Related information

[IT \(A32\)](#) on page 300

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[SETEND \(A32\)](#) on page 390

[Updates to the condition flags in A32/T32 code](#) on page 135

[Saved Program Status Registers in AArch32 state](#) on page 85

## 4.13 Saved Program Status Registers in AArch32 state

The *Saved Program Status Register* (SPSR) stores the current value of the CPSR when an exception is taken so that it can be restored after handling the exception.

Each exception handling mode can access its own SPSR. User mode and System mode do not have an SPSR because they are not exception handling modes.

The execution state bits, including the endianness state and current instruction set state can be accessed from the SPSR in any exception mode, using the `MSR` and `MRS` instructions. You cannot access the SPSR using `MSR` or `MRS` in User or System mode.

### Related information

[Current Program Status Register in AArch32 state](#) on page 84

## 4.14 A32 and T32 instruction set overview

A32 and T32 instructions can be grouped by functional area.

All A32 instructions are 32 bits long. Instructions are stored word-aligned, so the least significant two bits of instruction addresses are always zero in A32 state.

T32 instructions are either 16 or 32 bits long. Instructions are stored half-word aligned. Some instructions use the least significant bit of the address to determine whether the code being branched to is T32 or A32.

Before the introduction of 32-bit T32 instructions, the T32 instruction set was limited to a restricted subset of the functionality of the A32 instruction set. Almost all T32 instructions were 16-bit. Together, the 32-bit and 16-bit T32 instructions provide functionality that is almost identical to that of the A32 instruction set.

The following table describes some of the functional groupings of the available instructions.

**Table 4-4: A32 instruction groups**

| Instruction group                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Branch and control               | <p>These instructions do the following:</p> <ul style="list-style-type: none"> <li>• Branch to subroutines.</li> <li>• Branch backwards to form loops.</li> <li>• Branch forward in conditional structures.</li> <li>• Make the following instruction conditional without branching.</li> <li>• Change the processor between A32 state and T32 state.</li> </ul>                                                                                 |
| Data processing                  | <p>These instructions operate on the general-purpose registers. They can perform operations such as addition, subtraction, or bitwise logic on the contents of two registers and place the result in a third register. They can also operate on the value in a single register, or on a value in a register and an immediate value supplied within the instruction.</p> <p>Long multiply instructions give a 64-bit result in two registers.</p> |
| Register load and store          | <p>These instructions load or store the value of a single register from or to memory. They can load or store a 32-bit word, a 16-bit halfword, or an 8-bit unsigned byte. Byte and halfword loads can either be sign extended or zero extended to fill the 32-bit register.</p> <p>A few instructions are also defined that can load or store 64-bit doubleword values into two 32-bit registers.</p>                                            |
| Multiple register load and store | These instructions load or store any subset of the general-purpose registers from or to memory.                                                                                                                                                                                                                                                                                                                                                  |
| Status register access           | These instructions move the contents of a status register to or from a general-purpose register.                                                                                                                                                                                                                                                                                                                                                 |

## Related information

[Load and store multiple register instructions](#) on page 117

## 4.15 Access to the inline barrel shifter in AArch32 state

The AArch32 arithmetic logic unit has a 32-bit barrel shifter that is capable of shift and rotate operations.

The second operand to many A32 and T32 data-processing and single register data-transfer instructions can be shifted, before the data-processing or data-transfer is executed, as part of the instruction. This supports, but is not limited to:

- Scaled addressing.
- Multiplication by an immediate value.
- Constructing immediate values.

32-bit T32 instructions give almost the same access to the barrel shifter as A32 instructions.

16-bit T32 instructions only allow access to the barrel shifter using separate instructions.

## Related information

[Load immediate values](#) on page 106

[Load immediate values using MOV and MVN](#) on page 106

# 5. Overview of AArch64 state

Gives an overview of the AArch64 state of Armv8.

## 5.1 Registers in AArch64 state

Arm® processors provide general-purpose and special-purpose registers. Some additional registers are available in privileged execution modes.

In AArch64 state, the following registers are available:

- Thirty-one 64-bit general-purpose registers X0-X30, the bottom halves of which are accessible as W0-W30.
- Four stack pointer registers SP\_ELO, SP\_EL1, SP\_EL2, SP\_EL3.
- Three exception link registers ELR\_EL1, ELR\_EL2, ELR\_EL3.
- Three saved program status registers SPSR\_EL1, SPSR\_EL2, SPSR\_EL3.
- One program counter.

All these registers are 64 bits wide except SPSR\_EL1, SPSR\_EL2, and SPSR\_EL3, which are 32 bits wide.

Most A64 integer instructions can operate on either 32-bit or 64-bit registers. The register width is determined by the register identifier, where *w* means 32-bit and *x* means 64-bit. The names *w<sub>n</sub>* and *x<sub>n</sub>*, where *n* is in the range 0-30, refer to the same register. When you use the 32-bit form of an instruction, the upper 32 bits of the source registers are ignored and the upper 32 bits of the destination register are set to zero.

There is no register named W31 or X31. Depending on the instruction, register 31 is either the stack pointer or the zero register. When used as the stack pointer, you refer to it as SP. When used as the zero register, you refer to it as WZR in a 32-bit context or XZR in a 64-bit context.

### Related information

[Exception levels](#) on page 88

[Link registers](#) on page 89

[Stack Pointer register](#) on page 90

[Program Counter in AArch64 state](#) on page 92

[Conditional execution in AArch64 state](#) on page 93

[Saved Program Status Registers in AArch64 state](#) on page 94

## 5.2 Exception levels

The Armv8 architecture defines four exception levels, EL0 to EL3, where EL3 is the highest exception level with the most execution privilege. When taking an exception, the exception level can either increase or remain the same, and when returning from an exception, it can either decrease or remain the same.

The following is a common usage model for the exception levels:

### EL0

Applications.

### EL1

OS kernels and associated functions that are typically described as privileged.

### EL2

Hypervisor.

### EL3

Secure monitor.

When taking an exception to a higher exception level, the execution state can either remain the same, or change from AArch32 to AArch64.

When returning to a lower exception level, the execution state can either remain the same or change from AArch64 to AArch32.

The only way the execution state can change is by taking or returning from an exception. It is not possible to change between execution states in the same way as changing between A32 and T32 code in AArch32 state.

On powerup and on reset, the processor enters the highest implemented exception level. The execution state for this exception level is a property of the implementation, and might be determined by a configuration input signal.

For exception levels other than EL0, the execution state is determined by one or more control register configuration bits. These bits can be set only in a higher exception level.

For EL0, the execution state is determined as part of the exception return to EL0, under the control of the exception level that the execution is returning from.

### Related information

[Link registers](#) on page 89

[Saved Program Status Registers in AArch64 state](#) on page 94

[Changing between AArch64 and AArch32 states](#) on page 73

[Process State](#) on page 94

## 5.3 Link registers

In AArch64 state, the Link Register (LR) stores the return address when a subroutine call is made. It can also be used as a general-purpose register if the return address is stored on the stack. The LR maps to register 30. Unlike in AArch32 state, the LR is distinct from the Exception Link Registers (ELRs) and is therefore unbanked.

There are three Exception Link Registers, ELR\_EL1, ELR\_EL2, and ELR\_EL3, that correspond to each of the exception levels. When an exception is taken, the Exception Link Register for the target exception level stores the return address to jump to after the handling of that exception completes. If the exception was taken from AArch32 state, the top 32 bits in the ELR are all set to zero. Subroutine calls within the exception level use the LR to store the return address from the subroutine.

For example when the exception level changes from ELO to EL1, the return address is stored in ELR\_EL1.

When in an exception level, if you enable interrupts that use the same exception level, you must ensure you store the ELR on the stack because it is overwritten with a new return address when the interrupt is taken.

### Related information

[Program Counter in AArch64 state](#) on page 92

[Predeclared core register names in AArch64 state](#) on page 91

[General-purpose registers in AArch32 state](#) on page 79

## 5.4 Stack Pointer register

In AArch64 state, SP represents the 64-bit Stack Pointer. SP\_ELO is an alias for SP. Do not use SP as a general purpose register.

You can only use SP as an operand in the following instructions:

- As the base register for loads and stores. In this case it must be quadword-aligned before adding any offset, or a stack alignment exception occurs.
- As a source or destination for arithmetic instructions, but it cannot be used as the destination in instructions that set the condition flags.
- In logical instructions, for example in order to align it.

There is a separate stack pointer for each of the three exception levels, SP\_EL1, SP\_EL2, and SP\_EL3. Within an exception level, you can either use the dedicated stack pointer for that exception level or you can use SP\_ELO, the stack pointer associated with ELO. You can use the SPSel register to select which stack pointer to use in the exception level.

The choice of stack pointer is indicated by the letter t or h appended to the exception level name, for example EL0t or EL3h. The t suffix indicates that the exception level uses SP\_ELO and the h

suffix indicates it uses SP\_ELx, where x is the current exception level number. EL0 always uses SP\_ELO so cannot have an h suffix.

## Related information

[General-purpose registers in AArch32 state](#) on page 79

[Exception levels](#) on page 88

[Registers in AArch64 state](#) on page 88

[Process State](#) on page 94

## 5.5 Predeclared core register names in AArch64 state

In AArch64 state, the predeclared core registers are different from the corresponding registers in AArch32 state.

The following table shows the predeclared core registers in AArch64 state:

**Table 5-1: Predeclared core registers in AArch64 state**

| Register names | Meaning                                                                                                                 |
|----------------|-------------------------------------------------------------------------------------------------------------------------|
| W0–W30         | 32-bit general purpose registers.                                                                                       |
| X0–X30         | 64-bit general purpose registers.                                                                                       |
| WZR            | 32-bit RAZ/WI register. This name is the name for register 31 when it is used as the zero register in a 32-bit context. |
| XZR            | 64-bit RAZ/WI register. This name is the name for register 31 when it is used as the zero register in a 64-bit context. |
| WSP            | 32-bit stack pointer. This name is the name for register 31 when it is used as the stack pointer in a 32-bit context.   |
| SP             | 64-bit stack pointer. This name is the name for register 31 when it is used as the stack pointer in a 64-bit context.   |
| LR             | Link register. This name is a synonym for X30.                                                                          |

You can write the register names either in all uppercase or all lowercase.



In AArch64 state, the PC is not a general purpose register and you cannot access it by name.

## Related information

[Predeclared core register names in AArch32 state](#) on page 81

[Registers in AArch64 state](#) on page 88

[Link registers](#) on page 89

[Stack Pointer register](#) on page 90

[Program Counter in AArch64 state](#) on page 92

## 5.6 Predeclared extension register names in AArch64 state

You can write the names of Advanced SIMD and floating-point registers either in uppercase or lowercase.

The following table shows the predeclared extension register names in AArch64 state:

**Table 5-2: Predeclared extension registers in AArch64 state**

| Register names | Meaning                                                                                     |
|----------------|---------------------------------------------------------------------------------------------|
| V0–V31         | Advanced SIMD 128-bit vector registers.                                                     |
| Q0–Q31         | Advanced SIMD registers holding a 128-bit scalar.                                           |
| D0–D31         | Advanced SIMD registers holding a 64-bit scalar, floating-point double-precision registers. |
| S0–S31         | Advanced SIMD registers holding a 32-bit scalar, floating-point single-precision registers. |
| H0–H31         | Advanced SIMD registers holding a 16-bit scalar, floating-point half-precision registers.   |
| B0–B31         | Advanced SIMD registers holding an 8-bit scalar.                                            |

### Related information

[Predeclared extension register names in AArch32 state](#) on page 81

[Registers in AArch64 state](#) on page 88

[Extension register bank mapping for Advanced SIMD in AArch64 state](#) on page 162

## 5.7 Program Counter in AArch64 state

In AArch64 state, the *Program Counter* (PC) contains the address of the currently executing instruction. It is incremented by the size of the instruction executed, which is always four bytes.

In AArch64 state, the PC is not a general purpose register and you cannot access it explicitly. The following types of instructions read it implicitly:

- Instructions that compute a PC-relative address.
- PC-relative literal loads.
- Direct branches to a PC-relative label.
- Branch and link instructions, which store it in the procedure link register.

The only types of instructions that can write to the PC are:

- Conditional and unconditional branches.
- Exception generation and exception returns.

Branch instructions load the destination address into the PC.

## Related information

[Program Counter in AArch32 state](#) on page 82  
[Register-relative and PC-relative expressions](#) on page 222  
[B \(A32\)](#) on page 264  
[BL \(A32\)](#) on page 269  
[BLX, BLXNS \(A32\)](#) on page 271  
[BX, BXNS \(A32\)](#) on page 273

## 5.8 Conditional execution in AArch64 state

In AArch64 state, the NZCV register holds copies of the N, Z, C, and V condition flags. The processor uses them to determine whether to execute conditional instructions. The NZCV register contains the flags in bits[31:28].

The condition flags are accessible in all exception levels, using the `MSR` and `MRS` instructions.

A64 makes less use of conditionality than A32. For example, in A64:

- Only a few instructions can set or test the condition flags.
- There is no equivalent of the T32 `IT` instruction.
- The only conditionally executed instruction, which behaves as a NOP if the condition is false, is the conditional branch, `B.cond`.

## Related information

[Application Program Status Register](#) on page 83  
[Updates to the condition flags in A32/T32 code](#) on page 135  
[Updates to the condition flags in A64 code](#) on page 136  
[Conditional instructions](#) on page 132  
[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340  
[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

## 5.9 The Q flag in AArch64 state

In AArch64 state, you cannot read or write to the Q flag because in A64 there are no saturating arithmetic instructions that operate on the general purpose registers.

The Advanced SIMD saturating arithmetic instructions set the QC bit in the floating-point status register (FPSR) to indicate that saturation has occurred. You can identify such instructions by the Q mnemonic modifier, for example `SQADD`.

## Related information

[A64 SIMD Scalar Instructions](#) on page 1024  
[A64 SIMD Vector Instructions](#) on page 1135

## 5.10 Process State

In AArch64 state, there is no Current Program Status Register (CPSR). You can access the different components of the traditional CPSR independently as Process State fields.

The Process State fields are:

- N, Z, C, and V condition flags (NZCV).
- Current register width (nRW).
- Stack pointer selection bit (SPSel).
- Interrupt disable flags (DAIF).
- Current exception level (EL).
- Single-step process state bit (SS).
- Illegal exception return state bit (IL).

You can use `MSR` to write to:

- The N, Z, C,, and V flags in the NZCV register.
- The interrupt disable flags in the DAIF register.
- The SP selection bit in the SPSel register, in EL1 or higher.

You can use `MRS` to read:

- The N, Z, C,, and V flags in the NZCV register.
- The interrupt disable flags in the DAIF register.
- The exception level bits in the CurrentEL register, in EL1 or higher.
- The SP selection bit in the SPSel register, in EL1 or higher.

When an exception occurs, all Process State fields associated with the current exception level are stored in a single register associated with the target exception level, the SPSR. You can access the SS, IL, and nRW bits only from the SPSR.

### Related information

[Current Program Status Register in AArch32 state](#) on page 84

[Saved Program Status Registers in AArch32 state](#) on page 85

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[Saved Program Status Registers in AArch64 state](#) on page 94

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

## 5.11 Saved Program Status Registers in AArch64 state

The *Saved Program Status Registers* (SPSRs) are 32-bit registers that store the process state of the current exception level when an exception is taken to an exception level that uses AArch64 state. This allows the process state to be restored after the exception has been handled.

In AArch64 state, each target exception level has its own SPSR:

- SPSR\_EL1.
- SPSR\_EL2.
- SPSR\_EL3.

When taking an exception, the process state of the current exception level is stored in the SPSR of the target exception level. On returning from an exception, the exception handler uses the SPSR of the exception level that is being returned from to restore the process state of the exception level that is being returned to.



On returning from an exception, the preferred return address is restored from the ELR associated with the exception level that is being returned from.

Note

The SPSRs store the following information:

- N, Z, C, and V flags.
- D, A, I, and F interrupt disable bits.
- The register width.
- The execution mode.
- The IL and SS bits.

### Related information

[Stack Pointer register](#) on page 90

[Process State](#) on page 94

[Saved Program Status Registers in AArch32 state](#) on page 85

## 5.12 A64 instruction set overview

A64 instructions can be grouped by functional area.

The following table describes some of the functional groupings of the instructions in A64.

**Table 5-3: A64 instruction groups**

| Instruction group       | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Branch and control      | <p>These instructions do the following:<sup>*</sup> Branch to and return from subroutines.</p> <ul style="list-style-type: none"> <li>• Branch backwards to form loops.</li> <li>• Branch forward in conditional structures.</li> <li>• Generate and return from exceptions.</li> </ul>                                                                                                                                                                                                                                                                                                                                                       |
| Data processing         | <p>These instructions operate on the general-purpose registers. They can perform operations such as addition, subtraction, or bitwise logic on the contents of two registers and place the result in a third register. They can also operate on the value in a single register, or on a value in a register and an immediate value supplied within the instruction.</p> <p>The addition and subtraction instructions can optionally left shift the immediate operand, or can sign or zero-extend and shift the final source operand register.</p> <p>A64 includes signed and unsigned 32-bit and 64-bit multiply and divide instructions.</p> |
| Register load and store | <p>These instructions load or store the value of a single register or pair of registers from or to memory. You can load or store a single 64-bit doubleword, 32-bit word, 16-bit halfword, or 8-bit byte, or a pair of words or doublewords. Byte and halfword loads can either be sign-extended or zero-extended to fill the 32-bit register. You can also load and sign-extend a signed byte, halfword or word into a 64-bit register, or load a pair of signed words into two 64-bit registers.</p>                                                                                                                                        |
| System register access  | <p>These instructions move the contents of a system register to or from a general-purpose register.</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |

## Related information

[A32 and T32 instruction set overview](#) on page 85

[A64 General Instructions](#) on page 664

[A64 Data Transfer Instructions](#) on page 818

# 6. Structure of Assembly Language Modules

Describes the structure of assembly language source files.

## 6.1 Syntax of source lines in assembly language

The assembler parses and assembles assembly language to produce object code.

### Syntax

Each line of assembly language source code has this general form:

```
{symbol} {instruction|directive|pseudo-instruction} {;comment}
```

All three sections of the source line are optional.

*symbol* is usually a label. In instructions and pseudo-instructions it is always a label. In some directives, it is a symbol for a variable or a constant. The description of the directive makes this clear in each case.

*symbol* must begin in the first column. It cannot contain any white space character such as a space or a tab unless it is enclosed by bars (|).

Labels are symbolic representations of addresses. You can use labels to mark specific addresses that you want to refer to from other parts of the code. Numeric local labels are a subclass of labels that begin with a number in the range 0-99. Unlike other labels, a numeric local label can be defined many times. This makes them useful when generating labels with a macro.

Directives provide important information to the assembler that either affects the assembly process or affects the final output image.

Instructions and pseudo-instructions make up the code a processor uses to perform tasks.



Instructions, pseudo-instructions, and directives must be preceded by white space, such as a space or a tab, irrespective of whether there is a preceding label or not.

Some directives do not allow the use of a label.

A comment is the final part of a source line. The first semicolon on a line marks the beginning of a comment except where the semicolon appears inside a string literal. The end of the line is the end of the comment. A comment alone is a valid line. The assembler ignores all comments. You can use blank lines to make your code more readable.

## Considerations when writing assembly language source code

You must write instruction mnemonics, pseudo-instructions, directives, and symbolic register names (except `a1-a4` and `v1-v8` in A32 or T32 instructions) in either all uppercase or all lowercase. You must not use mixed case. Labels and comments can be in uppercase, lowercase, or mixed case.

```

AREA      A32ex, CODE, READONLY
; Name this block of code A32ex

start    ENTRY           ; Mark first instruction to execute
        MOV    r0, #10      ; Set up parameters
        MOV    r1, #3
        ADD    r0, r0, r1   ; r0 = r0 + r1
stop     MOV    r0, #0x18    ; angel_SWIreason_ReportException
        LDR    r1, =0x20026  ; ADP_Stopped_ApplicationExit
        SVC    #0x123456    ; AArch32 semihosting (formerly SWI)
        END
;
```

To make source files easier to read, you can split a long line of source into several lines by placing a backslash character (\) at the end of the line. The backslash must not be followed by any other characters, including spaces and tabs. The assembler treats the backslash followed by end-of-line sequence as white space. You can also use blank lines to make your code more readable.



Do not use the backslash followed by end-of-line sequence within quoted strings.

**Note**

---

The limit on the length of lines, including any extensions using backslashes, is 4095 characters.

### Related information

[Literals](#) on page 98

[Labels](#) on page 223

[Numeric local labels](#) on page 225

[String literals](#) on page 227

[Symbol naming rules](#) on page 219

[Syntax of numeric literals](#) on page 229

## 6.2 Literals

Assembly language source code can contain numeric, string, Boolean, and single character literals.

Literals can be expressed as:

- Decimal numbers, for example 123.
- Hexadecimal numbers, for example 7B.
- Numbers in any base from 2 to 9, for example 5\_204 is a number in base 5.

- Floating point numbers, for example 123.4.
- Boolean values {TRUE} or {FALSE}.
- Single character values enclosed by single quotes, for example 'w'.
- Strings enclosed in double quotes, for example "This is a string".



In most cases, a string containing a single character is accepted as a single character value. For example `ADD r0,r1,#"a"` is accepted, but `ADD r0,r1,#"ab"` is faulted.

You can also use variables and names to represent literals.

## Related information

[Syntax of source lines in assembly language](#) on page 97

## 6.3 ELF sections and the AREA directive

Object files produced by the assembler are divided into sections. In assembly source code, you use the `AREA` directive to mark the start of a section.

ELF sections are independent, named, indivisible sequences of code or data. A single code section is the minimum required to produce an application.

The output of an assembly or compilation can include:

- One or more code sections. These are usually read-only sections.
- One or more data sections. These are usually read-write sections. They might be zero-initialized (ZI).

The linker places each section in a program image according to section placement rules. Sections that are adjacent in source files are not necessarily adjacent in the application image

Use the `AREA` directive to name the section and set its attributes. The attributes are placed after the name, separated by commas.

You can choose any name for your sections. However, names starting with any non-alphabetic character must be enclosed in bars, or an `AREA name missing` error is generated. For example, `|_DataArea|`.

The following example defines a single read-only section called `A32ex` that contains code:

```
AREA A32ex, CODE, READONLY ; Name this block of code A32ex
```

## Related information

[An example armasm syntax assembly language module](#) on page 100

[AREA](#) on page 1428  
[Information about scatter files](#)

## 6.4 An example armasm syntax assembly language module

An `armasm` syntax assembly language module has several constituent parts.

These are:

- ELF sections (defined by the `AREA` directive).
- Application entry (defined by the `ENTRY` directive).
- Application execution.
- Application termination.
- Program end (defined by the `END` directive).

### Constituents of an A32 assembly language module

The following example defines a single section called `A32ex` that contains code and is marked as being `READONLY`. This example uses the A32 instruction set.

```

        AREA      A32ex, CODE, READONLY
        ENTRY
start      MOV      r0, #10           ; Set up parameters
          MOV      r1, #3
          ADD      r0, r0, r1       ; r0 = r0 + r1
stop       MOV      r0, #0x18         ; angel_SWIreason_ReportException
          LDR      r1, =0x20026     ; ADP_Stopped_ApplicationExit
          SVC      #0x123456       ; AArch32 semihosting (formerly SWI)
          END

```

### Constituents of an A64 assembly language module

The following example defines a single section called `A64ex` that contains code and is marked as being `READONLY`. This example uses the A64 instruction set.

```

        AREA      A64ex, CODE, READONLY
        ENTRY
start      MOV      w0, #10           ; Set up parameters
          MOV      w1, #3
          ADD      w0, w0, w1       ; w0 = w0 + w1
stop       MOV      x1, #0x26
          MOVK    x1, #2, LSL #16
          STR      x1, [sp,#0]      ; ADP_Stopped_ApplicationExit
          MOV      x0, #0
          STR      x0, [sp,#8]      ; Exit status code
          MOV      x1, sp
          MOV      w0, #0x18         ; angel_SWIreason_ReportException
          HLT      0xf000           ; AArch64 semihosting
          END

```

## Constituents of a T32 assembly language module

The following example defines a single section called `T32ex` that contains code and is marked as being `READONLY`. This example uses the T32 instruction set.

```

AREA      T32ex, CODE, READONLY
                    ; Name this block of code T32ex
ENTRY      THUMB          ; Mark first instruction to execute
start
        MOV    r0, #10      ; Set up parameters
        MOV    r1, #3
        ADD    r0, r0, r1    ; r0 = r0 + r1
stop
        MOV    r0, #0x18      ; angel_SWIreason_ReportException
        LDR    r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC    #0xab         ; AArch32 semihosting (formerly SWI)
        ALIGN  4             ; Aligned on 4-byte boundary
        END

```

### Application entry

The `ENTRY` directive declares an entry point to the program. It marks the first instruction to be executed. In applications using the C library, an entry point is also contained within the C library initialization code. Initialization code and exception handlers also contain entry points.

### Application execution in A32 or T32 code

The application code begins executing at the label `start`, where it loads the decimal values 10 and 3 into registers `r0` and `r1`. These registers are added together and the result placed in `r0`.

### Application execution in A64 code

The application code begins executing at the label `start`, where it loads the decimal values 10 and 3 into registers `w0` and `w1`. These registers are added together and the result placed in `w0`.

### Application termination

After executing the main code, the application terminates by returning control to the debugger.

#### A32 and T32 code

You do this in A32 and T32 code using the semihosting `svc` instruction:

- In A32 code, the semihosting `svc` instruction is `0x123456` by default.
- In T32 code, use the semihosting `svc` instruction is `0xAB` by default.

A32 and T32 code uses the following parameters:

- R0 equal to `angel_SWIreason_ReportException` (`0x18`).
- R1 equal to `ADP_Stopped_ApplicationExit` (`0x20026`).

#### A64 code

In A64 code, use `HLT` instruction `0xF000` to invoke the semihosting interface.

A64 code uses the following parameters:

- W0 equal to `angel_SWIreason_ReportException` (`0x18`).

- X1 is the address of a block of two parameters. The first is the exception type, `ADP_Stopped_ApplicationExit` (0x20026) and the second is the exit status code.

## Program end

The `END` directive instructs the assembler to stop processing this source file. Every assembly language source module must finish with an `END` directive on a line by itself. Any lines following the `END` directive are ignored by the assembler.

## Related information

[END](#) on page 1445

[ENTRY](#) on page 1446

[ELF sections and the AREA directive](#) on page 99

[Semihosting for AArch32 and AArch64](#)

# 7. Writing A32/T32 Assembly Language

Describes the use of a few basic A32 and T32 instructions and the use of macros.

## 7.1 About the Unified Assembler Language

Unified Assembler Language (UAL) is a common syntax for A32 and T32 instructions. It supersedes earlier versions of both the A32 and T32 assembler languages.

Code that is written using UAL can be assembled for A32 or T32 for any Arm® processor. `armasm` faults the use of unavailable instructions.

`armasm` can assemble code that is written in pre-UAL and UAL syntax.

By default, `armasm` expects source code to be written in UAL. `armasm` accepts UAL syntax if any of the directives `CODE32`, `ARM`, or `THUMB` is used or if you assemble with any of the `--32`, `--arm`, or `--thumb` command-line options. `armasm` also accepts source code that is written in pre-UAL A32 assembly language when you assemble with the `CODE32` or `ARM` directive.

`armasm` accepts source code that is written in pre-UAL T32 assembly language when you assemble using the `--16` command-line option, or the `CODE16` directive in the source code.



The pre-UAL T32 assembly language does not support 32-bit T32 instructions.

Note

### Related information

[-16](#) on page 187

[ARM or CODE32 directive](#) on page 1432

[CODE16 directive](#) on page 1436

[THUMB directive](#) on page 1486

[--32](#) on page 187

[--arm](#) on page 189

[--thumb](#) on page 214

## 7.2 Syntax differences between UAL and A64 assembly language

UAL is the assembler syntax that is used by the A32 and T32 instruction sets. A64 assembly language is the assembler syntax that is used by the A64 instruction set.

UAL in Arm®v8 is unchanged from Armv7.

The general statement format and operand order of A64 assembly language is the same as UAL, but there are some differences between them. The following table describes the main differences:

**Table 7-1: Syntax differences between UAL and A64 assembly language**

| UAL                                                                                                                                                 | A64                                                                                                                                                                                                                                                                                                              |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| You make an instruction conditional by appending a condition code suffix directly to the mnemonic, with no delimiter. For example:<br><br>BEQ label | For conditionally executed instructions, you separate the condition code suffix from the mnemonic using a . delimiter. For example:<br><br>B.EQ label                                                                                                                                                            |
| Apart from the IT instruction, there are no unconditionally executed integer instructions that use a condition code as an operand.                  | A64 provides several unconditionally executed instructions that use a condition code as an operand. For these instructions, you specify the condition code to test for in the final operand position. For example:<br><br>CSEL w1,w2,w3,EQ                                                                       |
| The .W and .N instruction width specifiers control whether the assembler generates a 32-bit or 16-bit encoding for a T32 instruction.               | A64 is a fixed width 32-bit instruction set so does not support .W and .N qualifiers.                                                                                                                                                                                                                            |
| The core register names are R0-R15.                                                                                                                 | Qualify register names to indicate the operand data size, either 32-bit (W0-W31) or 64-bit (X0-X31).                                                                                                                                                                                                             |
| You can refer to registers R13, R14, and R15 as synonyms for SP, LR, and PC respectively.                                                           | In AArch64, there is no register that is named W31 or X31. Instead, you can refer to register 31 as SP, WZR, or XZR, depending on the context. You cannot refer to PC either by name or number. LR is an alias for register 30.                                                                                  |
| A32 has no equivalent of the extend operators.                                                                                                      | You can specify an extend operator in several instructions to control how a portion of the second source register value is sign or zero extended. For example, in the following instruction, UXTR is the extend type (zero extend, byte) and #2 is an optional left shift amount:<br><br>ADD X1, X2, W3, UXTR #2 |

## 7.3 Register usage in subroutine calls

You use branch instructions to call and return from subroutines. The Procedure Call Standard for the Arm Architecture defines how to use registers in subroutine calls.

A subroutine is a block of code that performs a task based on some arguments and optionally returns a result. By convention, you use registers R0 to R3 to pass arguments to subroutines, and R0 to pass a result back to the callers. A subroutine that requires more than four inputs uses the stack for the additional inputs.

To call subroutines, use a branch and link instruction. The syntax is:

```
BL destination
```

where *destination* is usually the label on the first instruction of the subroutine.

*destination* can also be a PC-relative expression.

The `BL` instruction:

- Places the return address in the link register.
- Sets the PC to the address of the subroutine.

After the subroutine code has executed, you can use a `BX LR` instruction to return.



Calls between separately assembled or compiled modules must comply with the restrictions and conventions defined by the *Procedure Call Standard for the Arm Architecture*.

### Example

The following example shows a subroutine, `doadd`, that adds the values of two arguments and returns a result in `R0`:

```

      AREA    subrout, CODE, READONLY ; Name this block of code
      ENTRY
start   MOV     r0, #10          ; Mark first instruction to execute
        MOV     r1, #3           ; Set up parameters
        BL      doadd          ; Call subroutine
stop    MOV     r0, #0x18         ; angel_SWIreason_ReportException
        LDR     r1, =0x20026    ; ADP_Stopped_ApplicationExit
        SVC     #0x123456       ; AArch32 semihosting (formerly SWI)
doadd   ADD     r0, r0, r1      ; Subroutine code
        BX     lr              ; Return from subroutine
        END

```

### Related information

[Stack operations for nested subroutines](#) on page 120

[BL \(A32\)](#) on page 269

[BX, BXNS \(A32\)](#) on page 273

[Procedure Call Standard for the Arm Architecture](#)

[Procedure Call Standard for the Arm 64-bit Architecture \(AArch64\)](#)

## 7.4 Load immediate values

To represent some immediate values, you might have to use a sequence of instructions rather than a single instruction.

A32 and T32 instructions can only be 32 bits wide. You can use a `MOV` or `MVN` instruction to load a register with an immediate value from a range that depends on the instruction set. Certain 32-bit values cannot be represented as an immediate operand to a single 32-bit instruction, although you can load these values from memory in a single instruction.

You can load any 32-bit immediate value into a register with two instructions, a `MOV` followed by a `MOVT`. Or, you can use a pseudo-instruction, `MOV32`, to construct the instruction sequence for you.

You can also use the `LDR` pseudo-instruction to load immediate values into a register.

You can include many commonly-used immediate values directly as operands within data processing instructions, without a separate load operation. The range of immediate values that you can include as operands in 16-bit T32 instructions is much smaller.

### Related information

[Load immediate values using MOV and MVN](#) on page 106

[Load immediate values using MOV32](#) on page 109

[Load immediate values using LDR Rd, =const](#) on page 109

[LDR pseudo-instruction \(A32\)](#) on page 319

## 7.5 Load immediate values using MOV and MVN

The `MOV` and `MVN` instructions can write a range of immediate values to a register.

In A32:

- `MOV` can load any 8-bit immediate value, giving a range of `0x0-0xFF` (0-255).

It can also rotate these values by any even number.

These values are also available as immediate operands in many data processing operations, without being loaded in a separate instruction.

- `MVN` can load the bitwise complements of these values. The numerical values are  $- (n+1)$ , where  $n$  is the value available in `MOV`.
- `MOV` can load any 16-bit number, giving a range of `0x0-0xFFFF` (0-65535).

The following table shows the range of 8-bit values that can be loaded in a single A32 `MOV` or `MVN` instruction (for data processing operations). The value to load must be a multiple of the value shown in the Step column.

**Table 7-2: A32 state immediate values (8-bit)**

| Binary                                        | Decimal               | Step     | Hexadecimal    | MVN value <sup>a</sup> | Notes         |
|-----------------------------------------------|-----------------------|----------|----------------|------------------------|---------------|
| 00000000000000000000000000000000abcdefg       | 0-255                 | 1        | 0x0-0xFF       | -1 to -256             | -             |
| 00000000000000000000000000000000abcdefg00     | 0-1020                | 4        | 0x0-0x3FC      | -4 to -1024            | -             |
| 00000000000000000000000000000000abcdefg0000   | 0-4080                | 16       | 0x0-0xFF0      | -16 to -4096           | -             |
| 00000000000000000000000000000000abcdefg000000 | 0-16320               | 64       | 0x0-0x3FC0     | -64 to -16384          | -             |
| ...                                           | ...                   | ...      | ...            | ...                    | -             |
| abcdefg00000000000000000000000000000000       | 0-255 $\times 2^{24}$ | $2^{24}$ | 0x0-0xFF000000 | $1-256 \times -2^{24}$ | -             |
| cdefgh00000000000000000000000000ab            | (bit pattern)         | -        | -              | (bit pattern)          | See b in Note |
| efgh00000000000000000000000000abcd            | (bit pattern)         | -        | -              | (bit pattern)          | See b in Note |
| gh00000000000000000000000000000000abcdef      | (bit pattern)         | -        | -              | (bit pattern)          | See b in Note |

The following table shows the range of 16-bit values that can be loaded in a single `MOV A32` instruction:

**Table 7-3: A32 state immediate values in MOV instructions**

| Binary                         | Decimal | Step | Hexadecimal | MVN value | Notes         |
|--------------------------------|---------|------|-------------|-----------|---------------|
| 0000000000000000abcdefijklmnop | 0-65535 | 1    | 0x0-0xFFFF  | -         | See c in Note |

---

These notes give extra information on both tables.

**a**

The `MVN` values are only available directly as operands in `MVN` instructions.

**b**

These values are available in A32 only. All the other values in this table are also available in 32-bit T32 instructions.

**c**

These values are not available directly as operands in other instructions.

---

In T32:

- The 32-bit `MOV` instruction can load:
  - Any 8-bit immediate value, giving a range of `0x0-0xFF` (0-255).
  - Any 8-bit immediate value, shifted left by any number.
  - Any 8-bit pattern duplicated in all four bytes of a register.
  - Any 8-bit pattern duplicated in bytes 0 and 2, with bytes 1 and 3 set to 0.
  - Any 8-bit pattern duplicated in bytes 1 and 3, with bytes 0 and 2 set to 0.

These values are also available as immediate operands in many data processing operations, without being loaded in a separate instruction.

- The 32-bit `MVN` instruction can load the bitwise complements of these values. The numerical values are  $-(n+1)$ , where  $n$  is the value available in `Mov`.
  - The 32-bit `Mov` instruction can load any 16-bit number, giving a range of `0x0-0xFFFF` (0-65535). These values are not available as immediate operands in data processing operations.

In architectures with T32, the 16-bit T32 `mov` instruction can load any immediate value in the range 0-255.

The following table shows the range of values that can be loaded in a single 32-bit T32 `MOV` or `MVN` instruction (for data processing operations). The value to load must be a multiple of the value shown in the Step column.

**Table 7-4: 32-bit T32 immediate values**

| Binary                                                                              | Decimal               | Step     | Hexadecimal    | MVN value <sup>a</sup> | Notes         |
|-------------------------------------------------------------------------------------|-----------------------|----------|----------------|------------------------|---------------|
| 00000000000000000000000000000000abcdefg <sub>h</sub>                                | 0-255                 | 1        | 0x0-0xFF       | -1 to -256             | -             |
| 00000000000000000000000000000000abcdefg <sub>h0</sub>                               | 0-510                 | 2        | 0x0-0x1FE      | -2 to -512             | -             |
| 00000000000000000000000000000000abcdefg <sub>h00</sub>                              | 0-1020                | 4        | 0x0-0x3FC      | -4 to -1024            | -             |
| ...                                                                                 | ...                   | ...      | ...            | ...                    | -             |
| 0abcdefg <sub>h</sub> 00000000000000000000000000000000                              | $0-255 \times 2^{23}$ | $2^{23}$ | 0x0-0x7F800000 | $1-256 \times 2^{23}$  | -             |
| abcdefg <sub>h</sub> 00000000000000000000000000000000                               | $0-255 \times 2^{24}$ | $2^{24}$ | 0x0-0xFF000000 | $1-256 \times 2^{24}$  | -             |
| abcdefg <sub>h</sub> abcde <sub>fgh</sub> abcde <sub>fgh</sub> abcde <sub>fgh</sub> | (bit pattern)         | -        | 0xXYXYXYXY     | 0xXYXYXYXY             | -             |
| 00000000abcdefg <sub>h</sub> 00000000abcdefg <sub>h</sub>                           | (bit pattern)         | -        | 0x00XY00XY     | 0xFFXYFFXY             | -             |
| abcdefg <sub>h</sub> 00000000abcdefg <sub>h</sub> 00000000                          | (bit pattern)         | -        | 0XXY00XY00     | 0XYFFXYFF              | -             |
| 0000000000000000abcdefg <sub>hijkl</sub>                                            | 0-4095                | 1        | 0x0-0xFFFF     | -                      | See b in Note |

The following table shows the range of 16-bit values that can be loaded by the `mov` 32-bit T32 instruction:

Table 7-5: 32-bit T32 immediate values in MOV instructions

| Binary                           | Decimal | Step | Hexadecimal | MVN value | Notes         |
|----------------------------------|---------|------|-------------|-----------|---------------|
| 0000000000000000abcdefghijklmnop | 0-65535 | 1    | 0x0-0xFFFF  | -         | See c in Note |

These notes give extra information on the tables.



a

The `MVN` values are only available directly as operands in `MVN` instructions.

b

These values are available directly as operands in `ADD`, `SUB`, and `MOV` instructions, but not in `MVN` or any other data processing instructions.

**C**

These values are only available in `mov` instructions.

---

In both A32 and T32, you do not have to decide whether to use `mov` or `mvn`. The assembler uses whichever is appropriate. This is useful if the value is an assembly-time variable.

If you write an instruction with an immediate value that is not available, the assembler reports the error: `Immediate n out of range for this operation.`

#### Related information

[Load immediate values](#) on page 106

## 7.6 Load immediate values using MOV32

To load any 32-bit immediate value, a pair of `mov` and `movt` instructions is equivalent to a `MOV32` pseudo-instruction.

Both A32 and T32 instruction sets include:

- A `mov` instruction that can load any value in the range `0x00000000` to `0x0000FFFF` into a register.
- A `movt` instruction that can load any value in the range `0x0000` to `0xFFFF` into the most significant half of a register, without altering the contents of the least significant half.

You can use these two instructions to construct any 32-bit immediate value in a register.

Alternatively, you can use the `MOV32` pseudo-instruction. The assembler generates the `mov`, `movt` instruction pair for you.

You can also use the `MOV32` instruction to load addresses into registers by using a label or any PC-relative expression in place of an immediate value. The assembler puts a relocation directive into the object file for the linker to resolve the address at link-time.

#### Related information

[Register-relative and PC-relative expressions](#) on page 222

[MOV32 pseudo-instruction \(A32\)](#) on page 335

## 7.7 Load immediate values using LDR Rd, =const

The `LDR Rd, =const` pseudo-instruction generates the most efficient single instruction to load any 32-bit number.

You can use this pseudo-instruction to generate constants that are out of range of the `mov` and `mvn` instructions.

The `LDR` pseudo-instruction generates the most efficient single instruction for the specified immediate value:

- If the immediate value can be constructed with a single `MOV` or `MVN` instruction, the assembler generates the appropriate instruction.
- If the immediate value cannot be constructed with a single `MOV` or `MVN` instruction, the assembler:
  - Places the value in a *literal pool* (a portion of memory embedded in the code to hold constant values).
  - Generates an `LDR` instruction with a PC-relative address that reads the constant from the literal pool.

For example:

```
LDR      rn, [pc, #offset to literal pool]
          ; load register n with one word
          ; from the address [pc + offset]
```

You must ensure that there is a literal pool within range of the `LDR` instruction generated by the assembler.

## Related information

[Literal pools](#) on page 110

[LDR pseudo-instruction \(A32\)](#) on page 319

## 7.8 Literal pools

The assembler uses literal pools to store some constant data in code sections. You can use the `LTORG` directive to ensure a literal pool is within range.

The assembler places a literal pool at the end of each section. The end of a section is defined either by the `END` directive at the end of the assembly or by the `AREA` directive at the start of the following section. The `END` directive at the end of an included file does not signal the end of a section.

In large sections the default literal pool can be out of range of one or more `LDR` instructions. The offset from the PC to the constant must be:

- Less than 4KB in A32 or T32 code when the 32-bit `LDR` instruction is available, but can be in either direction.
- Forward and less than 1KB when only the 16-bit T32 `LDR` instruction is available.

When an `LDR Rd,=const` pseudo-instruction requires the immediate value to be placed in a literal pool, the assembler:

- Checks if the value is available and addressable in any previous literal pools. If so, it addresses the existing constant.
- Attempts to place the value in the next literal pool if it is not already available.

If the next literal pool is out of range, the assembler generates an error message. In this case you must use the `LTORG` directive to place an additional literal pool in the code. Place the `LTORG` directive after the failed `LDR` pseudo-instruction, and within the valid range for an `LDR` instruction.

You must place literal pools where the processor does not attempt to execute them as instructions. Place them after unconditional branch instructions, or after the return instruction at the end of a subroutine.

## Example of placing literal pools

The following example shows the placement of literal pools. The instructions listed as comments are the A32 instructions generated by the assembler.

```

        AREA   Loadcon, CODE, READONLY
        ENTRY
start      ; Mark first instruction to execute
        BL     func1           ; Branch to first subroutine
        BL     func2           ; Branch to second subroutine
stop       MOV    r0, #0x18          ; angel_SWIreason_ReportException
        LDR    r1, =0x20026      ; ADP_Stopped_ApplicationExit
        SVC    #0x123456         ; AArch32 semihosting (formerly SWI)
func1      LDR    r0, =42            ; => MOV R0, #42
        LDR    r1, =0x55555555      ; => LDR R1, [PC, #offset to
                                    ; Literal Pool 1]
        LDR    r2, =0xFFFFFFFF      ; => MVN R2, #0
        BX    lr                ; Literal Pool 1 contains
                                ; literal 0x55555555
        LTORG
func2      LDR    r3, =0x55555555      ; => LDR R3, [PC, #offset to
                                ; Literal Pool 1]
        ; LDR r4, =0x66666666      ; If this is uncommented it
                                ; fails, because Literal Pool 2
                                ; is out of reach
        BX    lr
LargeTable SPACE 4200             ; Starting at the current location,
                                ; clears a 4200 byte area of memory
                                ; to zero
        END

```

## Related information

[LTORG](#) on page 1470

[Load immediate values using LDR Rd, =const](#) on page 109

## 7.9 Load addresses into registers

It is often necessary to load an address into a register. There are several ways to do this.

For example, you might have to load the address of a variable, a string literal, or the start location of a jump table.

Addresses are normally expressed as offsets from a label, or from the current PC or other register.

You can load an address into a register either:

- Using the instruction `ADR`.
- Using the pseudo-instruction `ADRL`.
- Using the pseudo-instruction `MOV32`.
- From a literal pool using the pseudo-instruction `LDR Rd, =Label`.

### Related information

[Load addresses to a register using `ADR`](#) on page 112

[Load addresses to a register using `ADRL`](#) on page 114

[Load immediate values using `MOV32`](#) on page 109

[Load addresses to a register using `LDR Rd, =label`](#) on page 114

## 7.10 Load addresses to a register using `ADR`

The `ADR` instruction loads an address within a certain range, without performing a data load.

`ADR` accepts a PC-relative expression, that is, a label with an optional offset where the address of the label is relative to the PC.



The label used with `ADR` must be within the same code section. The assembler faults references to labels that are out of range in the same section.

The available range of addresses for the `ADR` instruction depends on the instruction set and encoding:

### A32

Any value that can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word. The range is relative to the PC.

### 32-bit T32 encoding

±4095 bytes to a byte, halfword, or word-aligned address.

### 16-bit T32 encoding

0 to 1020 bytes. `label` must be word-aligned. You can use the `ALIGN` directive to ensure this.

### Example of a jump table implementation with `ADR`

This example shows A32 code that implements a jump table. Here, the `ADR` instruction loads the address of the jump table.

```

        AREA   Jump, CODE, READONLY ; Name this block of code
        ARM
num     EQU    2                 ; Number of entries in jump table

```

```

start      ENTRY                                ; Mark first instruction to execute
          MOV r0, #0
          MOV r1, #3
          MOV r2, #2
          BL arithfunc                         ; Set up the three arguments
                                                ; Call the function
stop       MOV r0, #0x18
          LDR r1, =0x20026
          SVC #0x123456                         ; angel_SWIreason_ReportException
                                                ; ADP_Stopped_ApplicationExit
                                                ; AArch32 semihosting (formerly SWI)
arithfunc   CMP r0, #num                         ; Label the function
          ; Treat function code as unsigned
          ; integer
          BXHS lr                               ; If code is >= num then return
          ADR r3, JumpTable                     ; Load address of jump table
          LDR pc, [r3,r0,LSL#2]                 ; Jump to the appropriate routine
JumpTable   DCD DoAdd
          DCD DoSub
DoAdd      ADD r0, r1, r2                         ; Operation 0
          BX lr                               ; Return
DoSub      SUB r0, r1, r2                         ; Operation 1
          BX lr                               ; Return
          END                                ; Mark the end of this file

```

In this example, the function `arithfunc` takes three arguments and returns a result in `r0`. The first argument determines the operation to be carried out on the second and third arguments:

### **argument1=0**

Result = argument2 + argument3.

### **argument1=1**

Result = argument2 - argument3.

The jump table is implemented with the following instructions and assembler directives:

#### **EQU**

Is an assembler directive. You use it to give a value to a symbol. In this example, it assigns the value 2 to `num`. When `num` is used elsewhere in the code, the value 2 is substituted. Using `EQU` in this way is similar to using `#define` to define a constant in C.

#### **DCD**

Declares one or more words of store. In this example, each `DCD` stores the address of a routine that handles a particular clause of the jump table.

#### **LDR**

The `LDR PC, [R3,R0,LSL#2]` instruction loads the address of the required clause of the jump table into the PC. It:

- Multiplies the clause number in `R0` by 4 to give a word offset.
- Adds the result to the address of the jump table.
- Loads the contents of the combined address into the PC.

## Related information

[Load addresses to a register using LDR Rd, =label](#) on page 114

[Load addresses to a register using ADRL](#) on page 114

[ADR \(PC-relative\) \(A32\)](#) on page 256

## 7.11 Load addresses to a register using ADRL

The `ADRL` pseudo-instruction loads an address within a certain range, without performing a data load. The range is wider than that of the `ADR` instruction.

`ADRL` accepts a PC-relative expression, that is, a label with an optional offset where the address of the label is relative to the current PC.



The label used with `ADRL` must be within the same code section. The assembler faults references to labels that are out of range in the same section.

Note

The assembler converts an `ADRL rn, label` pseudo-instruction by generating:

- Two data processing instructions that load the address, if it is in range.
- An error message if the address cannot be constructed in two instructions.

The available range depends on the instruction set and encoding.

### A32

Any value that can be generated by two `ADD` or two `SUB` instructions. That is, any value that can be produced by the addition of two values, each of which is 8 bits rotated right by any even number of bits within a 32-bit word. The range is relative to the PC.

#### 32-bit T32 encoding

±1MB to a byte, halfword, or word-aligned address.

#### 16-bit T32 encoding

`ADRL` is not available.

### Related information

[Load addresses to a register using ADR](#) on page 112

[Load addresses to a register using LDR Rd, =label](#) on page 114

## 7.12 Load addresses to a register using LDR Rd, =label

The `LDR Rd,=label` pseudo-instruction places an address in a literal pool and then loads the address into a register.

`LDR Rd,=label` can load any 32-bit numeric value into a register. It also accepts PC-relative expressions such as labels, and labels with offsets.

The assembler converts an `LDR Rd,=label` pseudo-instruction by:

- Placing the address of `label` in a literal pool (a portion of memory embedded in the code to hold constant values).
- Generating a PC-relative `LDR` instruction that reads the address from the literal pool, for example:

```
LDR rn [pc, #offset_to_literal_pool]
; Load register n with one word
; from the address [pc + offset]
```

You must ensure that the literal pool is within range of the `LDR` pseudo-instruction that needs to access it.

### Example of loading using LDR Rd, =label

The following example shows a section with two literal pools. The final `LDR` pseudo-instruction needs to access the second literal pool, but it is out of range. Uncommenting this line causes the assembler to generate an error.

The instructions listed in the comments are the A32 instructions generated by the assembler.

```
AREA LDRlabel, CODE, READONLY
ENTRY ; Mark first instruction to execute
start
    BL func1 ; Branch to first subroutine
    BL func2 ; Branch to second subroutine
stop
    MOV r0, #0x18 ; angel_SWIreason_ReportException
    LDR r1, =0x20026 ; ADP_Stopped_ApplicationExit
    SVC #0x123456 ; AArch32 semihosting (formerly SWI)
func1
    LDR r0, =start ; => LDR r0,[PC, #offset into Literal Pool 1]
    LDR r1, =Darea + 12 ; => LDR r1,[PC, #offset into Literal Pool 1]
    LDR r2, =Darea + 6000 ; => LDR r2,[PC, #offset into Literal Pool 1]
    BX lr ; Return
    LTORG ; Literal Pool 1
func2
    LDR r3, =Darea + 6000 ; => LDR r3,[PC, #offset into Literal Pool 1]
    ; LDR r4, =Darea + 6004 ; If uncommented, produces an error because
    ; Literal Pool 2 is out of range.
    BX lr ; Return
Darea
    SPACE 8000 ; Starting at the current location, clears
                 ; a 8000 byte area of memory to zero.
    END ; Literal Pool 2 is automatically inserted
         ; after the END directive.
         ; It is out of range of all the LDR
         ; pseudo-instructions in this example.
```

### Example of string copy

The following example shows an A32 code routine that overwrites one string with another. It uses the `LDR` pseudo-instruction to load the addresses of the two strings from a data section. The following are particularly significant:

#### **DCB**

The `DCB` directive defines one or more bytes of store. In addition to integer values, `DCB` accepts quoted strings. Each character of the string is placed in a consecutive byte.

**LDR, STR**

The `LDR` and `STR` instructions use post-indexed addressing to update their address registers. For example, the instruction:

```
LDRB    r2, [r1], #1
```

loads `r2` with the contents of the address pointed to by `r1` and then increments `r1` by 1.

The example also shows how, unlike the `ADR` and `ADRL` pseudo-instructions, you can use the `LDR` pseudo-instruction with labels that are outside the current section. The assembler places a relocation directive in the object code when the source file is assembled. The relocation directive instructs the linker to resolve the address at link time. The address remains valid wherever the linker places the section containing the `LDR` and the literal pool.

```

        AREA   StrCopy, CODE, READONLY
        ENTRY
start      ; Mark first instruction to execute
        LDR    r1, =srcstr      ; Pointer to first string
        LDR    r0, =dststr      ; Pointer to second string
        BL     strcpy          ; Call subroutine to do copy
stop       MOV    r0, #0x18      ; angel_SWIreason_ReportException
        LDR    r1, =0x20026    ; ADP_Stopped_ApplicationExit
        SVC    #0x123456      ; AArch32 semihosting (formerly SWI)
strcpy    LDRB   r2, [r1],#1    ; Load byte and update address
        STRB   r2, [r0],#1    ; Store byte and update address
        CMP    r2, #0          ; Check for zero terminator
        BNE    strcpy          ; Keep going if not
        MOV    pc,lr          ; Return
        AREA   Strings, DATA, READWRITE
srcstr    DCB    "First string - source",0
dststr    DCB    "Second string - destination",0
        END

```

**Related information**

[Load addresses to a register using ADRL](#) on page 114

[Load immediate values using LDR Rd, =const](#) on page 109

[LDR pseudo-instruction \(A32\)](#) on page 319

[DCB](#) on page 1438

## 7.13 Other ways to load and store registers

You can load and store registers using LDR, STR and MOV (register) instructions.

You can load any 32-bit value from memory into a register with an `LDR` data load instruction. To store registers into memory you can use the `STR` data store instruction.

You can use the `MOV` instruction to move any 32-bit data from one register to another.

## Related information

[Load and store multiple register instructions on page 117](#)

[Load and store multiple register instructions in A32 and T32 on page 117](#)

[MOV \(A32\) on page 333](#)

## 7.14 Load and store multiple register instructions

The A32 and T32 instruction sets include instructions that load and store multiple registers. These instructions can provide a more efficient way of transferring the contents of several registers to and from memory than using single register loads and stores.

Multiple register transfer instructions are most often used for block copy and for stack operations at subroutine entry and exit. The advantages of using a multiple register transfer instruction instead of a series of single data transfer instructions include:

- Smaller code size.
- A single instruction fetch overhead, rather than many instruction fetches.
- On uncached Arm processors, the first word of data transferred by a load or store multiple is always a nonsequential memory cycle, but all subsequent words transferred can be sequential memory cycles. Sequential memory cycles are faster in most systems.



The lowest numbered register is transferred to or from the lowest memory address accessed, and the highest numbered register to or from the highest address accessed. The order of the registers in the register list in the instructions makes no difference.

You can use the `--diag_warning 1206` assembler command line option to check that registers in register lists are specified in increasing order.

## Related information

[Load and store multiple register instructions in A32 and T32 on page 117](#)

[Stack implementation using LDM and STM on page 119](#)

[Stack operations for nested subroutines on page 120](#)

[Block copy with LDM and STM on page 121](#)

## 7.15 Load and store multiple register instructions in A32 and T32

Instructions are available in both the A32 and T32 instruction sets to load and store multiple registers.

They are:

**LDM**

Load Multiple registers.

**STM**

Store Multiple registers.

**PUSH**

Store multiple registers onto the stack and update the stack pointer.

**POP**

Load multiple registers off the stack, and update the stack pointer.

In **LDM** and **STM** instructions:

- The list of registers loaded or stored can include:
  - In A32 instructions, any or all of R0-R12, SP, LR, and PC.
  - In 32-bit T32 instructions, any or all of R0-R12, and optionally LR or PC (**LDM** only) with some restrictions.
  - In 16-bit T32 instructions, any or all of R0-R7.
- The address must be word-aligned. It can be:
  - Incremented after each transfer.
  - Incremented before each transfer (A32 instructions only).
  - Decrement after each transfer (A32 instructions only).
  - Decrement before each transfer (not in 16-bit encoded T32 instructions).
- The base register can be either:
  - Updated to point to the next block of data in memory.
  - Left as it was before the instruction.

When the base register is updated to point to the next block in memory, this is called writeback, that is, the adjusted address is written back to the base register.

In **PUSH** and **POP** instructions:

- The stack pointer (SP) is the base register, and is always updated.
- The address is incremented after each transfer in **POP** instructions, and decremented before each transfer in **PUSH** instructions.
- The list of registers loaded or stored can include:
  - In A32 instructions, any or all of R0-R12, SP, LR, and PC.
  - In 32-bit T32 instructions, any or all of R0-R12, and optionally LR or PC (**POP** only) with some restrictions.
  - In 16-bit T32 instructions, any or all of R0-R7, and optionally LR (**PUSH** only) or PC (**POP** only).



Use of SP in the list of registers in these A32 instructions is deprecated.

**Note**

32 `STM` and `PUSH` instructions that use PC in the list of registers, and A32 `LDM` and `POP` instructions that use both PC and LR in the list of registers are deprecated.

## Related information

[Load and store multiple register instructions on page 117](#)

## 7.16 Stack implementation using LDM and STM

You can use the `LDM` and `STM` instructions to implement pop and push operations respectively. You use a suffix to indicate the stack type.

The load and store multiple instructions can update the base register. For stack operations, the base register is usually the stack pointer, SP. This means that you can use these instructions to implement push and pop operations for any number of registers in a single instruction.

The load and store multiple instructions can be used with several types of stack:

### Descending or ascending

The stack grows downwards, starting with a high address and progressing to a lower one (a descending stack), or upwards, starting from a low address and progressing to a higher address (an ascending stack).

### Full or empty

The stack pointer can either point to the last item in the stack (a full stack), or the next free space on the stack (an empty stack).

To make it easier for the programmer, stack-oriented suffixes can be used instead of the increment or decrement, and before or after suffixes. The following table shows the stack-oriented suffixes and their equivalent addressing mode suffixes for load and store instructions:

**Table 7-6: Stack-oriented suffixes and equivalent addressing mode suffixes**

| Stack-oriented suffix       | For store or push instructions | For load or pop instructions |
|-----------------------------|--------------------------------|------------------------------|
| FD (Full Descending stack)  | DB (Decrement Before)          | IA (Increment After)         |
| FA (Full Ascending stack)   | IB (Increment Before)          | DA (Decrement After)         |
| ED (Empty Descending stack) | DA (Decrement After)           | IB (Increment Before)        |
| EA (Empty Ascending stack)  | IA (Increment After)           | DB (Decrement Before)        |

The following table shows the load and store multiple instructions with the stack-oriented suffixes for the various stack types:

**Table 7-7: Suffixes for load and store multiple instructions**

| Stack type       | Store                           | Load                            |
|------------------|---------------------------------|---------------------------------|
| Full descending  | STMFD (STMDB, Decrement Before) | LDMFD (LDM, increment after)    |
| Full ascending   | STMFA (STMIB, Increment Before) | LDMFA (LDMDA, Decrement After)  |
| Empty descending | STMED (STMDA, Decrement After)  | LDMED (LDMIB, Increment Before) |
| Empty ascending  | STMEA (STM, increment after)    | LDMEA (LDMDB, Decrement Before) |

For example:

```
STMFD    sp!, {r0-r5} ; Push onto a Full Descending Stack
LDMFD    sp!, {r0-r5} ; Pop from a Full Descending Stack
```



The *Procedure Call Standard for the Arm Architecture* (AAPCS), and `armclang` always use a full descending stack.

The `PUSH` and `POP` instructions assume a full descending stack. They are the preferred synonyms for `STMDB` and `LDM` with writeback.

## Related information

[LDM \(A32\)](#) on page 307

[Load and store multiple register instructions](#) on page 117

[Procedure Call Standard for the Arm Architecture](#)

## 7.17 Stack operations for nested subroutines

Stack operations can be very useful at subroutine entry and exit to avoid losing register contents if other subroutines are called.

At the start of a subroutine, any working registers required can be stored on the stack, and at exit they can be popped off again.

In addition, if the link register is pushed onto the stack at entry, additional subroutine calls can be made safely without causing the return address to be lost. If you do this, you can also return from a subroutine by popping the PC off the stack at exit, instead of popping the LR and then moving that value into the PC. For example:

```
subroutine PUSH    {r5-r7,lr} ; Push work registers and lr
; code
BL      somewhere_else
; code
POP     {r5-r7,pc} ; Pop work registers and pc
```

## Related information

[Register usage in subroutine calls](#) on page 104

[Load and store multiple register instructions](#) on page 117

## Procedure Call Standard for the Arm Architecture

### Procedure Call Standard for the Arm 64-bit Architecture (AArch64)

## 7.18 Block copy with LDM and STM

You can sometimes make code more efficient by using LDM and STM instead of LDR and STR instructions.

### Example of block copy without LDM and STM

The following example is an A32 code routine that copies a set of words from a source location to a destination a single word at a time:

```

num      AREA  Word, CODE, READONLY ; name the block of code
EQU      20                ; set number of words to be copied
ENTRY
start
        LDR    r0, =src          ; r0 = pointer to source block
        LDR    r1, =dst          ; r1 = pointer to destination block
        MOV    r2, #num          ; r2 = number of words to copy
wordcopy
        LDR    r3, [r0], #4      ; load a word from the source and
        STR    r3, [r1], #4      ; store it to the destination
        SUBS   r2, r2, #1        ; decrement the counter
        BNE    wordcopy         ; ... copy more
stop
        MOV    r0, #0x18          ; angel_SWIreason_ReportException
        LDR    r1, =0x20026        ; ADP_Stopped_ApplicationExit
        SVC    #0x123456          ; AArch32 semihosting (formerly SWI)
        AREA  BlockData, DATA, READWRITE
src     DCD    1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
dst     DCD    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
END

```

You can make this module more efficient by using LDM and STM for as much of the copying as possible. Eight is a sensible number of words to transfer at a time, given the number of available registers. You can find the number of eight-word multiples in the block to be copied (if R2 = number of words to be copied) using:

```
MOVS    r3, r2, LSR #3      ; number of eight word multiples
```

You can use this value to control the number of iterations through a loop that copies eight words per iteration. When there are fewer than eight words left, you can find the number of words left (assuming that R2 has not been corrupted) using:

```
ANDS    r2, r2, #7
```

### Example of block copy using LDM and STM

The following example lists the block copy module rewritten to use LDM and STM for copying:

```

num      AREA  Block, CODE, READONLY ; name this block of code
EQU      20                ; set number of words to be copied

```

```

        ENTRY                                ; mark the first instruction called
start    LDR      r0, =src                ; r0 = pointer to source block
        LDR      r1, =dst                ; r1 = pointer to destination block
        MOV      r2, #num               ; r2 = number of words to copy
        MOV      sp, #0x400             ; Set up stack pointer (sp)
blockcopy MOVS     r3,r2, LSR #3       ; Number of eight word multiples
        BEQ      copywords            ; Fewer than eight words to move?
        PUSH    {r4-r11}              ; Save some working registers
octcopy   LDM      r0!, {r4-r11}          ; Load 8 words from the source
        STM      r1!, {r4-r11}          ; and put them at the destination
        SUBS   r3, r3, #1              ; Decrement the counter
        BNE      octcopy              ; ... copy more
        POP      {r4-r11}              ; Don't require these now - restore
                                    ; originals
copywords ANDS     r2, r2, #7           ; Number of odd words to copy
        BEQ      stop                 ; No words left to copy?
wordcopy   LDR      r3, [r0], #4          ; Load a word from the source and
        STR      r3, [r1], #4          ; store it to the destination
        SUBS   r2, r2, #1              ; Decrement the counter
        BNE      wordcopy              ; ... copy more
stop      MOV      r0, #0x18              ; angel_SWIreason_ReportException
        LDR      r1, =0x20026           ; ADP_Stopped_ApplicationExit
        SVC      #0x123456             ; AArch32 semihosting (formerly SWI)
src       AREA    BlockData, DATA, READWRITE
dst       DCD      1,2,3,4,5,6,7,8,1,2,3,4
src       DCD      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
END

```



The purpose of this example is to show the use of the `LDM` and `STM` instructions. There are other ways to perform bulk copy operations, the most efficient of which depends on many factors and is outside the scope of this document.

## 7.19 Memory accesses

Many load and store instructions support different addressing modes.

### Offset addressing

The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access. The base register is unchanged. The assembly language syntax for this mode is:

```
[Rn, offset]
```

## Pre-indexed addressing

The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access, and written back into the base register. The assembly language syntax for this mode is:

```
[Rn, offset]!
```

## Post-indexed addressing

The address obtained from the base register is used, unchanged, as the address for the memory access. The offset value is applied to the address, and written back into the base register. The assembly language syntax for this mode is:

```
[Rn], offset
```

In each case, *Rn* is the base register and *offset* can be:

- An immediate constant.
- An index register, *Rm*.
- A shifted index register, such as *Rm, LSL #shift*.

## Related information

[Registers in AArch32 state](#) on page 77

[Address alignment in A32/T32 code](#) on page 157

## 7.20 The Read-Modify-Write operation

The read-modify-write operation ensures that you modify only the specific bits in a system register that you want to change.

Individual bits in a system register control different system functionality. Modifying the wrong bits in a system register might cause your program to behave incorrectly.

|      |                     |                                                |
|------|---------------------|------------------------------------------------|
| VMRS | r10,FPSCR           | ; copy FPSCR into the general-purpose r10      |
| BIC  | r10,r10,#0x00370000 | ; clear STRIDE bits[21:20] and LEN bits[18:16] |
| ORR  | r10,r10,#0x00030000 | ; set bits[17:16] (STRIDE =1 and LEN = 4)      |
| VMSR | FPSCR,r10           | ; copy r10 back into FPSCR                     |

To read-modify-write a system register, the instruction sequence is:

1. The first instruction copies the value from the target system register to a temporary general-purpose register.
2. The next one or more instructions modify the required bits in the general-purpose register. This can be one or both of:
  - **BIC** to clear to 0 only the bits that must be cleared.
  - **ORR** to set to 1 only the bits that must be set.

3. The final instruction writes the value from the general-purpose register to the target system register.

### Related information

[Register accesses in AArch32 state](#) on page 80

[The Q flag in AArch32 state](#) on page 83

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[VMRS \(A32\)](#) on page 567

## 7.21 Optional hash with immediate constants

You do not have to specify a hash before an immediate constant in any instruction syntax.

This applies to A32, T32, Advanced SIMD, and floating-point instructions. For example, the following are valid instructions:

```
BKPT 100  
MOVT R1, 256  
VCEQ.I8 Q1, Q2, 0
```

By default, the assembler warns if you do not specify a hash:

```
WARNING: A1865W: '#' not seen before constant expression.
```

You can suppressed this with `--diag_suppress=1865`.

If you use the assembly code with another assembler, you are advised to use the # before all immediates. The disassembler always shows the # for clarity.

### Related information

[A32 and T32 Instructions](#) on page 240

[Advanced SIMD Instructions \(32-bit\)](#) on page 505

## 7.22 Use of macros

A macro definition is a block of code enclosed between MACRO and MEND directives. It defines a name that you can use as a convenient alternative to repeating the block of code.

The main uses for a macro are:

- To make it easier to follow the logic of the source code by replacing a block of code with a single meaningful name.
- To avoid repeating a block of code several times.

## Related information

[Test-and-branch macro example](#) on page 125

[Unsigned integer division macro example](#) on page 125

[MACRO and MEND](#) on page 1471

## 7.23 Test-and-branch macro example

You can use a macro to perform a test-and-branch operation.

In A32 code, a test-and-branch operation requires two instructions to implement.

You can define a macro such as this:

```
MACRO
$label TestAndBranch $dest, $reg, $cc
$label CMP      $reg, #0
$label B$cc    $dest
MEND
```

The line after the `MACRO` directive is the macro prototype statement. This defines the name (`TestAndBranch`) you use to invoke the macro. It also defines parameters (`$label`, `$dest`, `$reg`, and `$cc`). Unspecified parameters are substituted with an empty string. For this macro you must give values for `$dest`, `$reg` and `$cc` to avoid syntax errors. The assembler substitutes the values you give into the code.

This macro can be invoked as follows:

```
test   TestAndBranch NonZero, r0, NE
      ...
      ...
NonZero
```

After substitution this becomes:

```
test   CMP      r0, #0
      BNE    NonZero
      ...
      ...
NonZero
```

## Related information

[Use of macros](#) on page 124

[Unsigned integer division macro example](#) on page 125

[Numeric local labels](#) on page 225

## 7.24 Unsigned integer division macro example

You can use a macro to perform unsigned integer division.

The macro takes the following parameters:

### \$Bot

The register that holds the divisor.

### \$Top

The register that holds the dividend before the instructions are executed. After the instructions are executed, it holds the remainder.

### \$Div

The register where the quotient of the division is placed. It can be `NULL ("")` if only the remainder is required.

### \$Temp

A temporary register used during the calculation.

### Example unsigned integer division with a macro

```

$Lab      MACRO
         DivMod $Div,$Top,$Bot,$Temp
         ASSERT $Top <> $Bot          ; Produce an error message if the
         ASSERT $Top <> $Temp         ; registers supplied are
         ASSERT $Bot <> $Temp         ; not all different
         IF      "$Div" <> ""
             ASSERT $Div <> $Top       ; These three only matter if $Div
             ASSERT $Div <> $Bot       ; is not null ("")
             ASSERT $Div <> $Temp       ;
         ENDIF
$Lab      MOV      $Temp, $Bot           ; Put divisor in $Temp
90       CMP      $Temp, $Top, LSR #1    ; double it until
         MOVL $Temp, $Temp, LSL #1    ; 2 * $Temp > $Top
         CMP      $Temp, $Top, LSR #1
         BLS     %b90                ; The b means search backwards
         IF      "$Div" <> ""        ; Omit next instruction if $Div
                                         ; is null
             MOV      $Div, #0          ; Initialize quotient
         ENDIF
91       CMP      $Top, $Temp          ; Can we subtract $Temp?
         SUBCS $Top, $Top,$Temp       ; If we can, do so
         IF      "$Div" <> ""        ; Omit next instruction if $Div
                                         ; is null
             ADC      $Div, $Div, $Div ; Double $Div
         ENDIF
         MOV      $Temp, $Temp, LSR #1 ; Halve $Temp,
         CMP      $Temp, $Bot          ; and loop until
         BHS     %b91                ; less than divisor
MEND

```

The macro checks that no two parameters use the same register. It also optimizes the code produced if only the remainder is required.

To avoid multiple definitions of labels if `DivMod` is used more than once in the assembler source, the macro uses numeric local labels (90, 91).

The following example shows the code that this macro produces if it is invoked as follows:

```
ratio  DivMod  R0,R5,R4,R2
```

### Output from the example division macro

```

        ASSERT r5 <> r4          ; Produce an error if the
        ASSERT r5 <> r2          ; registers supplied are
        ASSERT r4 <> r2          ; not all different
        ASSERT r0 <> r5          ; These three only matter if $Div
        ASSERT r0 <> r4          ; is not null ("")
        ASSERT r0 <> r2          ;
ratio
        MOV    r2, r4              ; Put divisor in $Temp
        90   CMP    r2, r5, LSR #1 ; double it until
        MOVLS r2, r2, LSL #1     ; 2 * r2 > r5
        CMP    r2, r5, LSR #1
        BLS    %b90               ; The b means search backwards
        MOV    r0, #0               ; Initialize quotient
        91   CMP    r5, r2          ; Can we subtract r2?
        SUBCS r5, r5, r2          ; If we can, do so
        ADC    r0, r0, r0          ; Double r0
        MOV    r2, r2, LSR #1     ; Halve r2,
        CMP    r2, r4               ; and loop until
        BHS    %b91               ; less than divisor

```

### Related information

[Use of macros](#) on page 124

[Test-and-branch macro example](#) on page 125

[Numeric local labels](#) on page 225

## 7.25 Instruction and directive relocations

The assembler can embed relocation directives in object files to indicate labels with addresses that are unknown at assembly time. The assembler can relocate several types of instruction.

A relocation is a directive embedded in the object file that enables source code to refer to a label whose target address is unknown or cannot be calculated at assembly time. The assembler emits a relocation in the object file, and the linker resolves this to the address where the target is placed.

The assembler relocates the data directives DCB, DCW, DCWU, DCD, and DCDU if their syntax contains an external symbol, that is a symbol declared using IMPORT or EXTERN. This causes the bottom 8, 16, or 32 bits of the address to be used at link-time.

The REQUIRE directive emits a relocation to signal to the linker that the target label must be present if the current section is present.

The assembler is permitted to emit a relocation for these instructions:

#### LDR (PC-relative)

All A32 and T32 instructions, except the T32 doubleword instruction, can be relocated.

**PLD, PLDW, and PLI**

All A32 and T32 instructions can be relocated.

**B, BL, and BLX**

All A32 and T32 instructions can be relocated.

**CBZ and CBNZ**

All T32 instructions can be relocated but this is discouraged because of the limited branch range of these instructions.

**LDC and LDC2**

Only A32 instructions can be relocated.

**VLDR**

Only A32 instructions can be relocated.

The assembler emits a relocation for these instructions if the label used meets any of the following requirements, as appropriate for the instruction type:

- The label is `WEAK`.
- The label is not in the same `AREA`.
- The label is external to the object (`IMPORT` or `EXTERN`).

For `B`, `BL`, and `BX` instructions, the assembler emits a relocation also if:

- The label is a function.
- The label is exported using `EXPORT` or `GLOBAL`.



You can use the `RELOC` directive to control the relocation at a finer level, but this requires knowledge of the ABI.

**Note****Example**

```
IMPORT sym      ; sym is an external symbol
DCW sym        ; Because DCW only outputs 16 bits, only the lower
                ; 16 bits of the address of sym are inserted at
                ; link-time.
```

**Related information**

[AREA](#) on page 1428

[EXPORT or GLOBAL](#) on page 1448

[IMPORT and EXTERN](#) on page 1465

[REQUIRE](#) on page 1480

[RELOC](#) on page 1478

[DCB](#) on page 1438

[DCD and DCDU](#) on page 1439

[DCW and DCWU](#) on page 1444

[LDR \(PC-relative\) \(A32\)](#) on page 312  
[ADR \(PC-relative\) \(A32\)](#) on page 256  
[PLD, PLDW, and PLI \(A32\)](#) on page 355  
[B \(A32\)](#) on page 264  
[CBZ and CBNZ \(A32\)](#) on page 275  
[LDC and LDC2 \(A32\)](#) on page 306  
[VLDR \(A32\)](#) on page 552  
[ELF for the Arm Architecture](#)

## 7.26 Symbol versions

The Arm® linker conforms to the *Base Platform ABI for the Arm Architecture* (BPABI) and supports the GNU-extended symbol versioning model.

To add a symbol version to an existing symbol, you must define a version symbol at the same address. A version symbol is of the form:

- `name@ver` if `ver` is a non default version of `name`.
- `name@@ver` if `ver` is the default version of `name`.

The version symbols must be enclosed in vertical bars.

For example, to define a default version:

```
|my_versioned_symbol@@ver2|    ; Default version
my_asm_function PROC
...
BX lr
ENDP
```

To define a non default version:

```
|my_versioned_symbol@ver1|    ; Non default version
my_old_asm_function      PROC
...
BX lr
ENDP
```

### Related information

[Base Platform ABI for the Arm Architecture](#)

[Accessing and managing symbols with armlink](#)

## 7.27 Frame directives

Frame directives provide information in object files that enables debugging and profiling of assembly language functions.

You must use frame directives to describe the way that your code uses the stack if you want to be able to do either of the following:

- Debug your application using stack unwinding.
- Use either flat or call-graph profiling.

The assembler uses frame directives to insert DWARF debug frame information into the object file in ELF format that it produces. This information is required by a debugger for stack unwinding and for profiling.

Be aware of the following:

- Frame directives do not affect the code produced by the assembler.
- The assembler does not validate the information in frame directives against the instructions emitted.

### Related information

[Exception tables and Unwind tables on page 130](#)

[About frame directives on page 1424](#)

[Procedure Call Standard for the Arm Architecture](#)

## 7.28 Exception tables and Unwind tables

You use `FRAME` directives to enable the assembler to generate unwind tables.



Not supported for AArch64 state.

Note

Exception tables are necessary to handle exceptions thrown by functions in high-level languages such as C++. Unwind tables contain debug frame information which is also necessary for the handling of such exceptions. An exception can only propagate through a function with an unwind table.

An assembly language function is code enclosed by either `PROC` and `ENDP` or `FUNC` and `ENDFUNC` directives. Functions written in C++ have unwind information by default. However, for assembly language functions that are called from C++ code, you must ensure that there are exception tables and unwind tables to enable the exceptions to propagate through them.

An exception cannot propagate through a function with a nounwind table. The exception handling runtime environment terminates the program if it encounters a nounwind table during exception processing.

The assembler can generate nounwind table entries for all functions and non-functions. The assembler can generate an unwind table for a function only if the function contains sufficient `FRAME` directives to describe the use of the stack within the function. To be able to create an unwind table for a function, each `POP` or `PUSH` instruction must be followed by a `FRAME POP` or `FRAME PUSH` directive respectively. Functions must conform to the conditions set out in the *Exception Handling ABI for the Arm Architecture (EHABI)*, section 9.1 *Constraints on Use*. If the assembler cannot generate an unwind table it generates a nounwind table.

## Related information

[About frame directives](#) on page 1424

[--exceptions, --no\\_exceptions](#) on page 202

[--exceptions\\_unwind, --no\\_exceptions\\_unwind](#) on page 202

[FRAME UNWIND ON](#) on page 1458

[FRAME UNWIND OFF](#) on page 1459

[FUNCTION or PROC](#) on page 1459

[ENDFUNC or ENDP](#) on page 1446

[Frame directives](#) on page 129

[Exception Handling ABI for the Arm Architecture](#)

# 8. Condition Codes

Describes condition codes and conditional execution of A64, A32, and T32 code.

## 8.1 Conditional instructions

A32 and T32 instructions can execute conditionally on the condition flags set by a previous instruction.

The conditional instruction can occur either:

- Immediately after the instruction that updated the flags.
- After any number of intervening instructions that have not updated the flags.

In AArch32 state, whether an instruction can be conditional or not depends on the instruction set state that the processor is in. Few A64 instructions can be conditionally executed.

To make an instruction conditional, you must add a condition code suffix to the instruction mnemonic. The condition code suffix enables the processor to test a condition based on the flags. If the condition test of a conditional instruction fails, the instruction:

- Does not execute.
- Does not write any value to its destination register.
- Does not affect any of the flags.
- Does not generate any exception.

### Related information

[Condition code suffixes and related flags](#) on page 139

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[Conditional execution in A32 code](#) on page 132

[Conditional execution in T32 code](#) on page 133

## 8.2 Conditional execution in A32 code

Almost all A32 instructions can be executed conditionally on the value of the condition flags in the APSR. You can either add a condition code suffix to the instruction or you can conditionally skip over the instruction using a conditional branch instruction.

Using conditional branch instructions to control the flow of execution can be more efficient when a series of instructions depend on the same condition.

## Conditional instructions to control execution

```
; flags set by a previous instruction
LSLEQ r0, r0, #24
ADDEQ r0, r0, #2
;...
```

## Conditional branch to control execution

```
; flags set by a previous instruction
BNE over
LSL r0, r0, #24
ADD r0, r0, #2
over
;...
```

## Related information

[Conditional execution in T32 code](#) on page 133

## 8.3 Conditional execution in T32 code

In T32 code, there are several ways to achieve conditional execution. You can conditionally skip over the instruction using a conditional branch instruction.

Instructions can also be conditionally executed by using either of the following:

- `CBZ` and `CBNZ`.
- The `IT` (If-Then) instruction.

The T32 `CBZ` (Conditional Branch on Zero) and `CBNZ` (Conditional Branch on Non-Zero) instructions compare the value of a register against zero and branch on the result.

`IT` is a 16-bit instruction that enables a single subsequent 16-bit T32 instruction from a restricted set to be conditionally executed, based on the value of the condition flags, and the condition code suffix specified.

## Conditional instructions using IT block

```
; flags set by a previous instruction
IT EQ
LSLEQ r0, r0, #24
;...
```

The use of the `IT` instruction is deprecated when any of the following are true:

- There is more than one instruction in the `IT` block.
- There is a 32-bit instruction in the `IT` block.
- The instruction in the `IT` block references the PC.

## Related information

[IT \(A32\)](#) on page 300

[CBZ and CBNZ \(A32\)](#) on page 275

[Conditional execution in A32 code](#) on page 132

## 8.4 Conditional execution in A64 code

In the A64 instruction set, there are a few instructions that are truly conditional. Truly conditional means that when the condition is false, the instruction advances the program counter but has no other effect.

The conditional branch, `B.cond` is a truly conditional instruction. The condition code is appended to the instruction with a ":" delimiter, for example `B.EQ`.

There are other truly conditional branch instructions that execute depending on the value of the Zero condition flag. You cannot append any condition code suffix to them. These instructions are:

- `CBNZ`.
- `CBZ`.
- `TBNZ`.
- `TBZ`.

There are a few A64 instructions that are unconditionally executed but use the condition code as a source operand. These instructions always execute but the operation depends on the value of the condition code. These instructions can be categorized as:

- Conditional data processing instructions, for example `CSEL`.
- Conditional comparison instructions, `CCIN` and `CCMP`.

In these instructions, you specify the condition code in the final operand position, for example `CSEL Wd, Wm, Wn, NE`.

### Related information

[Conditional execution in T32 code](#) on page 133

[Conditional execution in A32 code](#) on page 132

## 8.5 Condition flags

The N, Z, C, and V condition flags are held in the APSR.

The condition flags are held in the APSR. They are set or cleared as follows:

### N

Set to 1 when the result of the operation is negative, cleared to 0 otherwise.

### Z

Set to 1 when the result of the operation is zero, cleared to 0 otherwise.

**C**

Set to 1 when the operation results in a carry, or when a subtraction results in no borrow, cleared to 0 otherwise.

**V**

Set to 1 when the operation causes overflow, cleared to 0 otherwise.

C is set in one of the following ways:

- For an addition, including the comparison instruction `CMN`, C is set to 1 if the addition produced a carry (that is, an unsigned overflow), and to 0 otherwise.
- For a subtraction, including the comparison instruction `CMP`, C is set to 0 if the subtraction produced a borrow (that is, an unsigned underflow), and to 1 otherwise.
- For non-addition/subtractions that incorporate a shift operation, C is set to the last bit shifted out of the value by the shifter.
- For other non-addition/subtractions, C is normally left unchanged, but see the individual instruction descriptions for any special cases.

Overflow occurs if the result of a signed add, subtract, or compare is greater than or equal to  $2^{31}$ , or less than  $-2^{31}$ .

### Related information

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[Condition code suffixes and related flags](#) on page 139

## 8.6 Updates to the condition flags in A32/T32 code

In AArch32 state, the condition flags are held in the Application Program Status Register (APSR). You can read and modify the flags using the read-modify-write procedure.

Most A32 and T32 data processing instructions have an option to update the condition flags according to the result of the operation. Instructions with the optional S suffix update the flags. Conditional instructions that are not executed have no effect on the flags.

Which flags are updated depends on the instruction. Some instructions update all flags, and some update a subset of the flags. If a flag is not updated, the original value is preserved. The description of each instruction mentions the effect that it has on the flags.



Most instructions update the condition flags only if the S suffix is specified. The instructions `CMP`, `CMN`, `TEQ`, and `TST` always update the flags.

## Related information

[Condition flags](#) on page 134

[Updates to the condition flags in A64 code](#) on page 136

[Condition code suffixes and related flags](#) on page 139

[Conditional instructions](#) on page 132

[A32 and T32 Instructions](#) on page 240

## 8.7 Updates to the condition flags in A64 code

In AArch64 state, the N, Z, C., and V condition flags are held in the NZCV system register, which is part of the process state. You can access the flags using the `MRS` and `MRS` instructions.



An instruction updates the condition flags only if the S suffix is specified, except the instructions `CMP`, `CMN`, `CCMP`, `CCMN`, and `TST`, which always update the condition flags. The instruction also determines which flags get updated. If a conditional instruction does not execute, it does not affect the flags.

### Example

This example shows the read-modify-write procedure to change some of the condition flags in A64 code.

```

MRS  x1, NZCV          ; copy N, Z, C, and V flags into general-purpose x1
MOV  x2, #0x30000000
BIC  x1,x1,x2          ; clears the C and V flags (bits 29,28)
ORR  x1,x1,#0xC0000000 ; sets the N and Z flags (bits 31,30)
MSR  NZCV, x1           ; copy x1 back into NZCV register to update the condition
                         flags

```

## Related information

[Condition flags](#) on page 134

[Updates to the condition flags in A32/T32 code](#) on page 135

[Condition code suffixes and related flags](#) on page 139

[Conditional instructions](#) on page 132

## 8.8 Floating-point instructions that update the condition flags

The only A32/T32 floating-point instructions that can update the condition flags are `VCMPE` and `VCMPE`. Other floating-point or Advanced SIMD instructions cannot modify the flags.

`VCMPE` and `VCMPE` do not update the flags directly, but update a separate set of flags in the Floating-Point Status and Control Register (FPSCR). To use these flags to control conditional instructions,

including conditional floating-point instructions, you must first update the condition flags yourself. To do this, copy the flags from the FPSCR into the APSR using a `VMRS` instruction:

```
VMRS APSR_nzcv, FPSCR
```

All A64 floating-point comparison instructions can update the condition flags. These instructions update the flags directly in the NZCV register.

## Related information

[Updates to the condition flags in A64 code](#) on page 136

[The Read-Modify-Write operation](#) on page 123

[Carry flag](#) on page 137

[Overflow flag](#) on page 138

[VCMP, VCMPE \(A32\)](#) on page 629

[VMRS \(A32\)](#) on page 567

[VMRS \(floating-point\) \(A32\)](#) on page 650

[A-Profile Architectures](#)

## 8.9 Carry flag

The carry (C) flag is set when an operation results in a carry, or when a subtraction results in no borrow.

In A32/T32 code, C is set in one of the following ways:

- For an addition, including the comparison instruction `CIN`, C is set to 1 if the addition produced a carry (that is, an unsigned overflow), and to 0 otherwise.
- For a subtraction, including the comparison instruction `CMP`, C is set to 0 if the subtraction produced a borrow (that is, an unsigned underflow), and to 1 otherwise.
- For non-additions/subtractions that incorporate a shift operation, C is set to the last bit shifted out of the value by the shifter.
- For other non-additions/subtractions, C is normally left unchanged, but see the individual instruction descriptions for any special cases.
- The floating-point compare instructions, `VCMP` and `VCMPE` set the C flag and the other condition flags in the FPSCR to the result of the comparison.

In A64 code, C is set in one of the following ways:

- For an addition, including the comparison instruction `CIN`, C is set to 1 if the addition produced a carry (that is, an unsigned overflow), and to 0 otherwise.
- For a subtraction, including the comparison instruction `CMP` and the negate instructions `NEGS` and `NEGCS`, C is set to 0 if the subtraction produced a borrow (that is, an unsigned underflow), and to 1 otherwise.

- For the integer and floating-point conditional compare instructions `CCMP`, `CCMN`, `FCCMP`, and `FCCMPE`, C and the other condition flags are set either to the result of the comparison, or directly from an immediate value.
- For the floating-point compare instructions, `FCMP` and `FCMPE`, C and the other condition flags are set to the result of the comparison.
- For other instructions, C is normally left unchanged, but see the individual instruction descriptions for any special cases.

## Related information

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared core register names in AArch64 state](#) on page 91

[Condition code suffixes and related flags](#) on page 139

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[Overflow flag](#) on page 138

## 8.10 Overflow flag

Overflow can occur for add, subtract, and compare operations.

In A32/T32 code, overflow occurs if the result of the operation is greater than or equal to  $2^{31}$ , or less than  $-2^{31}$ .

In A64 instructions that use the 64-bit X registers, overflow occurs if the result of the operation is greater than or equal to  $2^{63}$ , or less than  $-2^{63}$ .

In A64 instructions that use the 32-bit W registers, overflow occurs if the result of the operation is greater than or equal to  $2^{31}$ , or less than  $-2^{31}$ .

## Related information

[Predeclared core register names in AArch32 state](#) on page 81

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[Carry flag](#) on page 137

## 8.11 Condition code suffixes

Instructions that can be conditional have an optional two character condition code suffix.

Condition codes are shown in syntax descriptions as *cond*. The following table shows the condition codes that you can use:

**Table 8-1: Condition code suffixes**

| Suffix | Meaning                                   |
|--------|-------------------------------------------|
| EQ     | Equal                                     |
| NE     | Not equal                                 |
| CS     | Carry set (identical to HS)               |
| HS     | Unsigned higher or same (identical to CS) |
| CC     | Carry clear (identical to LO)             |
| LO     | Unsigned lower (identical to CC)          |
| MI     | Minus or negative result                  |
| PL     | Positive or zero result                   |
| VS     | Overflow                                  |
| VC     | No overflow                               |
| HI     | Unsigned higher                           |
| LS     | Unsigned lower or same                    |
| GE     | Signed greater than or equal              |
| LT     | Signed less than                          |
| GT     | Signed greater than                       |
| LE     | Signed less than or equal                 |
| AL     | Always (this is the default)              |



The meaning of some of these condition codes depends on whether the instruction that last updated the condition flags is a floating-point or integer instruction.

## Related information

- [Comparison of condition code meanings in integer and floating-point code](#) on page 140
- [Conditional execution of A32/T32 Advanced SIMD instructions](#) on page 166
- [Conditional execution of A32/T32 floating-point instructions](#) on page 180
- [IT \(A32\)](#) on page 300
- [VMRS \(A32\)](#) on page 567
- [VMRS \(floating-point\) \(A32\)](#) on page 650

## 8.12 Condition code suffixes and related flags

Condition code suffixes define the conditions that must be met for the instruction to execute.

The following table shows the condition codes that you can use and the flag settings they depend on:

**Table 8-2: Condition code suffixes and related flags**

| Prefix   | Flags                     | Meaning                                  |
|----------|---------------------------|------------------------------------------|
| EQ       | Z set                     | Equal                                    |
| NE       | Z clear                   | Not equal                                |
| CS or HS | C set                     | Higher or same (unsigned $\geq$ )        |
| CC or LO | C clear                   | Lower (unsigned $<$ )                    |
| MI       | N set                     | Negative                                 |
| PL       | N clear                   | Positive or zero                         |
| VS       | V set                     | Overflow                                 |
| VC       | V clear                   | No overflow                              |
| HI       | C set and Z clear         | Higher (unsigned $>$ )                   |
| LS       | C clear or Z set          | Lower or same (unsigned $\leq$ )         |
| GE       | N and V the same          | Signed $\geq$                            |
| LT       | N and V differ            | Signed $<$                               |
| GT       | Z clear, N and V the same | Signed $>$                               |
| LE       | Z set, N and V differ     | Signed $\leq$                            |
| AL       | Any                       | Always. This suffix is normally omitted. |

The optional condition code is shown in syntax descriptions as {cond}. This condition is encoded in A32 instructions and in A64 instructions. For T32 instructions, the condition is encoded in a preceding IT instruction. An instruction with a condition code is only executed if the condition flags meet the specified condition.

The following is an example of conditional execution in A32 code:

```

ADD      r0, r1, r2      ; r0 = r1 + r2, don't update flags
ADDS     r0, r1, r2      ; r0 = r1 + r2, and update flags
ADDSCS   r0, r1, r2      ; If C flag set then r0 = r1 + r2,
                        ; and update flags
CMP      r0, r1          ; update flags based on r0-r1.

```

## Related information

[Condition flags](#) on page 134

[Comparison of condition code meanings in integer and floating-point code](#) on page 140

[Conditional instructions](#) on page 132

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[A32 and T32 Instructions](#) on page 240

## 8.13 Comparison of condition code meanings in integer and floating-point code

The meaning of the condition code mnemonic suffixes depends on whether the condition flags were set by a floating-point instruction or by an A32 or T32 data processing instruction.

This is because:

- Floating-point values are never unsigned, so the unsigned conditions are not required.
- Not-a-Number (NaN) values have no ordering relationship with numbers or with each other, so additional conditions are required to account for unordered results.

The meaning of the condition code mnemonic suffixes is shown in the following table:

**Table 8-3: Condition codes**

| Suffix | Meaning after integer data processing instruction | Meaning after floating-point instruction |
|--------|---------------------------------------------------|------------------------------------------|
| EQ     | Equal                                             | Equal                                    |
| NE     | Not equal                                         | Not equal, or unordered                  |
| CS     | Carry set                                         | Greater than or equal, or unordered      |
| HS     | Unsigned higher or same                           | Greater than or equal, or unordered      |
| CC     | Carry clear                                       | Less than                                |
| LO     | Unsigned lower                                    | Less than                                |
| MI     | Negative                                          | Less than                                |
| PL     | Positive or zero                                  | Greater than or equal, or unordered      |
| VS     | Overflow                                          | Unordered (at least one NaN operand)     |
| VC     | No overflow                                       | Not unordered                            |
| HI     | Unsigned higher                                   | Greater than, or unordered               |
| LS     | Unsigned lower or same                            | Less than or equal                       |
| GE     | Signed greater than or equal                      | Greater than or equal                    |
| LT     | Signed less than                                  | Less than, or unordered                  |
| GT     | Signed greater than                               | Greater than                             |
| LE     | Signed less than or equal                         | Less than or equal, or unordered         |
| AL     | Always (normally omitted)                         | Always (normally omitted)                |



The type of the instruction that last updated the condition flags determines the meaning of the condition codes.

Note

### Related information

[Condition code suffixes and related flags](#) on page 139

[Conditional instructions](#) on page 132

[Updates to the condition flags in A32/T32 code](#) on page 135

[Updates to the condition flags in A64 code](#) on page 136

[VCMP, VCMPE \(A32\)](#) on page 629

[VMRS \(A32\)](#) on page 567

[VMRS \(floating-point\) \(A32\)](#) on page 650

[A-Profile Architectures](#)

## 8.14 Benefits of using conditional execution in A32 and T32 code

It can be more efficient to use conditional instructions rather than conditional branches.

You can use conditional execution of A32 instructions to reduce the number of branch instructions in your code, and improve code density. The `IT` instruction in T32 achieves a similar improvement.

Branch instructions are also expensive in processor cycles. On Arm® processors without branch prediction hardware, it typically takes three processor cycles to refill the processor pipeline each time a branch is taken.

Some Arm processors have branch prediction hardware. In systems using these processors, the pipeline only has to be flushed and refilled when there is a misprediction.

### Related information

[Example showing the benefits of conditional instructions in A32 and T32 code](#) on page 142

## 8.15 Example showing the benefits of conditional instructions in A32 and T32 code

Using conditional instructions rather than conditional branches can save both code size and cycles.

This example shows the difference between using branches and using conditional instructions. It uses the Euclid algorithm for the Greatest Common Divisor (gcd) to show how conditional instructions improve code size and speed.

In C the gcd algorithm can be expressed as:

```
int gcd(int a, int b)
{
    while (a != b)
    {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
```

{}

The following examples show implementations of the gcd algorithm with and without conditional instructions.

### Example of conditional execution using branches in A32 code

This example is an A32 code implementation of the gcd algorithm. It achieves conditional execution by using conditional branches, rather than individual conditional instructions:

```

gcd      CMP    r0, r1
        BEQ    end
        BLT    less
        SUBS   r0, r0, r1 ; could be SUB r0, r0, r1 for A32
        B      gcd
less     SUBS   r1, r1, r0 ; could be SUB r1, r1, r0 for A32
        B      gcd
end

```

The code is seven instructions long because of the number of branches. Every time a branch is taken, the processor must refill the pipeline and continue from the new location. The other instructions and non-executed branches use a single cycle each.

The following table shows the number of cycles this implementation uses on an Arm7™ processor when R0 equals 1 and R1 equals 2.

**Table 8-4: Conditional branches only**

| R0: a | R1: b | Instruction    | Cycles ( Arm7)   |
|-------|-------|----------------|------------------|
| 1     | 2     | CMP r0, r1     | 1                |
| 1     | 2     | BEQ end        | 1 (not executed) |
| 1     | 2     | BLT less       | 3                |
| 1     | 2     | SUB r1, r1, r0 | 1                |
| 1     | 2     | B gcd          | 3                |
| 1     | 1     | CMP r0, r1     | 1                |
| 1     | 1     | BEQ end        | 3                |
| -     | -     | -              | Total = 13       |

### Example of conditional execution using conditional instructions in A32 code

This example is an A32 code implementation of the gcd algorithm using individual conditional instructions in A32 code. The gcd algorithm only takes four instructions:

```

gcd      CMP    r0, r1
        SUBGT  r0, r0, r1
        SUBLE  r1, r1, r0
        BNE    gcd

```

In addition to improving code size, in most cases this code executes faster than the version that uses only branches.

The following table shows the number of cycles this implementation uses on an Arm7 processor when R0 equals 1 and R1 equals 2.

**Table 8-5: All instructions conditional**

| R0: a | R1: b | Instruction      | Cycles (Arm7)    |
|-------|-------|------------------|------------------|
| 1     | 2     | CMP r0, r1       | 1                |
| 1     | 2     | SUBGT r0, r0, r1 | 1 (not executed) |
| 1     | 1     | SUBLT r1, r1, r0 | 1                |
| 1     | 1     | BNE gcd          | 3                |
| 1     | 1     | CMP r0, r1       | 1                |
| 1     | 1     | SUBGT r0, r0, r1 | 1 (not executed) |
| 1     | 1     | SUBLT r1, r1, r0 | 1 (not executed) |
| 1     | 1     | BNE gcd          | 1 (not executed) |
| -     | -     | -                | Total = 10       |

Comparing this with the example that uses only branches:

- Replacing branches with conditional execution of all instructions saves three cycles.
- Where R0 equals R1, both implementations execute in the same number of cycles. For all other cases, the implementation that uses conditional instructions executes in fewer cycles than the implementation that uses branches only.

### Example of conditional execution using conditional instructions in T32 code

You can use the `IT` instruction to write conditional instructions in T32 code. The T32 code implementation of the gcd algorithm using conditional instructions is similar to the implementation in A32 code. The implementation in T32 code is:

```
gcd
    CMP    r0, r1
    ITE    GT
    SUBGT r0, r0, r1
    SUBLE r1, r1, r0
    BNE    gcd
```

These instructions assemble equally well to A32 or T32 code. The assembler checks the `IT` instructions, but omits them on assembly to A32 code.

It requires one more instruction in T32 code (the `IT` instruction) than in A32 code, but the overall code size is 10 bytes in T32 code, compared with 16 bytes in A32 code.

### Example of conditional execution code using branches in T32 code

In architectures before Armv6 T2, there is no `IT` instruction and therefore T32 instructions cannot be executed conditionally except for the `B` branch instruction. The gcd algorithm must be written with conditional branches and is similar to the A32 code implementation using branches, without conditional instructions.

The T32 code implementation of the gcd algorithm without conditional instructions requires seven instructions. The overall code size is 14 bytes. This figure is even less than the A32 implementation that uses conditional instructions, which uses 16 bytes.

In addition, on a system using 16-bit memory this T32 implementation runs faster than both A32 implementations because only one memory access is required for each 16-bit T32 instruction, whereas each 32-bit A32 instruction requires two fetches.

### Related information

[Benefits of using conditional execution in A32 and T32 code](#) on page 142

[IT \(A32\)](#) on page 300

[Condition code suffixes and related flags](#) on page 139

[Optimization for execution speed](#) on page 145

[A-Profile Architectures](#)

## 8.16 Optimization for execution speed

To optimize code for execution speed you must have detailed knowledge of the instruction timings, branch prediction logic, and cache behavior of your target system.

For more information, see the Technical Reference Manual for your processor.

### Related information

[A-Profile Architectures](#)

[Further reading](#)

# 9. Using armasm

Describes how to use armasm.

## 9.1 armasm command-line syntax

You can use a command line to invoke armasm. You must specify an input source file and you can specify various options.

The command for invoking the assembler is:

```
armasm {options} inputfile
```

where:

**options**

are commands that instruct the assembler how to assemble the *inputfile*. You can invoke armasm with any combination of options separated by spaces. You can specify values for some options. To specify a value for an option, use either '=' (*option=value*) or a space character (*option value*).

***inputfile***

is an assembly source file. It must contain UAL, pre-UAL A32 or T32, or A64 assembly language.

The assembler command line is case-insensitive, except in filenames and where specified. The assembler uses the same command-line ordering rules as the compiler. This means that if the command line contains options that conflict with each other, then the last option found always takes precedence.

## 9.2 Specify command-line options with an environment variable

The `ARMCOMPILER6_ASMOPT` environment variable can hold command-line options for the assembler.

The syntax is identical to the command-line syntax. The assembler reads the value of `ARMCOMPILER6_ASMOPT` and inserts it at the front of the command string. This means that options specified in `ARMCOMPILER6_ASMOPT` can be overridden by arguments on the command line.

### Related information

[armasm command-line syntax](#) on page 146

[Toolchain environment variables](#)

## 9.3 Using `stdin` to input source code to the assembler

You can use `stdin` to pipe output from another program into `armasm` or to input source code directly on the command line. This is useful if you want to test a short piece of code without having to create a file for it.

### About this task

To use `stdin` to pipe output from another program into `armasm`, invoke the program and the assembler using the pipe character (`|`). Use the minus character (`-`) as the source filename to instruct the assembler to take input from `stdin`. You must specify the output filename using the `-o` option. You can specify the command-line options you want to use. For example to pipe output from `fromelf`:

```
fromelf --disassemble A32input.o | armasm --cpu=8-A.32 -o A32output.o -
```



The source code from `stdin` is stored in an internal cache that can hold up to 8 MB. You can increase this cache size using the `--maxcache` command-line option.

To use `stdin` to input source code directly on the command line:

### Procedure

1. Invoke the assembler with the command-line options you want to use. Use the minus character (`-`) as the source filename to instruct the assembler to take input from `stdin`. You must specify the output filename using the `-o` option. For example:

```
armasm --cpu=8-A.32 -o output.o -
```

2. Enter your input. For example:

```
AREA      A32ex, CODE, READONLY
                    ; Name this block of code A32ex
ENTRY      .          ; Mark first instruction to execute
start
    MOV      r0, #10      ; Set up parameters
    MOV      r1, #3
    ADD      r0, r0, r1    ; r0 = r0 + r1
stop
    MOV      r0, #0x18      ; angel_SWIreason_ReportException
    LDR      r1, =0x20026    ; ADP_Stopped_ApplicationExit
    SVC      #0x123456      ; AArch32 semihosting (formerly SWI)
    END          ; Mark end of file
```

3. Terminate your input by entering:

- `ctrl+z` then `Return` on Microsoft Windows systems.
- `ctrl+d` on Unix-based operating systems.

### Related information

[armasm command-line syntax](#) on page 146

[--maxcache=n](#) on page 209

## 9.4 Built-in variables and constants

`armasm` defines built-in variables that hold information about, for example, the state of `armasm`, the command-line options used, and the target architecture or processor.

The following table lists the built-in variables defined by `armasm`:

**Table 9-1: Built-in variables**

| {ARCHITECTURE}   | Holds the name of the selected Arm architecture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {ARCHITECTURE}   | Holds the name of the selected Arm® architecture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| {AREANAME}       | Holds the name of the current AREA.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| {ARMASM_VERSION} | <p>Holds an integer that increases with each version of <code>armasm</code>. The format of the version number is <i>Mmmuuuxx</i> where:</p> <ul style="list-style-type: none"> <li>• <i>M</i> is the major version number, 6.</li> <li>• <i>mm</i> is the minor version number.</li> <li>• <i>uu</i> is the update number.</li> <li>• <i>xx</i> is reserved for Arm internal use. You can ignore this for the purposes of checking whether the current release is a specific version or within a range of versions.</li> </ul> <p><b>Note:</b><br/>The built-in variable <code> ads\$version </code> is deprecated.</p> |
| ads\$version     | Has the same value as {ARMASM_VERSION}.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| {CODESIZE}       | Is a synonym for {CONFIG}.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| {COMMANDLINE}    | Holds the contents of the command line.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| {CONFIG}         | Has the value: <ul style="list-style-type: none"> <li>• 64 if the assembler is assembling A64 code.</li> <li>• 32 if the assembler is assembling A32 code.</li> <li>• 16 if the assembler is assembling T32 code.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                            |
| {CPU}            | Holds the name of the selected processor. The value of {CPU} is derived from the value specified in the <code>--cpu</code> option on the command line.                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| {ENDIAN}         | Has the value "big" if the assembler is in big-endian mode, or "little" if it is in little-endian mode.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| {FPU}            | Holds the name of the selected FPU. The default in AArch32 state is "FP-ARMv8". The default in AArch64 state is "A64".                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| {INPUTFILE}      | Holds the name of the current source file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| {INTER}          | Has the Boolean value True if <code>--apcs=/inter</code> is set. The default is {False}.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| {LINENUM}        | Holds an integer indicating the line number in the current source file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| {LINENUMUP}      | When used in a macro, holds an integer indicating the line number of the current macro. The value is the same as {LINENUM} when used in a non-macro context.                                                                                                                                                                                                                                                                                                                                                                                                                                                            |

| {ARCHITECTURE}  | Holds the name of the selected Arm architecture.                                                                                                                                          |
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {LINENUMUPPER}  | When used in a macro, holds an integer indicating the line number of the top macro. The value is the same as {LINENUM} when used in a non-macro context.                                  |
| {OPT}           | Value of the currently-set listing option. You can use the OPT directive to save the current listing option, force a change in it, or restore its original value.                         |
| {PC} or .       | Address of current instruction.                                                                                                                                                           |
| {PCSTOREOFFSET} | Is the offset between the address of the STR PC, [...] or STM Rb, {..., PC} instruction and the value of PC stored out. This varies depending on the processor or architecture specified. |
| {ROPI}          | Has the Boolean value {True} if --apcs=/ropi is set. The default is {False}.                                                                                                              |
| {RWPI}          | Has the Boolean value {True} if --apcs=/rwpi is set. The default is {False}.                                                                                                              |
| {VAR} or @      | Current value of the storage area location counter.                                                                                                                                       |

You can use built-in variables in expressions or conditions in assembly source code. For example:

```
IF {ARCHITECTURE} = "8-A"
```

They cannot be set using the SETA, SETL, or SETS directives.

The names of the built-in variables can be in uppercase, lowercase, or mixed, for example:

```
IF {CpU} = "Generic ARM"
```



All built-in string variables contain case-sensitive values. Relational operations on these built-in variables do not match with strings that contain an incorrect case. Use the command-line options --cpu and --fpu to determine valid values for {CPU}, {ARCHITECTURE}, and {FPU}.

The assembler defines the built-in Boolean constants TRUE and FALSE.

**Table 9-2: Built-in Boolean constants**

| {FALSE} | Logical constant false. |
|---------|-------------------------|
| {FALSE} | Logical constant false. |
| {TRUE}  | Logical constant true.  |

The following table lists the target processor-related built-in variables that are predefined by the assembler. Where the value field is empty, the symbol is a Boolean value and the meaning column describes when its value is {TRUE}.

**Table 9-3: Predefined macros**

| Name                                      | Value   | Meaning                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|-------------------------------------------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {TARGET_ARCH_AARCH32}                     | boolean | {TRUE} when assembling for AArch32 state. {FALSE} when assembling for AArch64 state.                                                                                                                                                                                                                                                                                                                                                                                                                                |
| {TARGET_ARCH_AARCH64}                     | boolean | {TRUE} when assembling for AArch64 state. {FALSE} when assembling for AArch32 state.                                                                                                                                                                                                                                                                                                                                                                                                                                |
| {TARGET_ARCH_ARM}                         | num     | The number of the A32 base architecture of the target processor irrespective of whether the assembler is assembling for A32 or T32. The value is defined as zero when assembling for A64, and eight when assembling for A32/T32.                                                                                                                                                                                                                                                                                    |
| {TARGET_ARCH_THUMB}                       | num     | The number of the T32 base architecture of the target processor irrespective of whether the assembler is assembling for A32 or T32. The value is defined as zero when assembling for A64, and five when assembling for A32/T32.                                                                                                                                                                                                                                                                                     |
| TARGET_ARCH_{XX}                          | -       | XX represents the target architecture and its value depends on the target processor:<br><br>For the Armv8 architecture: <ul style="list-style-type: none"> <li>• If you specify the assembler option --cpu=8-A.32 or --cpu=8-A.64 then {TARGET_ARCH_8_A} is defined.</li> <li>• If you specify the assembler option --cpu=8.1-A.32 or --cpu=8.1-A.64 then {TARGET_ARCH_8_1_A} is defined.</li> </ul><br>For the Armv7 architecture, if you specify --cpu=Cortex-A8, for example, then {TARGET_ARCH_7_A} is defined. |
| {TARGET_FEATURE_EXTENSION_REGISTER_COUNT} | num     | The number of 64-bit extension registers available in Advanced SIMD or floating-point.                                                                                                                                                                                                                                                                                                                                                                                                                              |
| {TARGET_FEATURE_CLZ}                      | -       | If the target processor supports the CLZ instruction.                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| {TARGET_FEATURE_CRYPTOGRAPHY}             | -       | If the target processor has cryptographic instructions.                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| {TARGET_FEATURE_DIVIDE}                   | -       | If the target processor supports the hardware divide instructions SDIV and UDIV.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| {TARGET_FEATURE_DOUBLEWORD}               | -       | If the target processor supports doubleword load and store instructions, for example the A32 and T32 instructions LDRD and STRD (except the Armv6-M architecture).                                                                                                                                                                                                                                                                                                                                                  |
| {TARGET_FEATURE_DSPMUL}                   | -       | If the DSP-enhanced multiplier (for example the SMLAxy instruction) is available.                                                                                                                                                                                                                                                                                                                                                                                                                                   |

| Name                             | Value | Meaning                                                                                                                                                                                           |
|----------------------------------|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {TARGET_FEATURE_MULTIPLY}        | -     | If the target processor supports long multiply instructions, for example the A32 and T32 instructions SMLL, SMLAL, UMULL, and UMLAL (that is, all architectures except the Armv6-M architecture). |
| {TARGET_FEATURE_MULTIPROCESSING} | -     | If assembling for a target processor with Multiprocessing Extensions.                                                                                                                             |
| {TARGET_FEATURE_NEON}            | -     | If the target processor has Advanced SIMD.                                                                                                                                                        |
| {TARGET_FEATURE_NEON_FP16}       | -     | If the target processor has Advanced SIMD with half-precision floating-point operations.                                                                                                          |
| {TARGET_FEATURE_NEON_FP32}       | -     | If the target processor has Advanced SIMD with single-precision floating-point operations.                                                                                                        |
| {TARGET_FEATURE_NEON_INTEGER}    | -     | If the target processor has Advanced SIMD with integer operations.                                                                                                                                |
| {TARGET_FEATURE_UNALIGNED}       | -     | If the target processor has support for unaligned accesses (all architectures except the Armv6-M architecture).                                                                                   |
| {TARGET_FPU_SOFTVFP}             | -     | If assembling with the option --fpu=SoftVFP.                                                                                                                                                      |
| {TARGET_FPU_SOFTVFP_VFP}         | -     | If assembling for a target processor with SoftVFP and floating-point hardware, for example --fpu=SoftVFP+FP-ARMv8.                                                                                |
| {TARGET_FPU_VFP}                 | -     | If assembling for a target processor with floating-point hardware, without using SoftVFP, for example --fpu=FP-ARMv8.                                                                             |
| {TARGET_FPU_VFPV2}               | -     | If assembling for a target processor with VFPv2.                                                                                                                                                  |
| {TARGET_FPU_VFPV3}               | -     | If assembling for a target processor with VFPv3.                                                                                                                                                  |
| {TARGET_FPU_VFPV4}               | -     | If assembling for a target processor with VFPv4.                                                                                                                                                  |
| {TARGET_PROFILE_A}               | -     | If assembling for a Cortex®-A profile processor, for example, if you specify the assembler option --cpu=7-A.                                                                                      |
| {TARGET_PROFILE_M}               | -     | If assembling for a Cortex-M profile processor, for example, if you specify the assembler option --cpu=7-M.                                                                                       |
| {TARGET_PROFILE_R}               | -     | If assembling for a Cortex-R profile processor, for example, if you specify the assembler option --cpu=7-R.                                                                                       |

## Related information

[--cpu=name](#) on page 193

[--fpu=name](#) on page 205

[Identifying versions of armasm in source code](#) on page 151

## 9.5 Identifying versions of armasm in source code

The assembler defines the built-in variable `ARMASM_VERSION` to hold the version number of the assembler.

You can use it as follows:

```
IF ( {ARMASM_VERSION} / 100000) >= 6
    ; using armasm in Arm Compiler 6
ELIF ( {ARMASM_VERSION} / 1000000) = 5
    ; using armasm in Arm Compiler 5
ELSE
    ; using armasm in Arm Compiler 4.1 or earlier
ENDIF
```



The built-in variable `|ads$version|` is deprecated.

**Note**

---

### Related information

[Built-in variables and constants](#) on page 148

## 9.6 Diagnostic messages

The assembler can provide extra error, warning, and remark diagnostic messages in addition to the default ones.

By default, these additional diagnostic messages are not displayed. However, you can enable them using the command-line options `--diag_error`, `--diag_warning`, and `--diag_remark`.

### Related information

[--diag\\_error=tag\[,tag,...\]](#) on page 197

[Interlocks diagnostics](#) on page 152

[Automatic IT block generation in T32 code](#) on page 153

[T32 branch target alignment](#) on page 154

[T32 code size diagnostics](#) on page 154

[A32 and T32 instruction portability diagnostics](#) on page 154

[T32 instruction width diagnostics](#) on page 155

[Two pass assembler diagnostics](#) on page 155

## 9.7 Interlocks diagnostics

`armasm` can report warning messages about possible interlocks in your code caused by the pipeline of the processor chosen by the `--cpu` option.

To do this, use the `--diag_warning 1563` command-line option when invoking `armasm`.



Note

- `armasm` does not have an accurate model of the target processor, so these messages are not reliable when used with a multi-issue processor such as Cortex®-A8.
- Interlocks diagnostics apply to A32 and T32 code, but not to A64 code.

### Related information

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

[Automatic IT block generation in T32 code](#) on page 153

[T32 branch target alignment](#) on page 154

[T32 instruction width diagnostics](#) on page 155

[Diagnostic messages](#) on page 152

## 9.8 Automatic IT block generation in T32 code

`armasm` can automatically insert an IT block for conditional instructions in T32 code, without requiring the use of explicit `IT` instructions.

If you write the following code:

```
AREA x, CODE
THUMB
MOVNE r0,r1
NOP
IT NE
MOVNE r0,r1
END
```

`armasm` generates the following instructions:

```
IT NE
MOVNE r0,r1
NOP
IT NE
MOVNE r0,r1
```

You can receive warning messages about the automatic generation of `IT` blocks when assembling T32 code. To do this, use the `armasm --diag_warning 1763` command-line option when invoking `armasm`.

## Related information

[Diagnostic messages](#) on page 152  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 9.9 T32 branch target alignment

`armasm` can issue warnings about non word-aligned branch targets in T32 code.

On some processors, non word-aligned T32 instructions sometimes take one or more additional cycles to execute in loops. This means that it can be an advantage to ensure that branch targets are word-aligned. To ensure `armasm` reports such warnings, use the `--diag_warning 1604` command-line option when invoking it.

## Related information

[Diagnostic messages](#) on page 152  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 9.10 T32 code size diagnostics

In T32 code, some instructions, for example a branch or `LDR` (PC-relative), can be encoded as either a 32-bit or 16-bit instruction. `armasm` chooses the size of the instruction encoding.

`armasm` can issue a warning when it assembles a T32 instruction to a 32-bit encoding when it could have used a 16-bit encoding.

To enable this warning, use the `--diag_warning 1813` command-line option when invoking `armasm`.

## Related information

[Instruction width selection in T32 code](#) on page 158  
[A32 and T32 instruction sets](#) on page 71  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200  
[Diagnostic messages](#) on page 152

## 9.11 A32 and T32 instruction portability diagnostics

`armasm` can issue warnings about instructions that cannot assemble to both A32 and T32 code.

There are a few UAL instructions that can assemble as either A32 code or T32 code, but not both. You can identify these instructions in the source code using the `--diag_warning 1812` command-line option when invoking `armasm`.

It warns for any instruction that cannot be assembled in the other instruction set. This is only a hint, and other factors, like relocation availability or target distance might affect the accuracy of the message.

### Related information

- [--diag\\_warning=tag\[,tag,...\]](#) on page 200
- [A32 and T32 instruction sets](#) on page 71
- [Diagnostic messages](#) on page 152

## 9.12 T32 instruction width diagnostics

armasm can issue a warning when it assembles a T32 instruction to a 32-bit encoding when it could have used a 16-bit encoding.

If you use the `.w` specifier, the instruction is encoded in 32 bits even if it could be encoded in 16 bits. You can use a diagnostic warning to detect when a branch instruction could have been encoded in 16 bits, but has been encoded in 32 bits. To do this, use the `--diag_warning 1607` command-line option when invoking armasm.



**Note** This diagnostic does not produce a warning for relocated branch instructions, because the final address is not known. The linker might even insert a veneer, if the branch is out of range for a 32-bit instruction.

### Related information

- [Diagnostic messages](#) on page 152
- [--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 9.13 Two pass assembler diagnostics

armasm can issue a warning about code that might not be identical in both assembler passes.

armasm is a two pass assembler and the input code that the assembler reads must be identical in both passes. If a symbol is defined after the `:DEF:` test for that symbol, then the code read in pass one might be different from the code read in pass two. armasm can warn in this situation.

To do this, use the `--diag_warning 1907` command-line option when invoking armasm .

### Example

The following example shows that the symbol `foo` is defined after the `:DEF: foo` test.

```
AREA x, CODE
[ :DEF: foo
]
```

```
foo MOV r3, r4
    END
```

Assembling this code with `--diag_warning 1907` generates the message:

```
Warning A1907W: Test for this symbol has been seen and may cause failure in the
second pass.
```

## Related information

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

[Automatic IT block generation in T32 code](#) on page 153

[T32 branch target alignment](#) on page 154

[T32 instruction width diagnostics](#) on page 155

[Diagnostic messages](#) on page 152

[How the assembler works](#) on page 62

[Directives that can be omitted in pass 2 of the assembler](#) on page 63

## 9.14 Using the C preprocessor

`armasm` can invoke `armclang` to preprocess an assembly language source file before assembling it. This allows you to use C preprocessor commands in assembly source code.

If you do this, you must use the `--cpreproc` command-line option together with the `--cpreproc_opts` command-line option when invoking the assembler. This causes `armasm` to call `armclang` to preprocess the file before assembling it.



As a minimum, you must specify the `armclang` option `--target` and either the `-mcpu` or `-march` option with `--cpreproc_opts`.

`armasm` looks for the `armclang` binary in the same directory as the `armasm` binary. If it does not find the binary, it expects it to be on the PATH.

`armasm` passes the following options by default to `armclang` if present on the command line:

- Basic pre-processor configuration options, such as `-E`.
- User specified include directories, `-I` directives.
- User specified licensing options, such as `--site_license`.
- Anything specified in `--cpreproc_opts`.

Some of the options that `armasm` passes to `armclang` are converted to the `armclang` equivalent beforehand. These are shown in the following table:

**Table 9-4: armclang equivalent command-line options**

| armasm  | armclang |
|---------|----------|
| --thumb | -mthumb  |
| --arm   | -marm    |
| -i      | -I       |

armasm correctly interprets the preprocessed `#line` commands. It can generate error messages and `debug_line` tables using the information in the `#line` commands.

### Preprocessing an assembly language source file

The following example shows the command you write to preprocess and assemble a file, `source.S`. The example also passes the compiler options to define a macro called `RELEASE`, and to undefine a macro called `ALPHA`.

```
armasm --cpu=cortex-m3 --cpreproc --cpreproc_opts=--target=arm-arm-none-eabi,-
mcpu=cortex-a9,-D,RELEASE,-U,ALPHA source.S
```

### Preprocessing an assembly language source file manually

Alternatively, you must manually call `armclang` to preprocess the file before calling `armasm`. The following example shows the commands you write to manually preprocess and assemble a file, `source.S`:

```
armclang --target=arm-arm-none-eabi -mcpu=cortex-m3 -E source.S > preprocessed.S
armasm --cpu=cortex-m3 preprocessed.S
```

In this example, the preprocessor outputs a file called `preprocessed.S`, and `armasm` assembles it.

### Related information

[--cpreproc](#) on page 191

[--cpreproc\\_opts=option\[,option,...\]](#) on page 191

[Specifying a target architecture, processor, and instruction set](#)

[-march armclang option](#)

[-mcpu armclang option](#)

[-target armclang option](#)

## 9.15 Address alignment in A32/T32 code

In Arm®V7-A and Armv7-R, the A bit in the System Control Register (SCTRL) controls whether alignment checking is enabled or disabled. In Armv7-M, the UNALIGN\_TRP bit, bit 3, in the Configuration and Control Register (CCR) controls this.

If alignment checking is enabled, all unaligned word and halfword transfers cause an alignment exception. If disabled, unaligned accesses are permitted for the `LDR`, `LDRH`, `STR`, `STRH`, `LDRSH`, `LDRT`,

`STRT`, `LDRSHT`, `LDRHT`, `STRHT`, and `TBH` instructions. Other data-accessing instructions always cause an alignment except for unaligned data.

For `STRD` and `LDRD`, the specified address must be word-aligned.

If all your data accesses are aligned, you can use the `--no_unaligned_access` command-line option to declare that the output object was not permitted to make unaligned access. The linker can then avoid linking in any library functions that support unaligned access if all input objects declare that they were not permitted to use unaligned accesses.

### Related information

[--unaligned\\_access, --no\\_unaligned\\_access](#) on page 214

## 9.16 Address alignment in A64 code

If alignment checking is not enabled, then unaligned accesses are permitted for all load and store instructions other than exclusive load, exclusive store, load acquire, and store release instructions. If alignment checking is enabled, then unaligned accesses are not permitted.

This means all load and store instructions must use addresses that are aligned to the size of the data being accessed. In other words, addresses for 8-byte transfers must be 8-byte aligned, addresses for 4-byte transfers are 4-byte word aligned, and addresses for 2-byte transfers are 2-byte aligned. Unaligned accesses cause an alignment exception.

For any memory access, if the stack pointer is used as the base register, then it must be quadword aligned. Otherwise it generates a stack alignment exception.

If all your data accesses are aligned, you can use the `--no_unaligned_access` command-line option to declare that the output object was not permitted to make unaligned access. The linker can then avoid linking in any library functions that support unaligned access if all input objects declare that they were not permitted to use unaligned accesses.

## 9.17 Instruction width selection in T32 code

Some T32 instructions can have either a 16-bit encoding or a 32-bit encoding.

If you do not specify the instruction size, by default:

- For forward reference `LDR`, `ADR`, and `B` instructions, `armasm` always generates a 16-bit instruction, even if that results in failure for a target that could be reached using a 32-bit instruction.
- For external reference `LDR` and `B` instructions, `armasm` always generates a 32-bit instruction.
- In all other cases, `armasm` generates the smallest size encoding that can be output.

If you want to override this behavior, you can use the `.w` or `.n` width specifier to ensure a particular instruction size. `armasm` faults if it cannot generate an instruction with the specified width.

The `.w` specifier is ignored when assembling to A32 code, so you can safely use this specifier in code that might assemble to either A32 or T32 code. However, the `.N` specifier is faulted when assembling to A32 code.

## Related information

[Instruction width specifiers](#) on page 244

[T32 code size diagnostics](#) on page 154

# 10. Advanced SIMD Programming

Describes Advanced SIMD assembly language programming.

## 10.1 Architecture support for Advanced SIMD

Advanced SIMD is an optional extension to the Armv8 and Armv7 architectures.

All Advanced SIMD instructions are available on systems that support Advanced SIMD. In A32, some of these instructions are also available on systems that implement the floating-point extension without Advanced SIMD. These are called shared instructions.

In AArch32 state, the Advanced SIMD register bank consists of thirty-two 64-bit registers, and smaller registers are packed into larger ones, as in Arm®v7.

In AArch64 state, the Advanced SIMD register bank includes thirty-two 128-bit registers and has a new register packing model.



Advanced SIMD and floating-point instructions share the same extension register bank.

Note

Advanced SIMD instructions in A64 are closely based on VFPv4 and A32, but with new instruction mnemonics and some functional enhancements.

### Related information

[Floating-point support](#)

[Further reading](#)

## 10.2 Extension register bank mapping for Advanced SIMD in AArch32 state

The Advanced SIMD extension register bank is a collection of registers that can be accessed as either 64-bit or 128-bit registers.

Advanced SIMD and floating-point instructions use the same extension register bank, and is distinct from the Arm® core register bank.

The following figure shows the views of the extension register bank, and the overlap between the different size registers. For example, the 128-bit register `q0` is an alias for two consecutive 64-bit registers `D0` and `D1`. The 128-bit register `q8` is an alias for 2 consecutive 64-bit registers `D16` and `D17`.

**Figure 10-1: Extension register bank for Advanced SIMD in AArch32 state**

If your processor supports both Advanced SIMD and floating-point, all the Advanced SIMD registers overlap with the floating-point registers.

**Note**

The aliased views enable half-precision, single-precision, and double-precision values, and Advanced SIMD vectors to coexist in different non-overlapped registers at the same time.

You can also use the same overlapped registers to store half-precision, single-precision, and double-precision values, and Advanced SIMD vectors at different times.

Do not attempt to use overlapped 64-bit and 128-bit registers at the same time because it creates meaningless results.

The mapping between the registers is as follows:

- $D<2n>$  maps to the least significant half of  $Q<n>$
- $D<2n+1>$  maps to the most significant half of  $Q<n>$ .

For example, you can access the least significant half of the elements of a vector in  $Q6$  by referring to  $D12$ , and the most significant half of the elements by referring to  $D13$ .

### Related information

[Extension register bank mapping for Advanced SIMD in AArch64 state](#) on page 162

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

[Views of the Advanced SIMD register bank in AArch32 state](#) on page 164

## 10.3 Extension register bank mapping for Advanced SIMD in AArch64 state

The extension register bank is a collection of registers that can be accessed as 8-bit, 16-bit, 32-bit, 64-bit, or 128-bit.

Advanced SIMD and floating-point instructions use the same extension register bank, and is distinct from the Arm® core register bank.

The following figure shows the views of the extension register bank, and the overlap between the different size registers.

**Figure 10-2: Extension register bank for Advanced SIMD in AArch64 state**

The mapping between the registers is as follows:

- $D<n>$  maps to the least significant half of  $v<n>$
- $s<n>$  maps to the least significant half of  $D<n>$
- $h<n>$  maps to the least significant half of  $s<n>$
- $b<n>$  maps to the least significant half of  $h<n>$ .

For example, you can access the least significant half of the elements of a vector in  $v7$  by referring to  $D7$ .

Registers  $q0-q31$  map directly to registers  $v0-v31$ .

## Related information

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

[Views of the Advanced SIMD register bank in AArch32 state](#) on page 164

## 10.4 Views of the Advanced SIMD register bank in AArch32 state

Advanced SIMD can have different views of the extension register bank in AArch32 state.

It can view the extension register bank as:

- Sixteen 128-bit registers,  $Q_0-Q_{15}$ .
- Thirty-two 64-bit registers,  $D_0-D_{31}$ .
- A combination of registers from these views.

Advanced SIMD views each register as containing a vector of 1, 2, 4, 8, or 16 elements, all of the same size and type. Individual elements can also be accessed as scalars.

In Advanced SIMD, the 64-bit registers are called doubleword registers and the 128-bit registers are called quadword registers.

### Related information

[Views of the Advanced SIMD register bank in AArch64 state](#) on page 164

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

## 10.5 Views of the Advanced SIMD register bank in AArch64 state

Advanced SIMD can have different views of the extension register bank in AArch64 state.

It can view the extension register bank as:

- Thirty-two 128-bit registers  $V_0-V_{31}$ .
- Thirty-two 64-bit registers  $D_0-D_{31}$ .
- Thirty-two 32-bit registers  $S_0-S_{31}$ .
- Thirty-two 16-bit registers  $H_0-H_{31}$ .
- Thirty-two 8-bit registers  $B_0-B_{31}$ .
- A combination of registers from these views.

### Related information

[Views of the Advanced SIMD register bank in AArch32 state](#) on page 164

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

## 10.6 Differences between A32/T32 and A64 Advanced SIMD instruction syntax

The syntax and mnemonics of A64 Advanced SIMD instructions are based on those in A32/T32 but with some differences.

The following table describes the main differences.

**Table 10-1: Differences in syntax and mnemonics between A32/T32 and A64 Advanced SIMD instructions**

| A32/T32                                                                                                                                                                                                                                           | A64                                                                                                                                                                                                                                                                                                                        |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| All Advanced SIMD instruction mnemonics begin with v, for example <code>VMAX</code> .                                                                                                                                                             | The first letter of the instruction mnemonic indicates the data type of the instruction. For example, <code>SMAX</code> , <code>UMAX</code> , and <code>FMAX</code> mean signed, unsigned, and floating-point respectively. No suffix means the type is irrelevant and <code>P</code> means polynomial.                    |
| A mnemonic qualifier specifies the type and width of elements in a vector. For example, in the following instruction, <code>U32</code> means 32-bit unsigned integers:                                                                            | A register qualifier specifies the data width and the number of elements in the register. For example, in the following instruction <code>.4S</code> means 4 32-bit elements:                                                                                                                                              |
| <code>VMAX.U32 Q0, Q1, Q2</code>                                                                                                                                                                                                                  | <code>UMAX V0.4S, V1.4S, V2.4S</code>                                                                                                                                                                                                                                                                                      |
| The 128-bit vector registers are named Q0-Q15 and the 64-bit vector registers are named D0-D31.                                                                                                                                                   | All vector registers are named <code>Vn</code> , where n is a register number between 0 and 31. You only use one of the qualified register names <code>Qn</code> , <code>Dn</code> , <code>Sn</code> , <code>Hn</code> or <code>Bn</code> when referring to a scalar register, to indicate the number of significant bits. |
| You load a single element into one or more vector registers by appending an index to each register individually, for example:                                                                                                                     | You load a single element into one or more vector registers by appending the index to the register list, for example:                                                                                                                                                                                                      |
| <code>VLD4.8 {D0[3], D1[3], D2[3], D3[3]}, [R0]</code>                                                                                                                                                                                            | <code>LD4 {V0.B, V1.B, V2.B, V3.B}[3], [X0]</code>                                                                                                                                                                                                                                                                         |
| You can append a condition code to most Advanced SIMD instruction mnemonics to make them conditional.                                                                                                                                             | A64 has no conditionally executed floating-point or Advanced SIMD instructions.                                                                                                                                                                                                                                            |
| <code>L</code> , <code>W</code> and <code>N</code> suffixes indicate long, wide, and narrow variants of Advanced SIMD data processing instructions. A32/T32 Advanced SIMD does not include vector narrowing or widening second part instructions. | <code>L</code> , <code>W</code> and <code>N</code> suffixes indicate long, wide, and narrow variants of Advanced SIMD data processing instructions. You can additionally append a <code>2</code> to implement the second part of a narrowing or widening operation, for example:                                           |
|                                                                                                                                                                                                                                                   | <code>UADDL2 V0.4S, V1.8H, V2.8H ; take input from 4 high-numbered lanes of V1 and V2</code>                                                                                                                                                                                                                               |
| A32/T32 Advanced SIMD does not include vector reduction instructions.                                                                                                                                                                             | The <code>v</code> Advanced SIMD mnemonic suffix identifies vector reduction instructions, in which the operand is a vector and the result a scalar, for example:                                                                                                                                                          |
|                                                                                                                                                                                                                                                   | <code>ADDV S0, V1.4S</code>                                                                                                                                                                                                                                                                                                |
| The <code>P</code> mnemonic qualifier which indicates pairwise instructions is a prefix, for example, <code>VPADD</code> .                                                                                                                        | The <code>P</code> mnemonic qualifier is a suffix, for example <code>ADDP</code> .                                                                                                                                                                                                                                         |

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Conditional execution of A32/T32 Advanced SIMD instructions](#) on page 166  
[Advanced SIMD scalars](#) on page 171  
[Normal, long, wide, and narrow Advanced SIMD instructions](#) on page 169  
[Syntax differences between UAL and A64 assembly language](#) on page 103  
[VSEL \(A32\)](#) on page 658  
[FCSEL \(A64\)](#) on page 960

## 10.7 Load values to Advanced SIMD registers

To load a register with a floating-point immediate value, use VMOV in A32 or FMOV in A64. Both instructions exist in scalar and vector forms.

The A32 Advanced SIMD instructions `vmov` and `vmvn` can also load integer immediates. The A64 Advanced SIMD instructions to load integer immediates are `movi` and `mvni`.

You can load any 64-bit integer, single-precision, or double-precision floating-point value from a literal pool using the `VLDR` pseudo-instruction.

## Related information

[VLDR pseudo-instruction \(A32\)](#) on page 554  
[VMOV \(floating-point\) \(A32\)](#) on page 647  
[VMOV \(immediate\) \(A32\)](#) on page 562

## 10.8 Conditional execution of A32/T32 Advanced SIMD instructions

Most Advanced SIMD instructions always execute unconditionally.

You cannot use any of the following Advanced SIMD instructions in an IT block:

- `VCVT {A, N, P, M}`.
- `VMAXNM`.
- `VMINNM`.
- `VRINT {N, X, A, Z, M, P}`.
- All instructions in the Crypto extension.

In addition, specifying any other Advanced SIMD instruction in an IT block is deprecated.

Arm deprecates conditionally executing any Advanced SIMD instruction unless it is a shared Advanced SIMD and floating-point instruction.

## Related information

[Comparison of condition code meanings in integer and floating-point code](#) on page 140  
[Conditional execution in A32 code](#) on page 132  
[Conditional execution in T32 code](#) on page 133  
[Condition code suffixes](#) on page 138

# 10.9 Floating-point exceptions for Advanced SIMD in A32/T32 instructions

The Advanced SIMD extension records floating-point exceptions in the FPSCR cumulative flags.

It records the following exceptions:

### Invalid operation

The exception is caused if the result of an operation has no mathematical value or cannot be represented.

### Division by zero

The exception is caused if a divide operation has a zero divisor and a dividend that is not zero, an infinity or a NaN.

### Overflow

The exception is caused if the absolute value of the result of an operation, produced after rounding, is greater than the maximum positive normalized number for the destination precision.

### Underflow

The exception is caused if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, and the rounded result is inexact.

### Inexact

The exception is caused if the result of an operation is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.

### Input denormal

The exception is caused if a denormalized input operand is replaced in the computation by a zero.

The descriptions of the Advanced SIMD instructions that can cause floating-point exceptions include a subsection listing the exceptions. If there is no such subsection, that instruction cannot cause any floating-point exception.

## Related information

[Flush-to-zero mode in Advanced SIMD](#) on page 173  
[Advanced SIMD Programming](#) on page 160  
[A-Profile Architectures](#)

[Further reading](#)

## 10.10 Advanced SIMD data types in A32/T32 instructions

Most Advanced SIMD instructions use a data type specifier to define the size and type of data that the instruction operates on.

Data type specifiers in Advanced SIMD instructions consist of a letter indicating the type of data, usually followed by a number indicating the width. They are separated from the instruction mnemonic by a point. The following table shows the data types available in Advanced SIMD instructions:

**Table 10-2: Advanced SIMD data types**

| -                           | 8-bit         | 16-bit | 32-bit        | 64-bit        |
|-----------------------------|---------------|--------|---------------|---------------|
| Unsigned integer            | U8            | U16    | U32           | U64           |
| Signed integer              | S8            | S16    | S32           | S64           |
| Integer of unspecified type | I8            | I16    | I32           | I64           |
| Floating-point number       | not available | F16    | F32 (or F)    | not available |
| Polynomial over {0,1}       | P8            | P16    | not available | not available |

The datatype of the second (or only) operand is specified in the instruction.



Most instructions have a restricted range of permitted data types. See the instruction descriptions for details. However, the data type description is flexible:

- If the description specifies I, you can also use the S or U data types.
- If only the data size is specified, you can specify a type (I, S, U, P or F).
- If no data type is specified, you can specify a data type.

### Related information

[Polynomial arithmetic over {0,1}](#) on page 168

## 10.11 Polynomial arithmetic over {0,1}

The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic.

The following rules apply:

- $0 + 0 = 1 + 1 = 0$ .
- $0 + 1 = 1 + 0 = 1$ .
- $0 * 0 = 0 * 1 = 1 * 0 = 0$ .

- $1 * 1 = 1$ .

That is, adding two polynomials over  $\{0,1\}$  is the same as a bitwise exclusive OR, and multiplying two polynomials over  $\{0,1\}$  is the same as integer multiplication except that partial products are exclusive-ORed instead of being added.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 10.12 Advanced SIMD vectors

An Advanced SIMD operand can be a vector or a scalar. An Advanced SIMD vector can be a 64-bit doubleword vector or a 128-bit quadword vector.

In A32/T32 Advanced SIMD instructions, the size of the elements in an Advanced SIMD vector is specified by a datatype suffix appended to the mnemonic. In A64 Advanced SIMD instructions, the size and number of the elements in an Advanced SIMD vector are specified by a suffix appended to the register.

Doubleword vectors can contain:

- Eight 8-bit elements.
- Four 16-bit elements.
- Two 32-bit elements.
- One 64-bit element.

Quadword vectors can contain:

- Sixteen 8-bit elements.
- Eight 16-bit elements.
- Four 32-bit elements.
- Two 64-bit elements.

### Related information

[Advanced SIMD scalars](#) on page 171

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

[Extended notation extension for Advanced SIMD in A32/T32 code](#) on page 172

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 10.13 Normal, long, wide, and narrow Advanced SIMD instructions

Many A32/T32 and A64 Advanced SIMD data processing instructions are available in Normal, Long, Wide, Narrow, and saturating variants.

### Normal operation

The operands can be any of the vector types. The result vector is the same width, and usually the same type, as the operand vectors, for example:

```
VADD.I16 D0, D1, D2
```

You can specify that the operands and result of a normal A32/T32 Advanced SIMD instruction must all be quadwords by appending a `Q` to the instruction mnemonic. If you do this, `armasm` produces an error if the operands or result are not quadwords.

### Long operation

The operands are doubleword vectors and the result is a quadword vector. The elements of the result are usually twice the width of the elements of the operands, and the same type.

Long operation is specified using an `L` appended to the instruction mnemonic, for example:

```
VADDL.S16 Q0, D2, D3
```

### Wide operation

One operand vector is doubleword and the other is quadword. The result vector is quadword. The elements of the result and the first operand are twice the width of the elements of the second operand.

Wide operation is specified using a `w` appended to the instruction mnemonic, for example:

```
VADDW.S16 Q0, Q1, D4
```

### Narrow operation

The operands are quadword vectors and the result is a doubleword vector. The elements of the result are half the width of the elements of the operands.

Narrow operation is specified using an `n` appended to the instruction mnemonic, for example:

```
VADDHN.I16 D0, Q1, Q2
```

### Related information

[Advanced SIMD vectors](#) on page 169

## 10.14 Saturating Advanced SIMD instructions

Saturating instructions saturate the result to the value of the upper limit or lower limit if the result overflows or underflows.

The saturation limits depend on the datatype of the instruction. The following table shows the ranges that Advanced SIMD saturating instructions saturate to, where  $x$  is the result of the operation.

**Table 10-3: Advanced SIMD saturation ranges**

| Data type                 | Saturation range of $x$   |
|---------------------------|---------------------------|
| Signed byte (s8)          | $-2^7 \leq x < 2^7$       |
| Signed halfword (s16)     | $-2^{15} \leq x < 2^{15}$ |
| Signed word (s32)         | $-2^{31} \leq x < 2^{31}$ |
| Signed doubleword (s64)   | $-2^{63} \leq x < 2^{63}$ |
| Unsigned byte (u8)        | $0 \leq x < 2^8$          |
| Unsigned halfword (u16)   | $0 \leq x < 2^{16}$       |
| Unsigned word (u32)       | $0 \leq x < 2^{32}$       |
| Unsigned doubleword (u64) | $0 \leq x < 2^{64}$       |

Saturating Advanced SIMD arithmetic instructions set the QC bit in the floating-point status register (FPSCR in AArch32 or FPSR in AArch64) to indicate that saturation has occurred.

Saturating instructions are specified using a  $q$  prefix. In A32/T32 Advanced SIMD instructions, this is inserted between the  $v$  and the instruction mnemonic, or between the  $S$  or  $U$  and the mnemonic in A64 Advanced SIMD instructions.

### Related information

[Saturating instructions](#) on page 250

## 10.15 Advanced SIMD scalars

Some Advanced SIMD instructions act on scalars in combination with vectors. Advanced SIMD scalars can be 8-bit, 16-bit, 32-bit, or 64-bit.

In A32/T32 Advanced SIMD instructions, the instruction syntax refers to a single element in a vector register using an index,  $x$ , into the vector, so that  $Dm[x]$  is the  $x$ th element in vector  $Dm$ .

In A64 Advanced SIMD instructions, you append the index to the element size specifier, so that  $Vm.D[x]$  is the  $x$ th doubleword element in vector  $Vm$ .

In A64 Advanced SIMD scalar instructions, you refer to registers using a name that indicates the number of significant bits. The names are  $Bn$ ,  $Hn$ ,  $Sn$ , or  $Dn$ , where  $n$  is the register number (0-31). The unused high bits are ignored on a read and set to zero on a write.

Other than A32/T32 Advanced SIMD multiply instructions, instructions that access scalars can access any element in the register bank.

A32/T32 Advanced SIMD multiply instructions only allow 16-bit or 32-bit scalars, and can only access the first 32 scalars in the register bank. That is, in multiply instructions:

- 16-bit scalars are restricted to registers D0-D7, with  $x$  in the range 0-3.
- 32-bit scalars are restricted to registers D0-D15, with  $x$  either 0 or 1.

### Related information

[Advanced SIMD vectors](#) on page 169

[Extension register bank mapping for Advanced SIMD in AArch32 state](#) on page 160

## 10.16 Extended notation extension for Advanced SIMD in A32/T32 code

armasm implements an extension to the architectural Advanced SIMD assembly syntax, called extended notation. This extension allows you to include datatype information or scalar indexes in register names.



Extended notation is not supported for A64 code.

Note

---

If you use extended notation, you do not have to include the data type or scalar index information in every instruction.

Register names can be any of the following:

#### Untyped

The register name specifies the register, but not what datatype it contains, nor any index to a particular scalar within the register.

#### Untyped with scalar index

The register name specifies the register, but not what datatype it contains. It specifies an index to a particular scalar within the register.

#### Typed

The register name specifies the register, and what datatype it contains, but not any index to a particular scalar within the register.

#### Typed with scalar index

The register name specifies the register, what datatype it contains, and an index to a particular scalar within the register.

Use the `DN` and `QN` directives to define names for typed and scalar registers.

## Related information

[Advanced SIMD vectors](#) on page 169

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Advanced SIMD scalars](#) on page 171

[QN, DN, and SN](#) on page 1477

## 10.17 Advanced SIMD system registers in AArch32 state

Advanced SIMD system registers are accessible in all implementations of Advanced SIMD.

For exception levels using AArch32, the following Advanced SIMD system registers are accessible in all Advanced SIMD implementations:

- FPSCR, the floating-point status and control register.
- FPEXC, the floating-point exception register.
- FPSID, the floating-point system ID register.

A particular Advanced SIMD implementation can have additional registers. For more information, see the Technical Reference Manual for your processor.



Advanced SIMD technology shares the same set of system registers as floating-point.

## Related information

[The Read-Modify-Write operation](#) on page 123

[A-Profile Architectures](#)

[Further reading](#)

## 10.18 Flush-to-zero mode in Advanced SIMD

Flush-to-zero mode replaces denormalized numbers with zero. This does not comply with IEEE 754 arithmetic, but in some circumstances can improve performance considerably.

Flush-to-zero mode in Advanced SIMD always preserves the sign bit.

Advanced SIMD always uses flush-to-zero mode.

## Related information

[When to use flush-to-zero mode in Advanced SIMD](#) on page 174

[The effects of using flush-to-zero mode in Advanced SIMD](#) on page 174

[Advanced SIMD operations not affected by flush-to-zero mode](#) on page 175

## 10.19 When to use flush-to-zero mode in Advanced SIMD

You can change between flush-to-zero mode and normal mode, depending on the requirements of different parts of your code.

You must select flush-to-zero mode if all the following are true:

- IEEE 754 compliance is not a requirement for your system.
- The algorithms you are using sometimes generate denormalized numbers.
- Your system uses support code to handle denormalized numbers.
- The algorithms you are using do not depend for their accuracy on the preservation of denormalized numbers.
- The algorithms you are using do not generate frequent exceptions as a result of replacing denormalized numbers with 0.

You select flush-to-zero mode in one of the following ways:

- In A32 code, by setting the FZ bit in the FPSCR to 1. You do this using the `vMRS` and `vMSR` instructions.
- In A64 code, by setting the FZ bit in the FPCR to 1. You do this using the `MRS` and `MSR` instructions.

You can change between flush-to-zero and normal mode at any time, if different parts of your code have different requirements. Numbers already in registers are not affected by changing mode.

### Related information

[Flush-to-zero mode in Advanced SIMD](#) on page 173

[The effects of using flush-to-zero mode in Advanced SIMD](#) on page 174

## 10.20 The effects of using flush-to-zero mode in Advanced SIMD

In flush-to-zero mode, denormalized inputs are treated as zero. Results that are too small to be represented in a normalized number are replaced with zero.

With certain exceptions, flush-to-zero mode has the following effects on floating-point operations:

- A denormalized number is treated as 0 when used as an input to a floating-point operation. The source register is not altered.
- If the result of a single-precision floating-point operation, before rounding, is in the range  $-2^{-126}$  to  $+2^{-126}$ , it is replaced by 0.
- If the result of a double-precision floating-point operation, before rounding, is in the range  $-2^{-1022}$  to  $+2^{-1022}$ , it is replaced by 0.

In flush-to-zero mode, an Input Denormal exception occurs whenever a denormalized number is used as an operand. An Underflow exception occurs when a result is flushed-to-zero.

### Related information

[Flush-to-zero mode in Advanced SIMD](#) on page 173

[Advanced SIMD operations not affected by flush-to-zero mode](#) on page 175

## 10.21 Advanced SIMD operations not affected by flush-to-zero mode

Some Advanced SIMD instructions can be carried out on denormalized numbers even in flush-to-zero mode, without flushing the results to zero.

These instructions are as follows:

- Copy, absolute value, and negate (`vMOV`, `VMVN`, `V{Q}ABS`, and `V{Q}NEG`).
- Duplicate (`VDUP`).
- Swap (`VSWP`).
- Load and store (`VLDR` and `VSTR`).
- Load multiple and store multiple (`VLDM` and `VSTM`).
- Transfer between extension registers and AArch32 general-purpose registers (`vMOV`).

### Related information

[Flush-to-zero mode in Advanced SIMD](#) on page 173

[VABS \(A32\)](#) on page 514

[VABS \(floating-point\) \(A32\)](#) on page 628

[VDUP \(A32\)](#) on page 541

[VLDM \(A32\)](#) on page 551

[VLDR \(A32\)](#) on page 552

[VMOV \(register\) \(A32\)](#) on page 563

[VMOV \(between two general-purpose registers and a 64-bit extension register\) \(A32\)](#) on page 563

[VMOV \(between a general-purpose register and an Advanced SIMD scalar\) \(A32\)](#) on page 564

[VSWP \(A32\)](#) on page 621

# 11. Floating-point Programming

Describes floating-point assembly language programming.

## 11.1 Architecture support for floating-point

Floating-point is an optional extension to the Arm architecture. There are versions that provide additional instructions.

The floating-point instruction set supported in A32 is based on VFPv4, but with the addition of some new instructions, including the following:

- Floating-point round to integral.
- Conversion from floating-point to integer with a directed rounding mode.
- Direct conversion between half-precision and double-precision floating-point.
- Floating-point conditional select.

In AArch32 state, the register bank consists of thirty-two 64-bit registers, and smaller registers are packed into larger ones, as in Arm®v7 and earlier.

In AArch64 state, the register bank includes thirty-two 128-bit registers and has a new register packing model.

Floating point instructions in A64 are closely based on VFPv4 and A32, but with new instruction mnemonics and some functional enhancements.

### Related information

[Floating-point support](#)

[Further reading](#)

## 11.2 Extension register bank mapping for floating-point in AArch32 state

The floating-point extension register bank is a collection of registers that can be accessed as either 32-bit or 64-bit registers. It is distinct from the Arm core register bank.

The following figure shows the views of the extension register bank, and the overlap between the different size registers. For example, the 64-bit register `d0` is an alias for two consecutive 32-bit registers `s0` and `s1`. The 64-bit registers `d16` and `d17` do not have an alias.

**Figure 11-1: Extension register bank for floating-point in AArch32 state**

The aliased views enable half-precision, single-precision, and double-precision values to coexist in different non-overlapped registers at the same time.

You can also use the same overlapped registers to store half-precision, single-precision, and double-precision values at different times.

Do not attempt to use overlapped 32-bit and 64-bit registers at the same time because it creates meaningless results.

The mapping between the registers is as follows:

- $s_{2n}$  maps to the least significant half of  $D_n$
- $s_{2n+1}$  maps to the most significant half of  $D_n$

For example, you can access the least significant half of register  $D_6$  by referring to  $s_{12}$ , and the most significant half of  $D_6$  by referring to  $s_{13}$ .

## Related information

[Views of the floating-point extension register bank in AArch32 state](#) on page 179

## 11.3 Extension register bank mapping in AArch64 state

The extension register bank is a collection of registers that can be accessed as 16-bit, 32-bit, or 64-bit. It is distinct from the Arm core register bank.

The following figure shows the views of the extension register bank, and the overlap between the different size registers.

**Figure 11-2: Extension register bank for floating-point in AArch64 state**



The mapping between the registers is as follows:

- $s_{<n>}$  maps to the least significant half of  $d_{<n>}$

- $H<n>$  maps to the least significant half of  $S<n>$

For example, you can access the least significant half of register  $D7$  by referring to  $s7$ .

#### Related information

[Views of the floating-point extension register bank in AArch64 state](#) on page 179

## 11.4 Views of the floating-point extension register bank in AArch32 state

Floating-point can have different views of the extension register bank in AArch32 state.

The floating-point extension register bank can be viewed as:

- Thirty-two 64-bit registers,  $D0-D31$ .
- Thirty-two 32-bit registers,  $s0-s31$ . Only half of the register bank is accessible in this view.
- A combination of registers from these views.

64-bit floating-point registers are called double-precision registers and can contain double-precision floating-point values. 32-bit floating-point registers are called single-precision registers and can contain either a single-precision or two half-precision floating-point values.

#### Related information

[Extension register bank mapping for floating-point in AArch32 state](#) on page 176

## 11.5 Views of the floating-point extension register bank in AArch64 state

Floating-point can have different views of the extension register bank in AArch64 state.

The floating-point extension register bank can be viewed as:

- Thirty-two 64-bit registers  $D0-D31$ .
- Thirty-two 32-bit registers  $s0-s31$ .
- Thirty-two 16-bit registers  $H0-H31$ .
- A combination of registers from these views.

#### Related information

[Extension register bank mapping in AArch64 state](#) on page 178

## 11.6 Differences between A32/T32 and A64 floating-point instruction syntax

The syntax and mnemonics of A64 floating-point instructions are based on those in A32/T32 but with some differences.

The following table describes the main differences.

**Table 11-1: Differences in syntax and mnemonics between A32/T32 and A64 floating-point instructions**

| A32/T32                                                                                                                                                                       | A64                                                                                                                                                                                                                                               |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| All floating-point instruction mnemonics begin with v, for example VMAX.                                                                                                      | The first letter of the instruction mnemonic indicates the data type of the instruction. For example, SMAX, UMAX, and FMAX mean signed, unsigned, and floating-point respectively. No suffix means the type is irrelevant and P means polynomial. |
| A mnemonic qualifier specifies the type and width of elements in a vector. For example, in the following instruction, U32 means 32-bit unsigned integers:                     | A register qualifier specifies the data width and the number of elements in the register. For example, in the following instruction .4S means 4 32-bit elements:                                                                                  |
| VMAX.U32 Q0, Q1, Q2                                                                                                                                                           | UMAX V0.4S, V1.4S, V2.4S                                                                                                                                                                                                                          |
| You can append a condition code to most floating-point instruction mnemonics to make them conditional.                                                                        | A64 has no conditionally executed floating-point instructions.                                                                                                                                                                                    |
| The floating-point select instruction, VSEL, is unconditionally executed but uses a condition code as an operand. You append the condition code to the mnemonic, for example: | There are several floating-point instructions that use a condition code as an operand. You specify the condition code in the final operand position, for example:                                                                                 |
| VSELEQ.F32 S1,S2,S3                                                                                                                                                           | FCSEL S1,S2,S3,EQ                                                                                                                                                                                                                                 |
| The P mnemonic qualifier which indicates pairwise instructions is a prefix, for example, VPADD.                                                                               | The P mnemonic qualifier is a suffix, for example ADDP.                                                                                                                                                                                           |

## 11.7 Load values to floating-point registers

To load a register with a floating-point immediate value, use `VMOV` in A32 or `FMOV` in A64. Both instructions exist in scalar and vector forms.

You can load any 64-bit integer, single-precision, or double-precision floating-point value from a literal pool using the `VLDR` pseudo-instruction.

### Related information

[VLDR pseudo-instruction \(floating-point\) \(A32\)](#) on page 642

[VMOV \(floating-point\) \(A32\)](#) on page 647

[FMOV \(scalar, immediate\) \(A64\)](#) on page 986

## 11.8 Conditional execution of A32/T32 floating-point instructions

You can execute floating-point instructions conditionally, in the same way as most A32 and T32 instructions.

You cannot use any of the following floating-point instructions in an IT block:

- VRINT {A, N, P, M}.
- VSEL.
- VCVT {A, N, P, M}.
- VMAXNM.
- VMINNM.

In addition, specifying any other floating-point instruction in an IT block is deprecated.

Most A32 floating-point instructions can be conditionally executed, by appending a condition code suffix to the instruction.

### Related information

[Comparison of condition code meanings in integer and floating-point code](#) on page 140

[Conditional execution in A32 code](#) on page 132

[Conditional execution in T32 code](#) on page 133

[Condition code suffixes](#) on page 138

## 11.9 Floating-point exceptions for floating-point in A32/T32 instructions

The floating-point extension records floating-point exceptions in the FPSCR cumulative flags.

It records the following exceptions:

### Invalid operation

The exception is caused if the result of an operation has no mathematical value or cannot be represented.

### Division by zero

The exception is caused if a divide operation has a zero divisor and a dividend that is not zero, an infinity or a NaN.

### Overflow

The exception is caused if the absolute value of the result of an operation, produced after rounding, is greater than the maximum positive normalized number for the destination precision.

## Underflow

The exception is caused if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, and the rounded result is inexact.

## Inexact

The exception is caused if the result of an operation is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.

## Input denormal

The exception is caused if a denormalized input operand is replaced in the computation by a zero.

The descriptions of the floating-point instructions that can cause floating-point exceptions include a subsection listing the exceptions. If there is no such subsection, that instruction cannot cause any floating-point exception.

## Related information

[Flush-to-zero mode in floating-point](#) on page 184

[Floating-point Instructions \(32-bit\)](#) on page 627

[A-Profile Architectures](#)

[Further reading](#)

## 11.10 Floating-point data types in A32/T32 instructions

Most floating-point instructions use a data type specifier to define the size and type of data that the instruction operates on.

Data type specifiers in floating-point instructions consist of a letter indicating the type of data, usually followed by a number indicating the width. They are separated from the instruction mnemonic by a point.

The following data types are available in floating-point instructions:

### 16-bit

F16

### 32-bit

F32 (or F)

### 64-bit

F64 (or D)

The datatype of the second (or only) operand is specified in the instruction.



Most instructions have a restricted range of permitted data types. See the instruction descriptions for details. However, the data type description is flexible:

- If the description specifies I, you can also use the S or U data types.
- If only the data size is specified, you can specify a type (S, U, P or F).
- If no data type is specified, you can specify a data type.

## Related information

[Polynomial arithmetic over {0,1}](#) on page 168

## 11.11 Extended notation extension for floating-point in A32/T32 code

armasm implements an extension to the architectural floating-point assembly syntax, called extended notation. This extension allows you to include datatype information or scalar indexes in register names.



Extended notation is not supported for A64 code.

Note

If you use extended notation, you do not have to include the data type or scalar index information in every instruction.

Register names can be any of the following:

### Untyped

The register name specifies the register, but not what datatype it contains, nor any index to a particular scalar within the register.

### Untyped with scalar index

The register name specifies the register, but not what datatype it contains. It specifies an index to a particular scalar within the register.

### Typed

The register name specifies the register, and what datatype it contains, but not any index to a particular scalar within the register.

### Typed with scalar index

The register name specifies the register, what datatype it contains, and an index to a particular scalar within the register.

Use the `SN` and `DN` directives to define names for typed and scalar registers.

## Related information

[Floating-point data types in A32/T32 instructions](#) on page 182  
[QN, DN, and SN](#) on page 1477

## 11.12 Floating-point system registers in AArch32 state

Floating-point system registers are accessible in all implementations of floating-point.

For exception levels using AArch32, the following floating-point system registers are accessible in all floating-point implementations:

- FPSCR, the floating-point status and control register.
- FPEXC, the floating-point exception register.
- FPSID, the floating-point system ID register.

A particular floating-point implementation can have additional registers. For more information, see the Technical Reference Manual for your processor.

## Related information

[The Read-Modify-Write operation](#) on page 123  
[A-Profile Architectures](#)  
[Further reading](#)

## 11.13 Flush-to-zero mode in floating-point

Flush-to-zero mode replaces denormalized numbers with zero. This does not comply with IEEE 754 arithmetic, but in some circumstances can improve performance considerably.

Some implementations of floating-point use support code to handle denormalized numbers. The performance of such systems, in calculations involving denormalized numbers, is much less than it is in normal calculations.

Flush-to-zero mode in floating-point always preserves the sign bit.

## Related information

[When to use flush-to-zero mode in floating-point](#) on page 184  
[The effects of using flush-to-zero mode in floating-point](#) on page 185  
[Floating-point operations not affected by flush-to-zero mode](#) on page 186

## 11.14 When to use flush-to-zero mode in floating-point

You can change between flush-to-zero mode and normal mode, depending on the requirements of different parts of your code.

You must select flush-to-zero mode if all the following are true:

- IEEE 754 compliance is not a requirement for your system.
- The algorithms you are using sometimes generate denormalized numbers.
- Your system uses support code to handle denormalized numbers.
- The algorithms you are using do not depend for their accuracy on the preservation of denormalized numbers.
- The algorithms you are using do not generate frequent exceptions as a result of replacing denormalized numbers with 0.

You select flush-to-zero mode in one of the following ways:

- In A32 code, by setting the FZ bit in the FPSCR to 1. You do this using the `VMRS` and `VMSR` instructions.
- In A64 code, by setting the FZ bit in the FPCR to 1. You do this using the `MRS` and `MSR` instructions.

You can change between flush-to-zero and normal mode at any time, if different parts of your code have different requirements. Numbers already in registers are not affected by changing mode.

### Related information

[Flush-to-zero mode in floating-point](#) on page 184

[The effects of using flush-to-zero mode in floating-point](#) on page 185

## 11.15 The effects of using flush-to-zero mode in floating-point

In flush-to-zero mode, denormalized inputs are treated as zero. Results that are too small to be represented in a normalized number are replaced with zero.

With certain exceptions, flush-to-zero mode has the following effects on floating-point operations:

- A denormalized number is treated as 0 when used as an input to a floating-point operation. The source register is not altered.
- If the result of a single-precision floating-point operation, before rounding, is in the range  $-2^{-126}$  to  $+2^{-126}$ , it is replaced by 0.
- If the result of a double-precision floating-point operation, before rounding, is in the range  $-2^{-1022}$  to  $+2^{-1022}$ , it is replaced by 0.

In flush-to-zero mode, an Input Denormal exception occurs whenever a denormalized number is used as an operand. An Underflow exception occurs when a result is flushed-to-zero.

### Related information

[Flush-to-zero mode in floating-point](#) on page 184

[Floating-point operations not affected by flush-to-zero mode](#) on page 186

## 11.16 Floating-point operations not affected by flush-to-zero mode

Some floating-point instructions can be carried out on denormalized numbers even in flush-to-zero mode, without flushing the results to zero.

These instructions are as follows:

- Absolute value and negate (`VABS` and `VNEG`).
- Load and store (`VLDR` and `VSTR`).
- Load multiple and store multiple (`VLDM` and `VSTM`).
- Transfer between extension registers and Arm general-purpose registers (`VMOV`).

### Related information

[Flush-to-zero mode in floating-point](#) on page 184

[VABS \(floating-point\) \(A32\)](#) on page 628

[VLDM \(floating-point\) \(A32\)](#) on page 639

[VLDR \(floating-point\) \(A32\)](#) on page 640

[VSTM \(floating-point\) \(A32\)](#) on page 659

[VSTR \(floating-point\) \(A32\)](#) on page 660

[VLDM \(A32\)](#) on page 551

[VLDR \(A32\)](#) on page 552

[VSTM \(A32\)](#) on page 613

[VSTR \(A32\)](#) on page 617

[VMOV \(between one general-purpose register and single precision floating-point register\) \(A32\)](#) on page 648

[VMOV \(between two general-purpose registers and a 64-bit extension register\) \(A32\)](#) on page 563

[VNEG \(floating-point\) \(A32\)](#) on page 653

[VNEG \(A32\)](#) on page 573

# 12. armasm Command-line Options

Describes the `armasm` command-line syntax and command-line options.

## 12.1 --16

Instructs `armasm` to interpret instructions as T32 instructions using the pre-UAL T32 syntax.

This option is equivalent to a `CODE16` directive at the head of the source file. Use the `--thumb` option to specify T32 instructions using the UAL syntax.



Not supported for AArch64 state.

Note

### Related information

- [--thumb](#) on page 214
- [CODE16 directive](#) on page 1436

## 12.2 --32

A synonym for the `--arm` command-line option.



Not supported for AArch64 state.

Note

### Related information

- [--arm](#) on page 189

## 12.3 --apcs=qualifier...qualifier

Controls interworking and position independence when generating code.

### Syntax

`--apcs=qualifier...qualifier`

Where `qualifier...qualifier` denotes a list of qualifiers. There must be:

- At least one qualifier present.
- No spaces or commas separating individual qualifiers in the list.

Each instance of *qualifier* must be one of:

**none**

Specifies that the input file does not use AAPCS. AAPCS registers are not set up. Other qualifiers are not permitted if you use **none**.

**/interwork, /nointerwork**

For Arm®v7-A, **/interwork** specifies that the code in the input file can interwork between A32 and T32 safely.

For Armv8-A, **/interwork** specifies that the code in the input file can interwork between A32 and T32 safely.

The default is **/nointerwork**.

**/nointerwork** is not supported for AArch64 state.

**/inter, /nointer**

Are synonyms for **/interwork** and **/nointerwork**.

**/inter** is not supported for AArch64 state.

**/ropi, /noropi**

**/ropi** specifies that the code in the input file is Read-Only Position-Independent (ROPI). The default is **/noropi**.

**/pic, /nopic**

Are synonyms for **/ropi** and **/noropi**.

**/rwpi, /norwpi**

**/rwpi** specifies that the code in the input file is Read-Write Position-Independent (RWPI). The default is **/norwpi**.

**/pid, /nopid**

Are synonyms for **/rwpi** and **/norwpi**.

**/fpic, /nofpic**

**/fpic** specifies that the code in the input file is read-only independent and references to addresses are suitable for use in a Linux shared object. The default is **/nofpic**.

**/hardfp, /softfp**

Requests hardware or software floating-point linkage. This enables the procedure call standard to be specified separately from the version of the floating-point hardware available through the **--fpu** option. It is still possible to specify the procedure call standard by using the **--fpu** option, but Arm recommends you use **--apcs**. If floating-point support is not permitted (for example, because **--fpu=none** is specified, or because of other means), then **/hardfp** and **/softfp** are ignored. If floating-point support is permitted and the softfp calling convention is used (**--fpu=softvfp** or **--fpu=softvfp+fp-armv8**), then **/hardfp** gives an error.

/softfp is not supported for AArch64 state.

## Usage

This option specifies whether you are using the *Procedure Call Standard for the Arm Architecture* (AAPCS). It can also specify some attributes of code sections.

The AAPCS forms part of the *Base Standard Application Binary Interface for the Arm Architecture* (BSABI) specification. By writing code that adheres to the AAPCS, you can ensure that separately compiled and assembled modules can work together.



AAPCS qualifiers do not affect the code produced by `armasm`. They are an assertion by the programmer that the code in the input file complies with a particular variant of AAPCS. They cause attributes to be set in the object file produced by `armasm`. The linker uses these attributes to check compatibility of files, and to select appropriate library variants.

## Example

```
armasm --cpu=8-A.32 --apcs=/inter/hardfp inputfile.s
```

## Related information

[Application Binary Interface \(ABI\)](#)

## 12.4 --arm

Instructs `armasm` to interpret instructions as A32 instructions. It does not, however, guarantee A32-only code in the object file. This is the default. Using this option is equivalent to specifying the `ARM` or `CODE32` directive at the start of the source file.



Not supported for AArch64 state.

## Related information

[-32](#) on page 187

[--arm\\_only](#) on page 189

[ARM or CODE32 directive](#) on page 1432

## 12.5 --arm\_only

Instructs `armasm` to only generate A32 code. This is similar to `--arm` but also has the property that `armasm` does not permit the generation of any T32 code.



Not supported for AArch64 state.

Note

### Related information

[--arm](#) on page 189

## 12.6 --bi

A synonym for the `--bigend` command-line option.

### Related information

[--bigend](#) on page 190

[--littleend](#) on page 209

## 12.7 --bigend

Generates code suitable for an Arm® processor using big-endian memory access.

### Default

The default is `--littleend`.

### Related information

[--littleend](#) on page 209

[--bi](#) on page 190

## 12.8 --brief\_diagnostics, --no\_brief\_diagnostics

Enables and disables the output of brief diagnostic messages.

This option instructs the assembler whether to use a shorter form of the diagnostic output. In this form, the original source line is not displayed and the error message text is not wrapped when it is too long to fit on a single line. The default is `--no_brief_diagnostics`.

## Related information

[--diag\\_error=tag\[,tag,...\]](#) on page 197  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.9 --checkreglist

Instructs the `armasm` to check RLIST, LDM, and STM register lists to ensure that all registers are provided in increasing register number order.

When this option is used, `armasm` gives a warning if the registers are not listed in order.



In AArch32 state, this option is deprecated. Use `--diag_warning 1206` instead. In AArch64 state, this option is not supported..

## Related information

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.10 --cpreproc

Instructs `armasm` to call `armclang` to preprocess the input file before assembling it.

### Restrictions

You must use `--cpreproc_opts` with this option to correctly configure the `armclang` compiler for pre-processing.

`armasm` only passes the following command-line options to `armclang` by default:

- Basic pre-processor configuration options, such as `-E`.
- User specified include directories, `-I` directives.
- User specified licensing options, such as `--site_license`.
- Anything specified in `--cpreproc_opts`.

## Related information

[--cpreproc\\_opts=option\[,option,...\]](#) on page 191

[Using the C preprocessor](#) on page 156

[-x armclang option](#)

[Command-line options for preprocessing assembly source code](#)

## 12.11 --cpreproc\_opts=option[,option,...]

Enables `armasm` to pass options to `armclang` when using the C preprocessor.

### Syntax

`--cpreproc_opts=option[,option,...]`

Where `option[,option,...]` is a comma-separated list of C preprocessing options.

At least one option must be specified.

### Restrictions

As a minimum, you must specify the `armclang` options `--target` and either `-mcpu` or `-march` in `--cpreproc_opts`.

To assemble code containing C directives that require the C preprocessor, the input assembly source filename must have an upper-case extension `.S`.

You cannot pass the `armclang` option `-x assembler-with-cpp`, because it gets added to `armclang` after the source file name.



**Note** Ensure that you specify compatible architectures in the `armclang` options `--target`, `-mcpu` or `-march`, and the `armasm` option `--cpu`.

### Example

The options to the preprocessor in this example are `--cpreproc_opts==--target=arm-arm-none-eabi,-mcpu=cortex-a9,-D,DEF1,-D,DEF2`.

```
armasm --cpu=cortex-a9 --cpreproc --cpreproc_opts==--target=arm-arm-none-eabi,-mcpu=cortex-a9,-D,DEF1,-D,DEF2 -I /path/to/includes1 -I /path/to/includes2 input.S
```

### Related information

[--cpreproc](#) on page 191

[Using the C preprocessor](#) on page 156

[Command-line options for preprocessing assembly source code](#)

[Specifying a target architecture, processor, and instruction set](#)

[-march armclang option](#)

[-mcpu armclang option](#)

[-target armclang option](#)

[-x armclang option](#)

## 12.12 --cpu=list

Lists the architecture and processor names that are supported by the `--cpu=name` option.

### Syntax

`--cpu=list`

### Related information

[--cpu=name](#) on page 193

## 12.13 --cpu=name

Enables code generation for the selected Arm® processor or architecture.

### Default

There is no default option for `--cpu`.

### Syntax

`--cpu=name`

Where `name` is the name of a processor or architecture:

Processor and architecture names are not case-sensitive.

Wildcard characters are not accepted.

The following table shows the supported architectures. For a complete list of the supported architecture and processor names, specify the `--cpu=list` option.



armasm does not support architectures later than Armv8.3.

Note

**Table 12-1: Supported Arm architectures**

| Architecture name | Description                                                                                            |
|-------------------|--------------------------------------------------------------------------------------------------------|
| 6-M               | Armv6 architecture microcontroller profile.                                                            |
| 6S-M              | Armv6 architecture microcontroller profile with OS extensions.                                         |
| 7-A               | Armv7 architecture application profile.                                                                |
| 7-A.security      | Armv7-A architecture profile with Security Extensions and includes the SMC instruction (formerly SMI). |
| 7-R               | Armv7 architecture real-time profile.                                                                  |
| 7-M               | Armv7 architecture microcontroller profile.                                                            |
| 7E-M              | Armv7-M architecture profile with DSP extension.                                                       |

| Architecture name | Description                                                                               |
|-------------------|-------------------------------------------------------------------------------------------|
| 8-A.32            | Armv8-A architecture profile, AArch32 state.                                              |
| 8-A.32.crypto     | Armv8-A architecture profile, AArch32 state with cryptographic instructions.              |
| 8-A.64            | Armv8-A architecture profile, AArch64 state.                                              |
| 8-A.64.crypto     | Armv8-A architecture profile, AArch64 state with cryptographic instructions.              |
| 8.1-A.32          | Armv8.1, for Armv8-A architecture profile, AArch32 state.                                 |
| 8.1-A.32.crypto   | Armv8.1, for Armv8-A architecture profile, AArch32 state with cryptographic instructions. |
| 8.1-A.64          | Armv8.1, for Armv8-A architecture profile, AArch64 state.                                 |
| 8.1-A.64.crypto   | Armv8.1, for Armv8-A architecture profile, AArch64 state with cryptographic instructions. |
| 8.2-A.32          | Armv8.2, for Armv8-A architecture profile, AArch32 state.                                 |
| 8.2-A.32.crypto   | Armv8.2, for Armv8-A architecture profile, AArch32 state with cryptographic instructions. |
| 8.2-A.64          | Armv8.2, for Armv8-A architecture profile, AArch64 state.                                 |
| 8.2-A.64.crypto   | Armv8.2, for Armv8-A architecture profile, AArch64 state with cryptographic instructions. |
| 8.3-A.32          | Armv8.3, for Armv8-A architecture profile, AArch32 state.                                 |
| 8.3-A.32.crypto   | Armv8.3, for Armv8-A architecture profile, AArch32 state with cryptographic instructions. |
| 8.3-A.64          | Armv8.3, for Armv8-A architecture profile, AArch64 state.                                 |
| 8.3-A.64.crypto   | Armv8.3, for Armv8-A architecture profile, AArch64 state with cryptographic instructions. |
| 8-R               | Armv8-R architecture profile.                                                             |
| 8-M.Base          | Armv8-M baseline architecture profile. Derived from the Armv6-M architecture.             |
| 8-M.Main          | Armv8-M mainline architecture profile. Derived from the Armv7-M architecture.             |
| 8-M.Main.dsp      | Armv8-M mainline architecture profile with DSP extension.                                 |



The full list of supported architectures and processors depends on your license.

## Usage

The following general points apply to processor and architecture options:

### Processors

- Selecting the processor selects the appropriate architecture, Floating-Point Unit (FPU), and memory organization.
- If you specify a processor for the `--cpu` option, the generated code is optimized for that processor. This enables the assembler to use specific coprocessors or instruction scheduling for optimum performance.

## Architectures

If you specify an architecture name for the `--cpu` option, the generated code can run on any processor supporting that architecture. For example, `--cpu=7-A` produces code that can be used by the Cortex®-A9 processor.

### FPU

- Some specifications of `--cpu` imply an `--fpu` selection.



**Note** Any explicit FPU, set with `--fpu` on the command line, overrides an implicit FPU.

- If no `--fpu` option is specified and the `--cpu` option does not imply an `--fpu` selection, then `--fpu=softvfp` is used.

### A32/T32

- Specifying a processor or architecture that supports T32 instructions, such as `--cpu=cortex-a9`, does not make the assembler generate T32 code. It only enables features of the processor to be used, such as long multiply. Use the `--thumb` option to generate T32 code, unless the processor only supports T32 instructions.



**Note** Specifying the target processor or architecture might make the generated object code incompatible with other Arm processors. For example, A32 code generated for architecture Armv8 might not run on a Cortex-A9 processor, if the generated object code includes instructions specific to Armv8. Therefore, you must choose the lowest common denominator processor suited to your purpose.

- If the architecture only supports T32, you do not have to specify `--thumb` on the command line. For example, if building for Cortex-M4 or Armv7-M with `--cpu=7-M`, you do not have to specify `--thumb` on the command line, because Armv7-M only supports T32. Similarly, Armv6-M and other T32-only architectures.

## Restrictions

You cannot specify both a processor and an architecture on the same command-line.

## Example

```
armasm --cpu=Cortex-A17 inputfile.s
```

## Related information

- [--apcs=qualifier...qualifier](#) on page 187
- [--cpu=list](#) on page 192
- [--fpu=name](#) on page 205
- [--thumb](#) on page 214
- [--unsafe](#) on page 215

## A-Profile Architectures

### 12.14 --debug

Instructs the assembler to generate DWARF debug tables.

--debug is a synonym for -g. The default is DWARF 3.



Note

Local symbols are not preserved with --debug. You must specify --keep if you want to preserve the local symbols to aid debugging.

#### Related information

[--dwarf2](#) on page 201

[--dwarf3](#) on page 201

[--keep](#) on page 206

[-g](#) on page 206

### 12.15 --depend=dependfile

Writes makefile dependency lines to a file.

Source file dependency lists are suitable for use with make utilities.

#### Related information

[--md](#) on page 209

[--depend\\_format=string](#) on page 196

### 12.16 --depend\_format=string

Specifies the format of output dependency files, for compatibility with some UNIX make programs.

#### Syntax

`--depend_format=string`

Where *string* is one of:

##### **unix**

generates dependency file entries using UNIX-style path separators.

##### **unix escaped**

is the same as **unix**, but escapes spaces with \\.

**`unix_quoted`**

is the same as `unix`, but surrounds path names with double quotes.

**Related information**

[--depend=dependfile](#) on page 196

## 12.17 --diag\_error=tag[,tag,...]

Sets diagnostic messages that have a specific tag to Error severity.

**Syntax**

`--diag_error=tag[,tag,...]`

Where `tag` can be:

- A diagnostic message number to set to error severity. This is the four-digit number, `nnnn`, with the tool letter prefix, but without the letter suffix indicating the severity.
- `warning`, to treat all warnings as errors.

**Usage**

Diagnostic messages output by the assembler can be identified by a tag in the form of `prefixnumber`, where the `prefix` is A.

You can specify more than one tag with this option by separating each tag using a comma. You can specify the optional assembler prefix A before the tag number. If any prefix other than A is included, the message number is ignored.

The following table shows the meaning of the term severity used in the option descriptions:

**Table 12-2: Severity of diagnostic messages**

| Severity | Description                                                                                                                                                                                                                    |
|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Error    | Errors indicate violations in the syntactic or semantic rules of assembly language. Assembly continues, but object code is not generated.                                                                                      |
| Warning  | Warnings indicate unusual conditions in your code that might indicate a problem. Assembly continues, and object code is generated unless any problems with an Error severity are detected.                                     |
| Remark   | Remarks indicate common, but not recommended, use of assembly language. These diagnostics are not issued by default. Assembly continues, and object code is generated unless any problems with an Error severity are detected. |

**Related information**

[--brief\\_diagnostics, --no\\_brief\\_diagnostics](#) on page 190

[--diag\\_remark=tag\[,tag,...\]](#) on page 198

[--diag\\_style=arm|ide|gnu](#) on page 198

[--diag\\_suppress=tag\[,tag,\]](#) on page 199  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.18 --diag\_remark=tag[,tag,...]

Sets diagnostic messages that have a specific tag to Remark severity.

### Syntax

`--diag_remark=tag[,tag,...]`

Where *tag* is a comma-separated list of diagnostic message numbers. This is the four-digit number, *nnnn*, with the tool letter prefix, but without the letter suffix indicating the severity.

### Usage

Diagnostic messages output by the assembler can be identified by a tag in the form of *prefixnumber*, where the *prefix* is A.

You can specify more than one tag with this option by separating each tag using a comma. You can specify the optional assembler prefix A before the tag number. If any prefix other than A is included, the message number is ignored.

### Related information

[--brief\\_diagnostics, --no\\_brief\\_diagnostics](#) on page 190  
[--diag\\_error=tag\[,tag,...\]](#) on page 197  
[--diag\\_style=arm|ide|gnu](#) on page 198  
[--diag\\_suppress=tag\[,tag,\]](#) on page 199  
[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.19 --diag\_style=arm|ide|gnu

Specifies the display style for diagnostic messages.

### Default

The default is `--diag_style=arm`.

### Syntax

`--diag_style=string`

Where *string* is one of:

**arm**

Display messages using the legacy Arm® compiler style.

**ide**

Include the line number and character count for any line that is in error. These values are displayed in parentheses.

**gnu**

Display messages in the format used by `gcc`.

## Usage

`--diag_style=gnu` matches the format reported by the GNU Compiler, `gcc`.

`--diag_style=ide` matches the format reported by Microsoft Visual Studio.

Choosing the option `--diag_style=ide` implicitly selects the option `--brief_diagnostics`. Explicitly selecting `--no_brief_diagnostics` on the command line overrides the selection of `--brief_diagnostics` implied by `--diag_style=ide`.

Selecting either the option `--diag_style=arm` or the option `--diag_style=gnu` does not imply any selection of `--brief_diagnostics`.

## Related information

[--brief\\_diagnostics, --no\\_brief\\_diagnostics](#) on page 190

[--diag\\_error=tag\[,tag,...\]](#) on page 197

[--diag\\_remark=tag\[,tag,...\]](#) on page 198

[--diag\\_suppress=tag\[,tag,...\]](#) on page 199

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.20 `--diag_suppress=tag[,tag, ...]`

Suppresses diagnostic messages that have a specific tag.

### Syntax

`--diag_suppress=tag[,tag,...]`

Where `tag` can be:

- A diagnostic message number to be suppressed. This is the four-digit number, `nnnn`, with the tool letter prefix, but without the letter suffix indicating the severity.
- `error`, to suppress all errors that can be downgraded.
- `warning`, to suppress all warnings.

Diagnostic messages output by `armasm` can be identified by a tag in the form of `prefixnumber`, where the `prefix` is A.

You can specify more than one tag with this option by separating each tag using a comma.

## Example

For example, to suppress the warning messages that have numbers 1293 and 187, use the following command:

```
armasm --cpu=8-A.64 --diag_suppress=1293,187
```

You can specify the optional assembler prefix A before the tag number. For example:

```
armasm --cpu=8-A.64 --diag_suppress=A1293,A187
```

If any prefix other than A is included, the message number is ignored. Diagnostic message tags can be cut and pasted directly into a command line.

## Related information

- [--brief\\_diagnostics, --no\\_brief\\_diagnostics](#) on page 190
- [--diag\\_error=tag\[,tag,...\]](#) on page 197
- [--diag\\_remark=tag\[,tag,...\]](#) on page 198
- [--diag\\_style=arm|ide|gnu](#) on page 198
- [--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.21 --diag\_warning=tag[,tag,...]

Sets diagnostic messages that have a specific tag to Warning severity.

### Syntax

```
--diag_warning=tag[,tag,...]
```

Where *tag* can be:

- A diagnostic message number to set to warning severity. This is the four-digit number, *nnnn*, with the tool letter prefix, but without the letter suffix indicating the severity.
- `error`, to set all errors that can be downgraded to warnings.

Diagnostic messages output by the assembler can be identified by a tag in the form of *prefixnumber*, where the *prefix* is A.

You can specify more than one tag with this option by separating each tag using a comma.

You can specify the optional assembler prefix A before the tag number. If any prefix other than A is included, the message number is ignored.

## Related information

- [--brief\\_diagnostics, --no\\_brief\\_diagnostics](#) on page 190
- [--diag\\_error=tag\[,tag,...\]](#) on page 197
- [--diag\\_remark=tag\[,tag,...\]](#) on page 198

[--diag\\_style=arm|ide|gnu](#) on page 198  
[--diag\\_suppress=tag\[,tag,\]](#) on page 199

## 12.22 --dllexport\_all

Controls symbol visibility when building DLLs.

This option gives all exported global symbols `STV_PROTECTED` visibility in ELF rather than `STV_HIDDEN`, unless overridden by source directives.

### Related information

[EXPORT or GLOBAL](#) on page 1448

## 12.23 --dwarf2

Uses DWARF 2 debug table format.



Not supported for AArch64 state.

Note

---

This option can be used with `--debug`, to instruct `armasm` to generate DWARF 2 debug tables.

### Related information

[--debug](#) on page 196  
[--dwarf3](#) on page 201

## 12.24 --dwarf3

Uses DWARF 3 debug table format.

This option can be used with `--debug`, to instruct the assembler to generate DWARF 3 debug tables. This is the default if `--debug` is specified.

### Related information

[--debug](#) on page 196  
[--dwarf2](#) on page 201

## 12.25 --errors=errorfile

Redirects the output of diagnostic messages from stderr to the specified errors file.

## 12.26 --exceptions, --no\_exceptions

Enables or disables exception handling.



Note

Not supported for AArch64 state.

These options instruct `armasm` to switch on or off exception table generation for all functions defined by `FUNCTION` (or `PROC`) and `ENDFUNC` (or `ENDP`) directives.

`--no_exceptions` causes no tables to be generated. It is the default.

### Related information

[--exceptions\\_unwind, --no\\_exceptions\\_unwind](#) on page 202

[FRAME UNWIND ON](#) on page 1458

[FRAME UNWIND OFF](#) on page 1459

[FUNCTION or PROC](#) on page 1459

[ENDFUNC or ENDP](#) on page 1446

## 12.27 --exceptions\_unwind, --no\_exceptions\_unwind

Enables or disables function unwinding for exception-aware code. This option is only effective if `--exceptions` is enabled.



Note

Not supported for AArch64 state.

The default is `--exceptions_unwind`.

For finer control, use the `FRAME UNWIND ON` and `FRAME UNWIND OFF` directives.

### Related information

[--exceptions, --no\\_exceptions](#) on page 202

[FRAME UNWIND ON](#) on page 1458

[FRAME UNWIND OFF](#) on page 1459  
[FUNCTION or PROC](#) on page 1459  
[ENDFUNC or ENDP](#) on page 1446

## 12.28 --execstack, --no\_execstack

Generates a `.note.GNU-stack` section marking the stack as either executable or non-executable.

You can also use the `AREA` directive to generate either an executable or non-executable `.note.GNU-stack` section. The following code generates an executable `.note.GNU-stack` section. Omitting the `CODE` attribute generates a non-executable `.note.GNU-stack` section.

```
AREA      | .note.GNU-stack|,ALIGN=0,READONLY,NOALLOC, CODE
```

In the absence of `--execstack` and `--no_execstack`, the `.note.GNU-stack` section is not generated unless it is specified by the `AREA` directive.

If both the command-line option and source directive are used and are different, then the stack is marked as executable.

**Table 12-3: Specifying a command-line option and an AREA directive for GNU-stack sections**

| -                           | --execstack command-line option | --no_execstack command-line option |
|-----------------------------|---------------------------------|------------------------------------|
| execstack AREA directive    | execstack                       | execstack                          |
| no_execstack AREA directive | execstack                       | no_execstack                       |

### Related information

[AREA](#) on page 1428

## 12.29 --execute\_only

Adds the `EXECONLY` `AREA` attribute to all code sections.

### Usage

The `EXECONLY` `AREA` attribute causes the linker to treat the section as execute-only.

It is the user's responsibility to ensure that the code in the section is safe to run in execute-only memory. For example:

- The code must not contain literal pools.
- The code must not attempt to load data from the same, or another, execute-only section.

### Restrictions

This option is only supported for:

- Processors that support the Arm®v8-M.mainline or Armv8-M.baseline architecture.
- Processors that support the Armv7-M architecture, such as Cortex®-M3, Cortex-M4, and Cortex-M7.
- Processors that support the Armv6-M architecture.



Arm has only performed limited testing of execute-only code on Armv6-M targets.

## 12.30 --fpmodel=model

Specifies floating-point standard conformance and sets library attributes and floating-point optimizations.

### Syntax

`--fpmodel=model`

Where *model* is one of:

#### **none**

Source code is not permitted to use any floating-point type or floating-point instruction. This option overrides any explicit `--fpu=name` option.

#### **ieee\_full**

All facilities, operations, and representations guaranteed by the IEEE standard are available in single and double-precision. Modes of operation can be selected dynamically at runtime.

#### **ieee\_fixed**

IEEE standard with round-to-nearest and no inexact exceptions.

#### **ieee\_no\_fenv**

IEEE standard with round-to-nearest and no exceptions. This mode is compatible with the Java floating-point arithmetic model.

#### **std**

IEEE finite values with denormals flushed to zero, round-to-nearest and no exceptions. It is C and C++ compatible. This is the default option.

Finite values are as predicted by the IEEE standard. It is not guaranteed that NaNs and infinities are produced in all circumstances defined by the IEEE model, or that when they are produced, they have the same sign. Also, it is not guaranteed that the sign of zero is that predicted by the IEEE model.

#### **fast**

Some value altering optimizations, where accuracy is sacrificed to fast execution. This is not IEEE compatible, and is not standard C.



This does not cause any changes to the code that you write.

Note

## Example

```
armasm --cpu=8-A.32 --fpemode ieee_full inputfile.s
```

## Related information

- [fpu=name](#) on page 205
- [IEEE Standards Association](#)

## 12.31 --fpu=list

Lists the FPU architecture names that are supported by the --fpu=name option.

## Example

```
armasm --fpu=list
```

## Related information

- [fpemode=model](#) on page 204
- [fpu=name](#) on page 205

## 12.32 --fpu=name

Specifies the target FPU architecture.

### Syntax

```
--fpu=name
```

Where *name* is the name of the target FPU architecture. Specify --fpu=list to list the supported FPU architecture names that you can use with --fpu=name. The default floating-point architecture depends on the target architecture.



Software floating-point linkage is not supported for AArch64 state.

Note

## Usage

If you specify this option, it overrides any implicit FPU option that appears on the command line, for example, where you use the `--cpu` option. Floating-point instructions also produce either errors or warnings if assembled for the wrong target FPU.

`armasm` sets a build attribute corresponding to name in the object file. The linker determines compatibility between object files, and selection of libraries, accordingly.

## Related information

[--fpmode=model](#) on page 204

## 12.33 -g

Enables the generation of debug tables.

This option is a synonym for `--debug`.

## Related information

[--debug](#) on page 196

## 12.34 --help

Displays a summary of the main command-line options.

## Default

This is the default if you specify `armasm` without any options or source files.

## Related information

[--version\\_number](#) on page 216

[--vsn](#) on page 217

## 12.35 -idir[,dir,...]

Adds directories to the source file include path.

Any directories added using this option have to be fully qualified.

## Related information

[GET or INCLUDE](#) on page 1462

## 12.36 --keep

Instructs the assembler to keep named local labels in the symbol table of the object file, for use by the debugger.

### Related information

[KEEP](#) on page 1468

## 12.37 --length=n

Sets the listing page length.

Length zero means an unpaged listing. The default is 66 lines.

### Related information

[--list=file](#) on page 208

## 12.38 --li

A synonym for the `--littleend` command-line option.

### Related information

[--littleend](#) on page 209

[--bigend](#) on page 190

## 12.39 --library\_type=lib

Enables the selected library to be used at link time.

### Syntax

`--library_type=lib`

Where `lib` is one of:

#### **standardlib**

Specifies that the full Arm® runtime libraries are selected at link time. This is the default.

#### **microlib**

Specifies that the C micro-library (microlib) is selected at link time.



- This option can be used with the compiler, assembler, or linker when use of the libraries require more specialized optimizations.
- This option can be overridden at link time by providing it to the linker.
- microlib is not supported for AArch64 state.

## Related information

[Building an application with microlib](#)

## 12.40 --list=file

Instructs the assembler to output a detailed listing of the assembly language produced by the assembler to a file.

If `-` is given as `file`, the listing is sent to `stdout`.

Use the following command-line options to control the behavior of `--list`:

- `--no_terse`.
- `--width`.
- `--length`.
- `--xref`.

## Related information

[--no\\_terse](#) on page 210

[--width=n](#) on page 217

[--length=n](#) on page 207

[--xref](#) on page 217

[OPT](#) on page 1476

## 12.41 --list=

Instructs the assembler to send the detailed assembly language listing to `inputfile.lst`.



You can use `--list` without the equals sign and filename to send the output to `inputfile.lst`. However, this syntax is deprecated and the assembler issues a warning. This syntax is to be removed in a later release. Use `--list=file` instead.

## Related information

[--list=file](#) on page 208

## 12.42 --littleend

Generates code suitable for an Arm® processor using little-endian memory access.

### Related information

- [--bigend](#) on page 190
- [--li](#) on page 207

## 12.43 -m

Instructs the assembler to write source file dependency lists to stdout.

### Related information

- [--md](#) on page 209

## 12.44 --maxcache=n

Sets the maximum source cache size in bytes.

The default is 8MB. `armasm` gives a warning if the size is less than 8MB.

## 12.45 --md

Creates makefile dependency lists.

This option instructs the assembler to write source file dependency lists to `inputfile.d`.

### Related information

- [-m](#) on page 209

## 12.46 --no\_code\_gen

Instructs the assembler to exit after pass 1, generating no object file. This option is useful if you only want to check the syntax of the source code or directives.

## 12.47 --no\_esc

Instructs the assembler to ignore C-style escaped special characters, such as \n and \t.

## 12.48 --no\_hide\_all

Gives all exported and imported global symbols STV\_DEFAULT visibility in ELF rather than STV\_HIDDEN, unless overridden using source directives.

You can use the following directives to specify an attribute that overrides the implicit symbol visibility:

- EXPORT.
- EXTERN.
- GLOBAL.
- IMPORT.

### Related information

[EXPORT or GLOBAL](#) on page 1448

[IMPORT and EXTERN](#) on page 1465

## 12.49 --no\_regs

Instructs armasm not to predefine register names.



This option is deprecated. In AArch32 state, use --regnames=none instead.

Note

### Related information

[--regnames](#) on page 212

## 12.50 --no\_terse

Instructs the assembler to show in the list file the lines of assembly code that it has skipped because of conditional assembly.

If you do not specify this option, the assembler does not output the skipped assembly code to the list file.

This option turns off the terse flag. By default the terse flag is on.

#### Related information

[-list=file](#) on page 208

## 12.51 --no\_warn

Turns off warning messages.

#### Related information

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 12.52 -o filename

Specifies the name of the output file.

If this option is not used, the assembler creates an object filename in the form `inputfilename.o`. This option is case-sensitive.

## 12.53 --pd

A synonym for the `--predefine` command-line option.

#### Related information

[--predefine "directive"](#) on page 211

## 12.54 --predefine "directive"

Instructs `armasm` to pre-execute one of the `SETA`, `SETL`, or `SETS` directives.

You must enclose `directive` in quotes, for example:

```
armasm --cpu=8-A.64 --predefine "VariableName SETA 20" inputfile.s
```

`armasm` also executes a corresponding `GBLL`, `GBLS`, or `GBLA` directive to define the variable before setting its value.

The variable name is case-sensitive. The variables defined using the command line are global to `armasm` source files specified on the command line.

## Considerations when using --predefine

Be aware of the following:

- The command-line interface of your system might require you to enter special character combinations, such as \\\", to include strings in *directive*. Alternatively, you can use --via *file* to include a --predefine argument. The command-line interface does not alter arguments from --via files.
- --predefine is not equivalent to the compiler option -Dname. --predefine defines a global variable whereas -Dname defines a macro that the C preprocessor expands.

Although you can use predefined global variables in combination with assembly control directives, for example IF and ELSE to control conditional assembly, they are not intended to provide the same functionality as the C preprocessor in armasm . If you require this functionality, Arm recommends you use the compiler to pre-process your assembly code.

### Related information

[-pd](#) on page 211

[GBLA, GBLL, and GBLS](#) on page 1460

[IF, ELSE, ENDIF, and ELIF](#) on page 1462

[SETA, SETL, and SETS](#) on page 1483

## 12.55 --reduce\_paths, --no\_reduce\_paths

Enables or disables the elimination of redundant path name information in file paths.

Windows systems impose a 260 character limit on file paths. Where relative pathnames exist whose absolute names expand to longer than 260 characters, you can use the --reduce\_paths option to reduce absolute pathname length by matching up directories with corresponding instances of .. and eliminating the directory/.. sequences in pairs.

--no\_reduce\_paths is the default.



**Note** Arm recommends that you avoid using long and deeply nested file paths, in preference to minimizing path lengths using the --reduce\_paths option.



**Note** This option is valid for 32-bit Windows systems only.

## 12.56 --regnames

Controls the predefinition of register names.



Not supported for AArch64 state.

**Note**

### Syntax

`--regnames=option`

Where *option* is one of the following:

**none**

Instructs `armasm` not to predefine register names.

**callstd**

Defines additional register names based on the AAPCS variant that you are using, as specified by the `--apcs` option.

**all**

Defines all AAPCS registers regardless of the value of `--apcs`.

### Related information

[--no\\_regs](#) on page 210

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

[--apcs=qualifier...qualifier](#) on page 187

## 12.57 --report-if-not-wysiwyg

Instructs `armasm` to report when it outputs an encoding that was not directly requested in the source code.

This can happen when `armasm`:

- Uses a pseudo-instruction that is not available in other assemblers, for example `MOV32`.
- Outputs an encoding that does not directly match the instruction mnemonic, for example if the assembler outputs the `MVN` encoding when assembling the `MOV` instruction.
- Inserts additional instructions where necessary for instruction syntax semantics, for example `armasm` can insert a missing `IT` instruction before a conditional T32 instruction.



Not supported for AArch64 state.

Note

## 12.58 --show\_cmdline

Outputs the command line used by the assembler.

### Usage

Shows the command line after processing by the assembler, and can be useful to check:

- The command line a build system is using.
- How the assembler is interpreting the supplied command line, for example, the ordering of command-line options.

The commands are shown normalized, and the contents of any via files are expanded.

The output is sent to the standard error stream (`stderr`).

### Related information

[--via=filename](#) on page 216

## 12.59 --thumb

Instructs `armasm` to interpret instructions as T32 instructions, using UAL syntax. This is equivalent to a `THUMB` directive at the start of the source file.



Not supported for AArch64 state.

Note

### Related information

[--arm](#) on page 189

[THUMB directive](#) on page 1486

## 12.60 --unaligned\_access, --no\_unaligned\_access

Enables or disables unaligned accesses to data on Arm®-based processors.

These options instruct the assembler to set an attribute in the object file to enable or disable the use of unaligned accesses.

## 12.61 --unsafe

Enables instructions for other architectures to be assembled without error.



Not supported for AArch64 state.

Note

It downgrades error messages to corresponding warning messages. It also suppresses warnings about operator precedence.

### Related information

- [--diag\\_error=tag\[,tag,...\]](#) on page 197
- [--diag\\_warning=tag\[,tag,...\]](#) on page 200
- [Binary operators](#) on page 233

## 12.62 --untyped\_local\_labels

Causes `armasm` not to set the T32 bit for the address of a numeric local label referenced in an `LDR` pseudo-instruction.



Not supported for AArch64 state.

Note

When this option is not used, if you reference a numeric local label in an `LDR` pseudo-instruction, and the label is in T32 code, then `armasm` sets the T32 bit (bit 0) of the address. You can then use the address as the target for a `BX` or `BLX` instruction.

If you require the actual address of the numeric local label, without the T32 bit set, then use this option.



**Note** When using this option, if you use the address in a branch (register) instruction, armasm treats it as an A32 code address, causing the branch to arrive in A32 state, meaning it would interpret this code as A32 instructions.

## Example

```
THUMB
...
1 ...
...
LDR r0,%B1 ; r0 contains the address of numeric local label "1",
             ; T32 bit is not set if --untyped_local_labels was used
...
```

## Related information

[LDR pseudo-instruction \(A32\)](#) on page 319

[B \(A32\)](#) on page 264

[Numeric local labels](#) on page 225

## 12.63 --version\_number

Displays the version of armasm you are using.

### Usage

The assembler displays the version number in the format `Mmmuuuxx`, where:

- `M` is the major version number, 6.
- `mm` is the minor version number.
- `uu` is the update number.
- `xx` is reserved for Arm internal use. You can ignore this for the purposes of checking whether the current release is a specific version or within a range of versions.

## 12.64 --via=filename

Reads an additional list of input filenames and assembler options from filename.

### Syntax

`--via=filename`

Where `filename` is the name of a via file containing options to be included on the command line.

## Usage

You can enter multiple `--via` options on the assembler command line. The `--via` options can also be included within a via file.

### Related information

- [Overview of via files](#) on page 1490
- [Via file syntax rules](#) on page 1490

## 12.65 --vsn

Displays the version information and the license details.



Note

`--vsn` is intended to report the version information for manual inspection. The Component line indicates the release of Arm® Compiler you are using. If you need to access the version in other tools or scripts, for example in build scripts, use the output from `--version_number`.

### Example

```
> armasm --vsn

Product: ARM Compiler N.n
Component: ARM Compiler N.n
Tool: armasm [tool_id]
license_type
Software supplied by: ARM Limited
```

## 12.66 --width=n

Sets the listing page width.

The default is 79 characters.

### Related information

- [--list=file](#) on page 208

## 12.67 --xref

Instructs the assembler to list cross-referencing information on symbols, including where they were defined and where they were used, both inside and outside macros.

The default is off.

## Related information

[--list=file](#) on page 208

# 13. Symbols, Literals, Expressions, and Operators

Describes how you can use symbols to represent variables, addresses, and constants in code, and how you can combine these with operators to create numeric or string expressions.

## 13.1 Symbol naming rules

You must follow some rules when naming symbols in assembly language source code.

The following rules apply:

- Symbol names must be unique within their scope.
- You can use uppercase letters, lowercase letters, numeric characters, or the underscore character in symbol names. Symbol names are case-sensitive, and all characters in the symbol name are significant.
- Do not use numeric characters for the first character of symbol names, except in numeric local labels.
- Symbols must not use the same name as built-in variable names or predefined symbol names.
- If you use the same name as an instruction mnemonic or directive, use double bars to delimit the symbol name. For example:

```
||ASSERT||
```

The bars are not part of the symbol.

- You must not use the symbols `|$a|`, `|$t|`, or `|$d|` as program labels. These are mapping symbols that mark the beginning of A32, T32, and A64 code, and data within the object file. You must not use `|$x|` in A64 code.
- Symbols beginning with the characters `$v` are mapping symbols that relate to floating-point code. Arm recommends you avoid using symbols beginning with `$v` in your source code.

If you have to use a wider range of characters in symbols, for example, when working with compilers, use single bars to delimit the symbol name. For example:

```
|.text|
```

The bars are not part of the symbol. You cannot use bars, semicolons, or newlines within the bars.

### Related information

[Numeric local labels](#) on page 225

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

[Built-in variables and constants](#) on page 148

## 13.2 Variables

You can declare numeric, logical, or string variables using assembler directives.

The value of a variable can be changed as assembly proceeds. Variables are local to the assembler. This means that in the generated code or data, every instance of the variable has a fixed value.

The type of a variable cannot be changed. Variables are one of the following types:

- Numeric.
- Logical.
- String.

The range of possible values of a numeric variable is the same as the range of possible values of a numeric constant or numeric expression.

The possible values of a logical variable are {TRUE} or {FALSE}.

The range of possible values of a string variable is the same as the range of values of a string expression.

Use the **GBLA**, **GBLL**, **GBLS**, **LCLA**, **LCLL**, and **LCLS** directives to declare symbols representing variables, and assign values to them using the **SETA**, **SETL**, and **SETS** directives.

### Example

```
a    SETA 100
L1  MOV R1, # (a*5) ; In the object file, this is MOV R1, #500
a    SETA 200        ; Value of 'a' is 200 only after this point.
                     ; The previous instruction is always MOV R1, #500
...
BNE L1           ; When the processor branches to L1, it executes
                  ; MOV R1, #500
```

### Related information

[Numeric expressions](#) on page 228

[String expressions](#) on page 227

[Numeric constants](#) on page 220

[Logical expressions](#) on page 231

[GBLA, GBL, and GBLS](#) on page 1460

[LCLA, LCLL, and LCLS](#) on page 1469

[SETA, SETL, and SETS](#) on page 1483

## 13.3 Numeric constants

You can define 32-bit numeric constants using the `EQU` assembler directive.

Numeric constants are 32-bit integers in A32 and T32 code. You can set them using unsigned numbers in the range 0 to  $2^{32}$ -1, or signed numbers in the range - $2^{31}$  to  $2^{31}$ -1. However, the assembler makes no distinction between  $-n$  and  $2^{32}-n$ .

In A64 code, numeric constants are 64-bit integers. You can set them using unsigned numbers in the range 0 to  $2^{64}$ -1, or signed numbers in the range - $2^{63}$  to  $2^{63}$ -1. However, the assembler makes no distinction between  $-n$  and  $2^{64}-n$ .

Relational operators such as `>=` use the unsigned interpretation. This means that  $0 > -1$  is `{FALSE}`.

Use the `EQU` directive to define constants. You cannot change the value of a numeric constant after you define it. You can construct expressions by combining numeric constants and binary operators.

### Related information

[Numeric expressions](#) on page 228

[Syntax of numeric literals](#) on page 229

[EQU](#) on page 1447

## 13.4 Assembly time substitution of variables

You can assign a string variable to all or part of a line of assembly language code. A string variable can contain numeric and logical variables.

Use the variable with a `$` prefix in the places where the value is to be substituted for the variable. The dollar character instructs `armasm` to substitute the string into the source code line before checking the syntax of the line. `armasm` faults if the substituted line is larger than the source line limit.

Numeric and logical variables can also be substituted. The current value of the variable is converted to a hexadecimal string (or `T` or `F` for logical variables) before substitution.

Use a dot to mark the end of the variable name if the following character would be permissible in a symbol name. You must set the contents of the variable before you can use it.

If you require a `$` that you do not want to be substituted, use `$$`. This is converted to a single `$`.

You can include a variable with a `$` prefix in a string. Substitution occurs in the same way as anywhere else.

Substitution does not occur within vertical bars, except that vertical bars within double quotes do not affect substitution.

## Example

```

; straightforward substitution
    GBLS    add4ff
    ;
add4ff    SETS    "ADD r4,r4,#0xFF"      ; set up add4ff
    $add4ff.00          ; invoke add4ff
    ; this produces
    ADD r4,r4,#0xFF00
; elaborate substitution
    GBLS    s1
    GBLS    s2
    GBLS    fixup
    GBLA    count
    ;
count     SETA    14
s1        SETS    "a$$b$count" ; s1 now has value a$b0000000E
s2        SETS    "abc"
fixup     SETS    "|xy$s2.z|" ; fixup now has value |xyabcz|
|C$$code|  MOV     r4,#16       ; but the label here is C$$code

```

## Related information

[Syntax of source lines in assembly language](#) on page 97

[Symbol naming rules](#) on page 219

## 13.5 Register-relative and PC-relative expressions

The assembler supports PC-relative and register-relative expressions.

A register-relative expression evaluates to a named register combined with a numeric expression.

You write a PC-relative expression in source code as a label or the PC, optionally combined with a numeric expression. Some instructions can also accept PC-relative expressions in the form [PC, #number].

If you specify a label, the assembler calculates the offset from the PC value of the current instruction to the address of the label. The assembler encodes the offset in the instruction. If the offset is too large, the assembler produces an error. The offset is either added to or subtracted from the PC value to form the required address.

Arm recommends you write PC-relative expressions using labels rather than the PC because the value of the PC depends on the instruction set.



- In A32 code, the value of the PC is the address of the current instruction plus 8 bytes.
- In T32 code:
  - For `B`, `BL`, `CBNZ`, and `CBZ` instructions, the value of the PC is the address of the current instruction plus 4 bytes.
  - For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.

- In A64 code, the value of the PC is the address of the current instruction.

## Example

```

LDR      r4,=data+4*n    ; n is an assembly-time variable
; code
MOV      pc,lr
data   DCD    value_0
; n-1 DCD directives
DCD    value_n           ; data+4*n points here
; more DCD directives

```

## Related information

[Labels](#) on page 223

[MAP](#) on page 1474

## 13.6 Labels

A label is a symbol that represents the memory address of an instruction or data.

The address can be PC-relative, register-relative, or absolute. Labels are local to the source file unless you make them global using the `EXPORT` directive.

The address given by a label is calculated during assembly. `armasm` calculates the address of a label relative to the origin of the section where the label is defined. A reference to a label within the same section can use the PC plus or minus an offset. This is called PC-relative addressing.

Addresses of labels in other sections are calculated at link time, when the linker has allocated specific locations in memory for each section.

## Related information

[Syntax of source lines in assembly language](#) on page 97

[EXPORT or GLOBAL](#) on page 1448

[Labels for PC-relative addresses](#) on page 223

[Labels for register-relative addresses](#) on page 224

[Labels for absolute addresses](#) on page 224

## 13.7 Labels for PC-relative addresses

A label can represent the PC value plus or minus the offset from the PC to the label. Use these labels as targets for branch instructions, or to access small items of data embedded in code sections.

You can define PC-relative labels using a label on an instruction or on one of the data definition directives.

You can also use the section name of an `AREA` directive as a label for PC-relative addresses. In this case the label points to the first byte of the specified `AREA`. Arm does not recommend using `AREA` names as branch targets because when branching from A32 to T32 state or T32 to A32 state in this way, the processor does not change the state properly.

### Related information

[AREA](#) on page 1428

[DCB](#) on page 1438

[DCD and DCDU](#) on page 1439

[DCFD and DCFDU](#) on page 1440

[DCFS and DCFSU](#) on page 1441

[DCI](#) on page 1442

[DCQ and DCQU](#) on page 1443

[DCW and DCWU](#) on page 1444

## 13.8 Labels for register-relative addresses

A label can represent a named register plus a numeric value. You define these labels in a storage map. They are most commonly used to access data in data sections.

You can use the `EQU` directive to define additional register-relative labels, based on labels defined in storage maps.



Register-relative addresses are not supported in A64 code.

Note

### Example of storage map definitions

```
MAP      0,r9
MAP      0xff,r9
```

### Related information

[DCDO](#) on page 1440

[EQU](#) on page 1447

[MAP](#) on page 1474

[SPACE or FILL](#) on page 1485

## 13.9 Labels for absolute addresses

A label can represent the absolute address of code or data.

These labels are numeric constants. In A32 and T32 code they are integers in the range 0 to  $2^{32}-1$ . In A64 code, they are integers in the range 0 to  $2^{64}-1$ . They address the memory directly. You can use labels to represent absolute addresses using the `EQU` directive. To ensure that the labels are used correctly when referenced in code, you can specify the absolute address as:

- A32 code with the `ARM` directive.
- T32 code with the `THUMB` directive.
- Data.

### Example of defining labels for absolute address

```
abc EQU 2           ; assigns the value 2 to the symbol abc
xyz EQU label+8    ; assigns the address (label+8) to the symbol xyz
fiq EQU 0x1C, ARM   ; assigns the absolute address 0x1C to the symbol fiq
                     ; and marks it as A32 code
```

### Related information

[Labels](#) on page 223

[EQU](#) on page 1447

[Labels for PC-relative addresses](#) on page 223

[Labels for register-relative addresses](#) on page 224

## 13.10 Numeric local labels

Numeric local labels are a type of label that you refer to by number rather than by name. They are used in a similar way to PC-relative labels, but their scope is more limited.

A numeric local label is a number in the range 0-99, optionally followed by a name. Unlike other labels, a numeric local label can be defined many times and the same number can be used for more than one numeric local label in an area.

Numeric local labels do not appear in the object file. This means that, for example, a debugger cannot set a breakpoint directly on a numeric local label, like it can for named local labels kept using the `KEEP` directive.

A numeric local label can be used in place of `symbol` in source lines in an assembly language module:

- On its own, that is, where there is no instruction or directive.
- On a line that contains an instruction.
- On a line that contains a code- or data-generating directive.

A numeric local label is generally used where you might use a PC-relative label.

Numeric local labels are typically used for loops and conditional code within a routine, or for small subroutines that are only used locally. They are particularly useful when you are generating labels in macros.

The scope of numeric local labels is limited by the `AREA` directive. Use the `ROUT` directive to limit the scope of numeric local labels more tightly. A reference to a numeric local label refers to a matching label within the same scope. If there is no matching label within the scope in either direction, `armasm` generates an error message and the assembly fails.

You can use the same number for more than one numeric local label even within the same scope. By default, `armasm` links a numeric local label reference to:

- The most recent numeric local label with the same number, if there is one within the scope.
- The next following numeric local label with the same number, if there is not a preceding one within the scope.

Use the optional parameters to modify this search pattern if required.

## Related information

[Syntax of source lines in assembly language](#) on page 97

[Syntax of numeric local labels](#) on page 226

[Labels](#) on page 223

[MACRO and MEND](#) on page 1471

[KEEP](#) on page 1468

[ROUT](#) on page 1483

## 13.11 Syntax of numeric local labels

When referring to numeric local labels you can specify how `armasm` searches for the label.

### Syntax

`n[routname] ;` a numeric local label

`%[F|B] [A|T]n[routname];` a reference to a numeric local label

where:

**n**

is the number of the numeric local label in the range 0-99.

**routname**

is the name of the current scope.

**%**

introduces the reference.

**F**

instructs `armasm` to search forwards only.

**B**

instructs `armasm` to search backwards only.

**A**

instructs `armasm` to search all macro levels.

**T**

instructs `armasm` to look at this macro level only.

## Usage

If neither `F` nor `B` is specified, `armasm` searches backwards first, then forwards.

If neither `A` nor `T` is specified, `armasm` searches all macros from the current level to the top level, but does not search lower level macros.

If `routname` is specified in either a label or a reference to a label, `armasm` checks it against the name of the nearest preceding `ROUT` directive. If it does not match, `armasm` generates an error message and the assembly fails.

## Related information

[Numeric local labels](#) on page 225

[ROUT](#) on page 1483

## 13.12 String expressions

String expressions consist of combinations of string literals, string variables, string manipulation operators, and parentheses.

Characters that cannot be placed in string literals can be placed in string expressions using the `:CHR:` unary operator. Any ASCII character from 0 to 255 is permitted.

The value of a string expression cannot exceed 5120 characters in length. It can be of zero length.

### Example

```
improb SETS "literal":CC:(strvar2:LEFT:4)
; sets the variable improb to the value "literal"
; with the left-most four characters of the
; contents of string variable strvar2 appended
```

## Related information

[String literals](#) on page 227

[Unary operators](#) on page 232

[String manipulation operators](#) on page 234

[Variables](#) on page 220

[SETA, SETL, and SETS](#) on page 1483

## 13.13 String literals

String literals consist of a series of characters or spaces contained between double quote characters.

The length of a string literal is restricted by the length of the input line.

To include a double quote character or a dollar character within the string literal, include the character twice as a pair. For example, you must use `$$` if you require a single `$` in the string.

C string escape sequences are also enabled and can be used within the string, unless `--no_esc` is specified.

### Examples

```
abc      SETS      "this string contains only one "" double quote"
def      SETS      "this string contains only one $$ dollar symbol"
```

### Related information

[Syntax of source lines in assembly language](#) on page 97

[--no\\_esc](#) on page 209

## 13.14 Numeric expressions

Numeric expressions consist of combinations of numeric constants, numeric variables, ordinary numeric literals, binary operators, and parentheses.

Numeric expressions can contain register-relative or program-relative expressions if the overall expression evaluates to a value that does not include a register or the PC.

Numeric expressions evaluate to 32-bit integers in A32 and T32 code. You can interpret them as unsigned numbers in the range 0 to  $2^{32}-1$ , or signed numbers in the range  $-2^{31}$  to  $2^{31}-1$ . However, `armasm` makes no distinction between  $-n$  and  $2^{32} - n$ . Relational operators such as `>=` use the unsigned interpretation. This means that  $0 > -1$  is {`FALSE`}.

In A64 code, numeric expressions evaluate to 64-bit integers. You can interpret them as unsigned numbers in the range 0 to  $2^{64}-1$ , or signed numbers in the range  $-2^{63}$  to  $2^{63}-1$ . However, `armasm` makes no distinction between  $-n$  and  $2^{64} - n$ .



`armasm` does not support 64-bit arithmetic variables. See [SETA, SETL, and SETS](#) (Restrictions) for a workaround.

Arm recommends that you only use `armasm` for legacy Arm syntax assembly code, and that you use the `armclang` assembler and GNU syntax for all new assembly files.

## Example

```
a    SETA    256*256          ; 256*256 is a numeric expression
     MOV      r1,#(a*22)       ; (a*22) is a numeric expression
```

## Related information

[Syntax of numeric literals](#) on page 229  
[Binary operators](#) on page 233  
[Variables](#) on page 220  
[Numeric constants](#) on page 220  
[SETA, SETL, and SETS](#) on page 1483

## 13.15 Syntax of numeric literals

Numeric literals consist of a sequence of characters, or a single character in quotes, evaluating to an integer.

They can take any of the following forms:

- *decimal-digits*.
- *0xhexadecimal-digits*.
- *&hexadecimal-digits*.
- *n\_base-n-digits*.
- *'character'*.

where:

***decimal-digits***

Is a sequence of characters using only the digits 0 to 9.

***hexadecimal-digits***

Is a sequence of characters using only the digits 0 to 9 and the letters A to F or a to f.

***n******\_***

Is a single digit between 2 and 9 inclusive, followed by an underscore character.

***base-n-digits***

Is a sequence of characters using only the digits 0 to (*n*-1)

***character***

Is any single character except a single quote. Use the standard C escape character (\\"') if you require a single quote. The character must be enclosed within opening and closing single quotes. In this case, the value of the numeric literal is the numeric code of the character.

You must not use any other characters. The sequence of characters must evaluate to an integer.

In A32/T32 code, the range is 0 to  $2^{32}$ -1, except in DCQ, DCQU, DCD, and DCQU directives.

In A64 code, the range is 0 to  $2^{64}-1$ , except in DCD and DCQD directives.



- In the DCQ and DCQU, the integer range is 0 to  $2^{64}-1$
- In the DCO and DCOU directives, the integer range is 0 to  $2^{128}-1$

## Examples

```

a      SETA    34906
addr   DCD     0xA10E
        LDR     r4,=&1000000F
        DCD     2_11001010
c3     SETA    8_74007
        DCQ    0x0123456789abcdef
        LDR     r1,'A'      ; pseudo-instruction loading 65 into r1
        ADD     r3,r2,'#'\'' ; add 39 to contents of r2, result to r3

```

## Related information

[Numeric constants](#) on page 220

## 13.16 Syntax of floating-point literals

Floating-point literals consist of a sequence of characters evaluating to a floating-point number.

They can take any of the following forms:

- $\{-\} \text{digits} E \{-\} \text{digits}$
- $\{-\} \{\text{digits}\} . \text{digits}$
- $\{-\} \{\text{digits}\} . \text{digits} E \{-\} \text{digits}$
- $0x \text{hexdigits}$
- $& \text{hexdigits}$
- $0f\_ \text{hexdigits}$
- $0d\_ \text{hexdigits}$

where:

### **digits**

Are sequences of characters using only the digits 0 to 9. You can write  $E$  in uppercase or lowercase. These forms correspond to normal floating-point notation.

### **hexdigits**

Are sequences of characters using only the digits 0 to 9 and the letters A to F or a to f. These forms correspond to the internal representation of the numbers in the computer. Use these forms to enter infinities and NaNs, or if you want to be sure of the exact bit patterns you are using.

The `0x` and `&` forms allow the floating-point bit pattern to be specified by any number of hex digits.

The `0f_` form requires the floating-point bit pattern to be specified by exactly 8 hex digits.

The `0d_` form requires the floating-point bit pattern to be specified by exactly 16 hex digits.

The range for half-precision floating-point values is:

- Maximum 65504 (IEEE format) or 131008 (alternative format).
- Minimum 0.00012201070785522461.

The range for single-precision floating-point values is:

- Maximum 3.40282347e+38.
- Minimum 1.17549435e-38.

The range for double-precision floating-point values is:

- Maximum 1.79769313486231571e+308.
- Minimum 2.22507385850720138e-308.

Floating-point numbers are only available if your system has floating-point, Advanced SIMD with floating-point.

## Examples

```

DCFD    1E308,-4E-100
DCFS    1.0
DCFS    0.02
DCFD    3.725e15
DCFS    0x7FC00000          ; Quiet NaN
DCFD    &FFF000000000000      ; Minus infinity

```

## Related information

[Numeric constants](#) on page 220

[Syntax of numeric literals](#) on page 229

## 13.17 Logical expressions

Logical expressions consist of combinations of logical literals (`{TRUE}` or `{FALSE}`), logical variables, Boolean operators, relations, and parentheses.

Relations consist of combinations of variables, literals, constants, or expressions with appropriate relational operators.

## Related information

[Boolean operators](#) on page 236

[Relational operators](#) on page 236

## 13.18 Logical literals

Logical or Boolean literals can have one of two values, {TRUE} or {FALSE}.

### Related information

[Syntax of numeric literals](#) on page 229  
[String literals](#) on page 227

## 13.19 Unary operators

Unary operators return a string, numeric, or logical value. They have higher precedence than other operators and are evaluated first.

A unary operator precedes its operand. Adjacent operators are evaluated from right to left.

The following table lists the unary operators that return strings:

**Table 13-1: Unary operators that return strings**

| Operator     | Usage                 | Description                                                                                                                                                                                                     |
|--------------|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| :CHR:        | :CHR:A                | Returns the character with ASCII code A.                                                                                                                                                                        |
| :LOWERCASE:  | :LOWERCASE:string     | Returns the given string, with all uppercase characters converted to lowercase.                                                                                                                                 |
| :REVERSE_CC: | :REVERSE_CC:cond_code | Returns the inverse of the condition code in cond_code, or an error if cond_code does not contain a valid condition code.                                                                                       |
| :STR:        | :STR:A                | In A32 and T32 code, returns an 8-digit hexadecimal string corresponding to a numeric expression, or the string "T" or "F" if used on a logical expression. In A64 code, returns a 16-digit hexadecimal string. |
| :UPPERCASE:  | :UPPERCASE:string     | Returns the given string, with all lowercase characters converted to uppercase.                                                                                                                                 |

The following table lists the unary operators that return numeric values:

**Table 13-2: Unary operators that return numeric or logical values**

| Operator | Usage    | Description                                                                      |
|----------|----------|----------------------------------------------------------------------------------|
| ?        | ?A       | Number of bytes of code generated by line defining symbol A.                     |
| + and -  | +A<br>-A | Unary plus. Unary minus. + and - can act on numeric and PC-relative expressions. |

| Operator      | Usage                  | Description                                                                                                                                  |
|---------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| :BASE:        | :BASE:A                | If A is a PC-relative or register-relative expression, :BASE: returns the number of its register component. :BASE: is most useful in macros. |
| :CC_ENCODING: | :CC_ENCODING:cond_code | Returns the numeric value of the condition code in cond_code, or an error if cond_code does not contain a valid condition code.              |
| :DEF:         | :DEF:A                 | { TRUE } if A is defined, otherwise {FALSE}.                                                                                                 |
| :INDEX:       | :INDEX:A               | If A is a register-relative expression, :INDEX: returns the offset from that base register. :INDEX: is most useful in macros.                |
| :LEN:         | :LEN:A                 | Length of string A.                                                                                                                          |
| :LNOT:        | :LNOT:A                | Logical complement of A.                                                                                                                     |
| :NOT:         | :NOT:A                 | Bitwise complement of A (~ is an alias, for example ~A).                                                                                     |
| :RCONST:      | :RCONST:Rn             | Number of register. In A32/T32 code, 0-15 corresponds to R0-R15. In A64 code, 0-30 corresponds to W0-W30 or X0-X30.                          |

## Related information

[Binary operators](#) on page 233

## 13.20 Binary operators

You write binary operators between the pair of sub-expressions they operate on. They have lower precedence than unary operators.



The order of precedence is not the same as in C.

Note

## Related information

[Multiplicative operators](#) on page 233

[String manipulation operators](#) on page 234

[Shift operators](#) on page 235

[Addition, subtraction, and logical operators](#) on page 235

[Relational operators](#) on page 236

[Boolean operators](#) on page 236

[Difference between operator precedence in assembly language and C](#) on page 237

## 13.21 Multiplicative operators

Multiplicative operators have the highest precedence of all binary operators. They act only on numeric expressions.

The following table shows the multiplicative operators:

**Table 13-3: Multiplicative operators**

| Operator | Alias | Usage   | Explanation |
|----------|-------|---------|-------------|
| *        | -     | A*B     | Multiply    |
| /        | -     | A/B     | Divide      |
| :MOD:    | %     | A:MOD:B | A modulo B  |

You can use the :MOD: operator on PC-relative expressions to ensure code is aligned correctly. These alignment checks have the form *PC-relative:MOD:Constant*. For example:

```

AREA x, CODE
ASSERT ({PC}:MOD:4) == 0
DCB 1
y  DCB 2
ASSERT (y:MOD:4) == 1
ASSERT ({PC}:MOD:4) == 2
END

```

### Related information

[Binary operators](#) on page 233

[Register-relative and PC-relative expressions](#) on page 222

[Syntax of numeric literals](#) on page 229

[Numeric expressions](#) on page 228

## 13.22 String manipulation operators

You can use string manipulation operators to concatenate two strings, or to extract a substring.

The following table shows the string manipulation operators. In cc, both A and B must be strings. In the slicing operators LEFT and RIGHT:

- A must be a string.
- B must be a numeric expression.

**Table 13-4: String manipulation operators**

| Operator | Usage     | Explanation                      |
|----------|-----------|----------------------------------|
| :CC:     | A:CC:B    | B concatenated onto the end of A |
| :LEFT:   | A:LEFT:B  | The left-most B characters of A  |
| :RIGHT:  | A:RIGHT:B | The right-most B characters of A |

## Related information

[String expressions](#) on page 227

[Numeric expressions](#) on page 228

## 13.23 Shift operators

Shift operators act on numeric expressions, by shifting or rotating the first operand by the amount specified by the second.

The following table shows the shift operators:

**Table 13-5: Shift operators**

| Operator | Alias | Usage   | Explanation              |
|----------|-------|---------|--------------------------|
| :ROL:    | -     | A:ROL:B | Rotate A left by B bits  |
| :ROR:    | -     | A:ROR:B | Rotate A right by B bits |
| :SHL:    | <<    | A:SHL:B | Shift A left by B bits   |
| :SHR:    | >>    | A:SHR:B | Shift A right by B bits  |



SHR is a logical shift and does not propagate the sign bit.

Note

## Related information

[Binary operators](#) on page 233

## 13.24 Addition, subtraction, and logical operators

Addition, subtraction, and logical operators act on numeric expressions.

Logical operations are performed bitwise, that is, independently on each bit of the operands to produce the result.

The following table shows the addition, subtraction, and logical operators:

**Table 13-6: Addition, subtraction, and logical operators**

| Operator | Alias | Usage   | Explanation                     |
|----------|-------|---------|---------------------------------|
| +        | -     | A+B     | Add A to B                      |
| -        | -     | A-B     | Subtract B from A               |
| :AND:    | &     | A:AND:B | Bitwise AND of A and B          |
| :EOR:    | ^     | A:EOR:B | Bitwise Exclusive OR of A and B |
| :OR:     | -     | A:OR:B  | Bitwise OR of A and B           |

The use of `|` as an alias for `:OR:` is deprecated.

## Related information

[Binary operators](#) on page 233

## 13.25 Relational operators

Relational operators act on two operands of the same type to produce a logical value.

The operands can be one of:

- Numeric.
- PC-relative.
- Register-relative.
- Strings.

Strings are sorted using ASCII ordering. String `A` is less than string `B` if it is a leading substring of string `B`, or if the left-most character in which the two strings differ is less in string `A` than in string `B`.

Arithmetic values are unsigned, so the value of `0>-1` is `{FALSE}`.

The following table shows the relational operators:

**Table 13-7: Relational operators**

| Operator           | Alias                    | Usage                | Explanation                  |
|--------------------|--------------------------|----------------------|------------------------------|
| <code>=</code>     | <code>==</code>          | <code>A=B</code>     | A equal to B                 |
| <code>&gt;</code>  | <code>-</code>           | <code>A&gt;B</code>  | A greater than B             |
| <code>&gt;=</code> | <code>-</code>           | <code>A&gt;=B</code> | A greater than or equal to B |
| <code>&lt;</code>  | <code>-</code>           | <code>A&lt;B</code>  | A less than B                |
| <code>&lt;=</code> | <code>-</code>           | <code>A&lt;=B</code> | A less than or equal to B    |
| <code>/=</code>    | <code>&lt;&gt; !=</code> | <code>A/=B</code>    | A not equal to B             |

## Related information

[Binary operators](#) on page 233

## 13.26 Boolean operators

Boolean operators perform standard logical operations on their operands. They have the lowest precedence of all operators.

In all three cases, both A and B must be expressions that evaluate to either `{TRUE}` or `{FALSE}`.

The following table shows the Boolean operators:

**Table 13-8: Boolean operators**

| Operator | Alias | Usage    | Explanation                     |
|----------|-------|----------|---------------------------------|
| :LAND:   | &&    | A:LAND:B | Logical AND of A and B          |
| :LEOR:   | -     | A:LEOR:B | Logical Exclusive OR of A and B |
| :LOR:    |       | A:LOR:B  | Logical OR of A and B           |

### Related information

[Binary operators](#) on page 233

## 13.27 Operator precedence

armasm includes an extensive set of operators for use in expressions. It evaluates them using a strict order of precedence.

Many of the operators resemble their counterparts in high-level languages such as C.

armasm evaluates operators in the following order:

1. Expressions in parentheses are evaluated first.
2. Operators are applied in precedence order.
3. Adjacent unary operators are evaluated from right to left.
4. Binary operators of equal precedence are evaluated from left to right.

### Related information

[Unary operators](#) on page 232

[Binary operators](#) on page 233

[Multiplicative operators](#) on page 233

[String manipulation operators](#) on page 234

[Shift operators](#) on page 235

[Addition, subtraction, and logical operators](#) on page 235

[Relational operators](#) on page 236

[Boolean operators](#) on page 236

[Difference between operator precedence in assembly language and C](#) on page 237

## 13.28 Difference between operator precedence in assembly language and C

`armasm` does not follow the same order of precedence when evaluating operators as a C compiler.

For example, `(1 + 2 :SHR: 3)` evaluates as `(1 + (2 :SHR: 3)) = 1` in assembly language. The equivalent expression in C evaluates as `((1 + 2) >> 3) = 0`.

Arm recommends you use brackets to make the precedence explicit.

If your code contains an expression that would parse differently in C, and you are not using the `--unsafe` option, `armasm` gives a warning:

A1466W: Operator precedence means that expression would evaluate differently in C

In the following tables:

- The highest precedence operators are at the top of the list.
- The highest precedence operators are evaluated first.
- Operators of equal precedence are evaluated from left to right.

The following table shows the order of precedence of operators in assembly language, and a comparison with the order in C.

**Table 13-9: Operator precedence in Arm assembly language**

| assembly language precedence                     | equivalent C operators                   |
|--------------------------------------------------|------------------------------------------|
| unary operators                                  | unary operators                          |
| <code>* / :MOD:</code>                           | <code>* / %</code>                       |
| string manipulation                              | n/a                                      |
| <code>:SHL: :SHR: :ROR: :ROL:</code>             | <code>&lt;&lt; &gt;&gt;</code>           |
| <code>+ - :AND: :OR: :EOR:</code>                | <code>+ - &amp;   ^</code>               |
| <code>= &gt; &gt;= &lt; &lt;= /= &lt;&gt;</code> | <code>== &gt; &gt;= &lt; &lt;= !=</code> |
| <code>:LAND: :LOR: :LEOR:</code>                 | <code>&amp;&amp;   </code>               |

The following table shows the order of precedence of operators in C.

**Table 13-10: Operator precedence in C**

| C precedence                           |
|----------------------------------------|
| unary operators                        |
| <code>* / %</code>                     |
| <code>+ - (as binary operators)</code> |
| <code>&lt;&lt; &gt;&gt;</code>         |
| <code>&lt; &lt;= &gt; &gt;=</code>     |
| <code>== !=</code>                     |

## C precedence

&  
^  
|  
&&  
||

### Related information

[Operator precedence](#) on page 237

[Binary operators](#) on page 233

# 14. A32 and T32 Instructions

Describes the A32 and T32 instructions supported in AArch32 state.

## 14.1 A32 and T32 instruction summary

An overview of the instructions available in the A32 and T32 instruction sets.

**Table 14-1: Summary of instructions**

| Mnemonic                | Brief description                                                                    |
|-------------------------|--------------------------------------------------------------------------------------|
| ADC, ADD                | Add with Carry, Add                                                                  |
| ADR                     | Load program or register-relative address (short range)                              |
| ADRL pseudo-instruction | Load program or register-relative address (medium range)                             |
| AND                     | Logical AND                                                                          |
| ASR                     | Arithmetic Shift Right                                                               |
| B                       | Branch                                                                               |
| BFC, BFI                | Bit Field Clear and Insert                                                           |
| BIC                     | Bit Clear                                                                            |
| BKPT                    | Software breakpoint                                                                  |
| BL                      | Branch with Link                                                                     |
| BLX, BLXNS              | Branch with Link, change instruction set, Branch with Link and Exchange (Non-secure) |
| BX, BXNS                | Branch, change instruction set, Branch and Exchange (Non-secure)                     |
| CBZ, CBNZ               | Compare and Branch if {Non}Zero                                                      |
| CDP                     | Coprocessor Data Processing operation                                                |
| CDP2                    | Coprocessor Data Processing operation                                                |
| CLREX                   | Clear Exclusive                                                                      |
| CLZ                     | Count leading zeros                                                                  |
| CMN, CMP                | Compare Negative, Compare                                                            |
| CPS                     | Change Processor State                                                               |
| CPY pseudo-instruction  | Copy                                                                                 |
| CRC32                   | CRC32                                                                                |
| CRC32C                  | CRC32C                                                                               |
| DBG                     | Debug                                                                                |
| DCPS1                   | Debug switch to exception level 1                                                    |
| DCPS2                   | Debug switch to exception level 2                                                    |
| DCPS3                   | Debug switch to exception level 3                                                    |
| DMB, DSB                | Data Memory Barrier, Data Synchronization Barrier                                    |
| DSB                     | Data Synchronization Barrier                                                         |
| EOR                     | Exclusive OR                                                                         |

| Mnemonic                      | Brief description                                                |
|-------------------------------|------------------------------------------------------------------|
| ERET                          | Exception Return                                                 |
| ESB                           | Error Synchronization Barrier                                    |
| HLT                           | Halting breakpoint                                               |
| HVC                           | Hypervisor Call                                                  |
| ISB                           | Instruction Synchronization Barrier                              |
| IT                            | If-Then                                                          |
| LDAEX, LDAEXB, LDAEXH, LDAEXD | Load-Acquire Register Exclusive Word, Byte, Halfword, Doubleword |
| LDC, LDC2                     | Load Coprocessor                                                 |
| LDM                           | Load Multiple registers                                          |
| LDR                           | Load Register with word                                          |
| LDR pseudo-instruction        | Load Register pseudo-instruction                                 |
| LDA, LDAB, LDAH               | Load-Acquire Register Word, Byte, Halfword                       |
| LDRB                          | Load Register with Byte                                          |
| LDRBT                         | Load Register with Byte, user mode                               |
| LDRD                          | Load Registers with two words                                    |
| LDREX, LDREXB, LDREXH, LDREXD | Load Register Exclusive Word, Byte, Halfword, Doubleword         |
| LDRH                          | Load Register with Halfword                                      |
| LDRHT                         | Load Register with Halfword, user mode                           |
| LDRSB                         | Load Register with Signed Byte                                   |
| LDRSBT                        | Load Register with Signed Byte, user mode                        |
| LDRSH                         | Load Register with Signed Halfword                               |
| LDRSHT                        | Load Register with Signed Halfword, user mode                    |
| LDRT                          | Load Register with word, user mode                               |
| LSL, LSR                      | Logical Shift Left, Logical Shift Right                          |
| MCR                           | Move from Register to Coprocessor                                |
| MCRR                          | Move from Registers to Coprocessor                               |
| MLA                           | Multiply Accumulate                                              |
| MLS                           | Multiply and Subtract                                            |
| MOV                           | Move                                                             |
| MOVT                          | Move Top                                                         |
| MOV32 pseudo-instruction      | Move 32-bit immediate to register                                |
| MRC                           | Move from Coprocessor to Register                                |
| MRRC                          | Move from Coprocessor to Registers                               |
| MRS                           | Move from PSR to Register                                        |
| MRS pseudo-instruction        | Move from system Coprocessor to Register                         |
| MSR                           | Move from Register to PSR                                        |
| MSR pseudo-instruction        | Move from Register to system Coprocessor                         |
| MUL                           | Multiply                                                         |
| MVN                           | Move Not                                                         |
| NEG pseudo-instruction        | Negate                                                           |

| Mnemonic                                       | Brief description                                               |
|------------------------------------------------|-----------------------------------------------------------------|
| NOP                                            | No Operation                                                    |
| ORN                                            | Logical OR NOT                                                  |
| ORR                                            | Logical OR                                                      |
| PKHBT, PKHTB                                   | Pack Halfwords                                                  |
| PLD                                            | Preload Data                                                    |
| PLDW                                           | Preload Data with intent to Write                               |
| PLI                                            | Preload Instruction                                             |
| PUSH, POP                                      | PUSH registers to stack, POP registers from stack               |
| QADD, QDADD, QDSUB, QSUB                       | Saturating arithmetic                                           |
| QADD8, QADD16, QASX, QSUB8, QSUB16, QSAX       | Parallel signed saturating arithmetic                           |
| RBIT                                           | Reverse Bits                                                    |
| REV, REV16, REVSH                              | Reverse byte order                                              |
| RFE                                            | Return From Exception                                           |
| ROR                                            | Rotate Right Register                                           |
| RRX                                            | Rotate Right with Extend                                        |
| RSB                                            | Reverse Subtract                                                |
| RSC                                            | Reverse Subtract with Carry                                     |
| SADD8, SADD16, SASX                            | Parallel Signed arithmetic                                      |
| SBC                                            | Subtract with Carry                                             |
| SBFX, UBFX                                     | Signed, Unsigned Bit Field eXtract                              |
| SDIV                                           | Signed Divide                                                   |
| SEL                                            | Select bytes according to APSR GE flags                         |
| SETEND                                         | Set Endianness for memory accesses                              |
| SETPAN                                         | Set Privileged Access Never                                     |
| SEV                                            | Set Event                                                       |
| SEVL                                           | Set Event Locally                                               |
| SG                                             | Secure Gateway                                                  |
| SHADD8, SHADD16, SHASX, SHSUB8, SHSUB16, SHSAX | Parallel Signed Halving arithmetic                              |
| SMC                                            | Secure Monitor Call                                             |
| SMLAx <sub>y</sub>                             | Signed Multiply with Accumulate ( $32 \leq 16 \times 16 + 32$ ) |
| SMLAD                                          | Dual Signed Multiply Accumulate                                 |
| -                                              | ( $32 \leq 32 + 16 \times 16 + 16 \times 16$ )                  |
| SMLAL                                          | Signed Multiply Accumulate ( $64 \leq 64 + 32 \times 32$ )      |
| SMLALx <sub>y</sub>                            | Signed Multiply Accumulate ( $64 \leq 64 + 16 \times 16$ )      |
| SMLALD                                         | Dual Signed Multiply Accumulate Long                            |
| -                                              | ( $64 \leq 64 + 16 \times 16 + 16 \times 16$ )                  |
| SMLAWy                                         | Signed Multiply with Accumulate ( $32 \leq 32 \times 16 + 32$ ) |
| SMLSD                                          | Dual Signed Multiply Subtract Accumulate                        |
| -                                              | ( $32 \leq 32 + 16 \times 16 - 16 \times 16$ )                  |
| SMLS LD                                        | Dual Signed Multiply Subtract Accumulate Long                   |

| Mnemonic                                       | Brief description                                                      |
|------------------------------------------------|------------------------------------------------------------------------|
| -                                              | (64 <= 64 + 16 x 16 - 16 x 16)                                         |
| SMMLA                                          | Signed top word Multiply with Accumulate (32 <= TopWord(32 x 32 + 32)) |
| SMMLS                                          | Signed top word Multiply with Subtract (32 <= TopWord(32 - 32 x 32))   |
| SMMUL                                          | Signed top word Multiply (32 <= TopWord(32 x 32))                      |
| SMUAD, SMUSD                                   | Dual Signed Multiply, and Add or Subtract products                     |
| SMULxy                                         | Signed Multiply (32 <= 16 x 16)                                        |
| SMULL                                          | Signed Multiply (64 <= 32 x 32)                                        |
| SMULWY                                         | Signed Multiply (32 <= 32 x 16)                                        |
| SRS                                            | Store Return State                                                     |
| SSAT                                           | Signed Saturate                                                        |
| SSAT16                                         | Signed Saturate, parallel halfwords                                    |
| SSUB8, SSUB16, SSAX                            | Parallel Signed arithmetic                                             |
| STC                                            | Store Coprocessor                                                      |
| STM                                            | Store Multiple registers                                               |
| STR                                            | Store Register with word                                               |
| STRB                                           | Store Register with Byte                                               |
| STRBT                                          | Store Register with Byte, user mode                                    |
| STRD                                           | Store Registers with two words                                         |
| STREX, STREXB, STREXH, STREXD                  | Store Register Exclusive Word, Byte, Halfword, Doubleword              |
| STRH                                           | Store Register with Halfword                                           |
| STRHT                                          | Store Register with Halfword, user mode                                |
| STL, STLB, STLH                                | Store-Release Word, Byte, Halfword                                     |
| STLEX, STLEXB, STLEXH, STLEXD                  | Store-Release Exclusive Word, Byte, Halfword, Doubleword               |
| STRT                                           | Store Register with word, user mode                                    |
| SUB                                            | Subtract                                                               |
| SUBS pc, lr                                    | Exception return, no stack                                             |
| SVC (formerly SWI)                             | Supervisor Call                                                        |
| SXTAB, SXTAB16, SXTAH                          | Signed extend, with Addition                                           |
| SXTB, SXTH                                     | Signed extend                                                          |
| SXTB16                                         | Signed extend                                                          |
| SYS                                            | Execute System coprocessor instruction                                 |
| TBB, TBH                                       | Table Branch Byte, Halfword                                            |
| TEQ                                            | Test Equivalence                                                       |
| TST                                            | Test                                                                   |
| TT, TTT, TTA, TTAT                             | Test Target (Alternate Domain, Unprivileged)                           |
| UADD8, UADD16, UASX                            | Parallel Unsigned arithmetic                                           |
| UDF                                            | Permanently Undefined                                                  |
| UDIV                                           | Unsigned Divide                                                        |
| UHADD8, UHADD16, UHASX, UHSUB8, UHSUB16, UHSAX | Parallel Unsigned Halving arithmetic                                   |

| Mnemonic                                       | Brief description                                                                                                |
|------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| UMAAL                                          | Unsigned Multiply Accumulate Accumulate Long<br>$(64 \leq 32 + 32 + 32 \times 32)$                               |
| -                                              |                                                                                                                  |
| UMLAL, UMULL                                   | Unsigned Multiply Accumulate, Unsigned Multiply<br>$(64 \leq 32 \times 32 + 64), (64 \leq 32 \times 32)$         |
| -                                              |                                                                                                                  |
| UQADD8, UQADD16, UQASX, UQSUB8, UQSUB16, UQSAX | Parallel Unsigned Saturating arithmetic                                                                          |
| USAD8                                          | Unsigned Sum of Absolute Differences                                                                             |
| USADA8                                         | Accumulate Unsigned Sum of Absolute Differences                                                                  |
| USAT                                           | Unsigned Saturate                                                                                                |
| USAT16                                         | Unsigned Saturate, parallel halfwords                                                                            |
| USUB8, USUB16, USAX                            | Parallel Unsigned arithmetic                                                                                     |
| UXTAB, UXTAB16, UXTAH                          | Unsigned extend with Addition                                                                                    |
| UXTB, UXTH                                     | Unsigned extend                                                                                                  |
| UXTB16                                         | Unsigned extend                                                                                                  |
| V*                                             | See <a href="#">Advanced SIMD Instructions (32-bit)</a> and <a href="#">Floating-point Instructions (32-bit)</a> |
| WFE, WFI, YIELD                                | Wait For Event, Wait For Interrupt, Yield                                                                        |

## 14.2 Instruction width specifiers

The instruction width specifiers `.w` and `.n` control the size of T32 instruction encodings.

In T32 code the `.w` width specifier forces the assembler to generate a 32-bit encoding, even if a 16-bit encoding is available. The `.w` specifier has no effect when assembling to A32 code.

In T32 code the `.n` width specifier forces the assembler to generate a 16-bit encoding. In this case, if the instruction cannot be encoded in 16 bits or if `.n` is used in A32 code, the assembler generates an error.

If you use an instruction width specifier, you must place it immediately after the instruction mnemonic and any condition code, for example:

```
BCS.W    label    ; forces 32-bit instruction even for a short branch
B.N     label    ; faults if label out of range for 16-bit instruction
```

## 14.3 Flexible second operand (Operand2)

Many A32 and T32 general data processing instructions have a flexible second operand.

This is shown as `Operand2` in the descriptions of the syntax of each instruction.

`Operand2` can be a:

- Constant.
- Register with optional shift.

## Related information

[Syntax of Operand2 as a constant](#) on page 245

[Syntax of Operand2 as a register with optional shift](#) on page 246

[Shift operations](#) on page 247

## 14.4 Syntax of Operand2 as a constant

An Operand2 constant in an instruction has a limited range of values.

### Syntax

`# constant`

where *constant* is an expression evaluating to a numeric value.

### Usage

In A32 instructions, *constant* can have any value that can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word.

In T32 instructions, *constant* can be:

- Any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word.
- Any constant of the form `0x00xy00xy`.
- Any constant of the form `0xxy00xy00`.
- Any constant of the form `0xxyxxxxxy`.



In these constants, x and y are hexadecimal digits.

Note

In addition, in a small number of instructions, *constant* can take a wider range of values. These are listed in the individual instruction descriptions.

When an Operand2 constant is used with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to bit[31] of the constant, if the constant is greater than 255 and can be produced by shifting an 8-bit value. These instructions do not affect the carry flag if Operand2 is any other constant.

## Instruction substitution

If the value of an Operand2 constant is not available, but its logical inverse or negation is available, then the assembler produces an equivalent instruction and inverts or negates the constant.

For example, an assembler might assemble the instruction `CMP Rd, #0xFFFFFFFF` as the equivalent instruction `CMN Rd, #0x2`.

Be aware of this when comparing disassembly listings with source code.

You can use the `--diag_warning 1645` assembler command line option to check when an instruction substitution occurs.

## Related information

[Flexible second operand \(Operand2\) on page 244](#)

[Syntax of Operand2 as a register with optional shift on page 246](#)

[Shift operations on page 247](#)

## 14.5 Syntax of Operand2 as a register with optional shift

When you use an Operand2 register in an instruction, you can optionally also specify a shift value.

### Syntax

`Rm{, shift}`

where:

#### **Rm**

is the register holding the data for the second operand.

#### **shift**

is an optional constant or register-controlled shift to be applied to `Rm`. It can be one of:

##### **ASR #n**

arithmetic shift right  $n$  bits,  $1 \leq n \leq 32$ .

##### **LSL #n**

logical shift left  $n$  bits,  $1 \leq n \leq 31$ .

##### **LSR #n**

logical shift right  $n$  bits,  $1 \leq n \leq 32$ .

##### **ROR #n**

rotate right  $n$  bits,  $1 \leq n \leq 31$ .

##### **RRX**

rotate right one bit, with extend.

##### **type Rs**

register-controlled shift is available in Arm® code only, where:

***type***

is one of `ASR`, `LSL`, `LSR`, `ROR`.

***Rs***

is a register supplying the shift amount, and only the least significant byte is used.

-

if omitted, no shift occurs, equivalent to `LSL #0`.

**Usage**

If you omit the shift, or specify `LSL #0`, the instruction uses the value in *Rm*.

If you specify a shift, the shift is applied to the value in *Rm*, and the resulting 32-bit value is used by the instruction. However, the contents of the register *Rm* remain unchanged. Specifying a register with shift also updates the carry flag when used with certain instructions.

**Related information**

[Flexible second operand \(Operand2\) on page 244](#)

[Syntax of Operand2 as a constant on page 245](#)

[Shift operations on page 247](#)

## 14.6 Shift operations

Register shift operations move the bits in a register left or right by a specified number of bits, called the shift length.

Register shift can be performed:

- Directly by the instructions `ASR`, `LSR`, `LSL`, `ROR`, and `RRX`, and the result is written to a destination register.
- During the calculation of *operand2* by the instructions that specify the second operand as a register with shift. The result is used by the instruction.

The permitted shift lengths depend on the shift type and the instruction, see the individual instruction description or the flexible second operand description. If the shift length is 0, no shift occurs. Register shift operations update the carry flag except when the specified shift length is 0.

**Arithmetic shift right (ASR)**

Arithmetic shift right by *n* bits moves the left-hand 32-*n* bits of a register to the right by *n* places, into the right-hand 32-*n* bits of the result. It copies the original bit[31] of the register into the left-hand *n* bits of the result.

You can use the `ASR #n` operation to divide the value in the register *Rm* by  $2^{\{n\}}$ , with the result being rounded towards negative-infinity.

When the instruction is `ASRS` or when `ASR #n` is used in *operand2* with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to the last bit shifted out, bit[ $n-1$ ], of the register  $Rm$ .



- If  $n$  is 32 or more, then all the bits in the result are set to the value of bit[31] of  $Rm$ .
- If  $n$  is 32 or more and the carry flag is updated, it is updated to the value of bit[31] of  $Rm$ .

**Figure 14-1: ASR #3**



### Logical shift right (LSR)

Logical shift right by  $n$  bits moves the left-hand 32- $n$  bits of a register to the right by  $n$  places, into the right-hand 32- $n$  bits of the result. It sets the left-hand  $n$  bits of the result to 0.

You can use the `LSR #n` operation to divide the value in the register  $Rm$  by  $2^{\{n\}}$ , if the value is regarded as an unsigned integer.

When the instruction is `LSRS` or when `LSR #n` is used in *operand2* with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to the last bit shifted out, bit[ $n-1$ ], of the register  $Rm$ .



- If  $\{n\}$  is 32 or more, then all the bits in the result are cleared to 0.
- If  $\{n\}$  is 33 or more and the carry flag is updated, it is updated to 0.

**Figure 14-2: LSR #3**

### Logical shift left (LSL)

Logical shift left by  $n$  bits moves the right-hand  $32-n$  bits of a register to the left by  $n$  places, into the left-hand  $32-n$  bits of the result. It sets the right-hand  $n$  bits of the result to 0.

You can use the `LSL #n` operation to multiply the value in the register  $Rm$  by  $2^{\{n\}}$ , if the value is regarded as an unsigned integer or a two's complement signed integer. Overflow can occur without warning.

When the instruction is `LSLS` or when `LSL #n`, with non-zero  $n$ , is used in *operand2* with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to the last bit shifted out, bit[ $32-n$ ], of the register  $Rm$ . These instructions do not affect the carry flag when used with `LSL #0`.



- If  $n$  is 32 or more, then all the bits in the result are cleared to 0.
- If  $n$  is 33 or more and the carry flag is updated, it is updated to 0.

**Figure 14-3: LSL #3**

## Rotate right (ROR)

Rotate right by  $n$  bits moves the left-hand  $32-n$  bits of a register to the right by  $n$  places, into the right-hand  $32-n$  bits of the result. It also moves the right-hand  $n$  bits of the register into the left-hand  $n$  bits of the result.

When the instruction is `RORS` or when `ROR #n` is used in *operand2* with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to the last bit rotation, bit[ $n-1$ ], of the register  $Rm$ .



- If  $n$  is 32, then the value of the result is same as the value in  $Rm$ , and if the carry flag is updated, it is updated to bit[31] of  $Rm$ .
- ROR with shift length,  $n$ , more than 32 is the same as ROR with shift length  $n - 32$ .

**Figure 14-4: ROR #3**



## Rotate right with extend (RRX)

Rotate right with extend moves the bits of a register to the right by one bit. It copies the carry flag into bit[31] of the result.

When the instruction is `RRXS` or when `RRX` is used in *operand2* with the instructions `MOVS`, `MVNS`, `ANDS`, `ORRS`, `ORNS`, `EORS`, `BICS`, `TEQ` or `TST`, the carry flag is updated to bit[0] of the register  $Rm$ .

**Figure 14-5: RRX**



## Related information

[Flexible second operand \(Operand2\) on page 244](#)

[Syntax of Operand2 as a constant on page 245](#)

[Syntax of Operand2 as a register with optional shift on page 246](#)

## 14.7 Saturating instructions

Some A32 and T32 instructions perform saturating arithmetic.

The saturating instructions are:

- QADD.
- QDADD.
- QDSUB.
- QSUB.
- SSAT.
- USAT.

Some of the parallel instructions are also saturating.

### Saturating arithmetic

Saturation means that, for some value of  $2^{\{n\}}$  that depends on the instruction:

- For a signed saturating operation, if the full result would be less than  $-2^{\{n\}}$ , the result returned is  $-2^{\{n\}}$ .
- For an unsigned saturating operation, if the full result would be negative, the result returned is zero.
- If the full result would be greater than  $2^{\{n\}}-1$ , the result returned is  $2^{\{n\}}-1$ .

When any of these occurs, it is called saturation. Some instructions set the Q flag when saturation occurs.



**Note**

Saturating instructions do not clear the Q flag when saturation does not occur. To clear the Q flag, use an `MSR` instruction.

The Q flag can also be set by two other instructions, but these instructions do not saturate.

### Related information

[QADD \(A32\)](#) on page 360

[QSUB \(A32\)](#) on page 367

[QDADD \(A32\)](#) on page 363

[QDSUB \(A32\)](#) on page 365

[SMLAx<sub>y</sub>](#) on page 400

[SMLAW<sub>y</sub>](#) on page 406

[SMULx<sub>y</sub>](#) on page 414

[SMULW<sub>y</sub>](#) on page 416

[SSAT \(A32\)](#) on page 420

[USAT \(A32\)](#) on page 488

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Saturating Advanced SIMD instructions](#) on page 170

## 14.8 ADC (A32)

Add with Carry.

### Syntax

`ADC {S} {cond} {Rd, } Rn, Operand2`

where:

**S**

is an optional suffix. If `s` is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Usage

The `ADC` (Add with Carry) instruction adds the values in `Rn` and `Operand2`, together with the carry flag.

You can use `ADC` to synthesize multiword arithmetic.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

### Use of PC and SP in T32 instructions

You cannot use PC (`r15`) for `Rd`, or any operand with the `ADC` command.

You cannot use SP (`r13`) for `Rd`, or any operand with the `ADC` command.

### Use of PC and SP in A32 instructions

You cannot use PC for `Rn` or any operand in any data processing instruction that has a register-controlled shift.

Use of PC for any operand, in instructions without register-controlled shift, is deprecated.

If you use PC (`r15`) as `Rn` or `Operand2`, the value used is the address of the instruction plus 8.

If you use PC as `Rn`:

- Execution branches to the address corresponding to the result.
- If you use the `s` suffix, see the `SUBS pc, lr` instruction.

Use of SP with the `ADC` A32 instruction is deprecated.

### Condition flags

If `s` is specified, the `ADC` instruction updates the N, Z, C and V flags according to the result.

### 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

#### `ADCS Rd, Rd, Rm`

`Rd` and `Rm` must both be Lo registers. This form can only be used outside an IT block.

#### `ADC{cond} Rd, Rd, Rm`

`Rd` and `Rm` must both be Lo registers. This form can only be used inside an IT block.

### Multiword arithmetic examples

These two instructions add a 64-bit integer contained in `r2` and `r3` to another 64-bit integer contained in `r0` and `r1`, and place the result in `r4` and `r5`.

```
ADDS    r4, r0, r2      ; adding the least significant words
ADC     r5, r1, r3      ; adding the most significant words
```

### Related information

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

## 14.9 ADD (A32)

Add without Carry.

### Syntax

`ADD{S}{cond} {Rd}, Rn, Operand2`

`ADD{cond} {Rd}, Rn, #imm12 ; T32, 32-bit encoding only`

where:

**s**

is an optional suffix. If `s` is specified, the condition flags are updated on the result of the operation.

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the first operand.

***Operand2***

is a flexible second operand.

***imm12***

is any value in the range 0-4095.

## Operation

The `ADD` instruction adds the values in *Rn* and *operand2* or *imm12*.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

## Use of PC and SP in T32 instructions

Generally, you cannot use PC (`R15`) for *Rd*, or any operand.

The exceptions are:

- You can use PC for *Rn* in 32-bit encodings of T32 `ADD` instructions, with a constant *operand2* value in the range 0-4095, and no `s` suffix. These instructions are useful for generating PC-relative addresses. Bit[1] of the PC value reads as 0 in this case, so that the base address for the calculation is always word-aligned.
- You can use PC in 16-bit encodings of T32 `ADD{cond} Rd, Rd, Rm` instructions, where both registers cannot be PC. However, the following 16-bit T32 instructions are deprecated:
  - `ADD{cond} PC, SP, PC`.
  - `ADD{cond} SP, SP, PC`.

Generally, you cannot use SP (`R13`) for *Rd*, or any operand. Except that:

- You can use SP for *Rn* in `ADD` instructions.
- `ADD{cond} SP, SP, SP` is permitted but is deprecated in Arm®v6T2 and above.
- `ADD{S}{cond} SP, SP, Rm{,shift}` and `SUB{S}{cond} SP, SP, Rm{,shift}` are permitted if `shift` is omitted or `LSL #1`, `LSL #2`, or `LSL #3`.

## Use of PC and SP in A32 instructions

You cannot use PC for *Rd* or any operand in any data processing instruction that has a register-controlled shift.

In `ADD` instructions without register-controlled shift, use of PC is deprecated except for the following cases:

- Use of PC for  $Rd$  in instructions that do not add SP to a register.
- Use of PC for  $Rn$  and use of PC for  $Rm$  in instructions that add two registers other than SP.
- Use of PC for  $Rn$  in the instruction `ADD{cond} Rd, Rn, #Constant`.

If you use PC ( $r15$ ) as  $Rn$  or  $Rn$ , the value used is the address of the instruction plus 8.

If you use PC as  $Rn$ :

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc, lr` instruction.

You can use SP for  $Rn$  in `ADD` instructions, however, `ADDS PC, SP, #Constant` is deprecated.

You can use SP in `ADD` (register) if  $Rn$  is SP and *shift* is omitted or `LSL #1`, `LSL #2`, or `LSL #3`.

Other uses of SP in these A32 instructions are deprecated.

## Condition flags

If s is specified, these instructions update the N, Z, C and V flags according to the result.

## 16-bit instructions

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

### `ADDS Rd, Rn, #imm`

*imm* range 0-7.  $Rd$  and  $Rn$  must both be Lo registers. This form can only be used outside an IT block.

### `ADD{cond} Rd, Rn, #imm`

*imm* range 0-7.  $Rd$  and  $Rn$  must both be Lo registers. This form can only be used inside an IT block.

### `ADDS Rd, Rn, Rm`

$Rd$ ,  $Rn$  and  $Rm$  must all be Lo registers. This form can only be used outside an IT block.

### `ADD{cond} Rd, Rn, Rm`

$Rd$ ,  $Rn$  and  $Rm$  must all be Lo registers. This form can only be used inside an IT block.

### `ADDS Rd, Rd, #imm`

*imm* range 0-255.  $Rd$  must be a Lo register. This form can only be used outside an IT block.

### `ADD{cond} Rd, Rd, #imm`

*imm* range 0-255.  $Rd$  must be a Lo register. This form can only be used inside an IT block.

### `ADD SP, SP, #imm`

*imm* range 0-508, word aligned.

### `ADD Rd, SP, #imm`

*imm* range 0-1020, word aligned.  $Rd$  must be a Lo register.

**ADD Rd, pc, #imm**

*imm* range 0-1020, word aligned. *Rd* must be a Lo register. Bits[1:0] of the PC are read as 0 in this instruction.

**Example**

```
ADD      r2, r1, r3
```

**Multiword arithmetic example**

These two instructions add a 64-bit integer contained in *r2* and *r3* to another 64-bit integer contained in *r0* and *r1*, and place the result in *r4* and *r5*.

```
ADDS    r4, r0, r2      ; adding the least significant words
ADC     r5, r1, r3      ; adding the most significant words
```

**Related information**

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

[SUBS pc, lr \(A32\)](#) on page 444

## 14.10 ADR (PC-relative) (A32)

Generate a PC-relative address in the destination register, for a label in the current area.

**Syntax**

```
ADR{cond}{.W} Rd, label
```

where:

**cond**

is an optional condition code.

**.W**

is an optional instruction width specifier.

**Rd**

is the destination register to load.

**label**

is a PC-relative expression.

*label* must be within a limited distance of the current instruction.

**Usage**

ADR produces position-independent code, because the assembler generates an instruction that adds or subtracts a value to the PC.

Use the `ADRL` pseudo-instruction to assemble a wider range of effective addresses.

`label` must evaluate to an address in the same assembler area as the `ADR` instruction.

If you use `ADR` to generate a target for a `BX` or `BLX` instruction, it is your responsibility to set the T32 bit (bit 0) of the address if the target contains T32 instructions.

### Offset range and architectures

The assembler calculates the offset from the PC for you. The assembler generates an error if `label` is out of range.

The following table shows the possible offsets between the label and the current instruction:

**Table 14-2: PC-relative offsets**

| Instruction                                         | Offset range                                           |
|-----------------------------------------------------|--------------------------------------------------------|
| A32 <code>ADR</code>                                | See <a href="#">Syntax of Operand2 as a constant</a> . |
| T32 <code>ADR</code> , 32-bit encoding              | $\pm 4095$                                             |
| T32 <code>ADR</code> , 16-bit encoding <sup>1</sup> | 0-1020 <sup>2</sup>                                    |

### ADR in T32

You can use the `.w` width specifier to force `ADR` to generate a 32-bit instruction in T32 code. `ADR` with `.w` always generates a 32-bit instruction, even if the address can be generated in a 16-bit instruction.

For forward references, `ADR` without `.w` always generates a 16-bit instruction in T32 code, even if that results in failure for an address that could be generated in a 32-bit T32 `ADD` instruction.

### Restrictions

In T32 code, `Rd` cannot be `PC` or `SP`.

In A32 code, `Rd` can be `PC` or `SP` but use of `SP` is deprecated.

### Related information

[Load addresses to a register using ADR](#) on page 112

[Syntax of Operand2 as a constant](#) on page 245

[Register-relative and PC-relative expressions](#) on page 222

[ADRL pseudo-instruction \(A32\)](#) on page 259

[AREA](#) on page 1428

[Condition code suffixes](#) on page 138

<sup>1</sup> Rd must be in the range R0-R7.

<sup>2</sup> Must be a multiple of 4.

## 14.11 ADR (register-relative) (A32)

Generate a register-relative address in the destination register, for a label defined in a storage map.

### Syntax

`ADR{cond}{.W} Rd, label`

where:

**cond**

is an optional condition code.

**.W**

is an optional instruction width specifier.

**Rd**

is the destination register to load.

**label**

is a symbol defined by the `FIELD` directive. `label` specifies an offset from the base register which is defined using the `MAP` directive.

`label` must be within a limited distance from the base register.

### Usage

`ADR` generates code to easily access named fields inside a storage map.

Use the `ADRL` pseudo-instruction to assemble a wider range of effective addresses.

### Restrictions

In T32 code:

- `Rd` cannot be `PC`.
- `Rd` can be `SP` only if the base register is `SP`.

### Offset range and architectures

The assembler calculates the offset from the base register for you. The assembler generates an error if `label` is out of range.

The following table shows the possible offsets between the label and the current instruction:

**Table 14-3: Register-relative offsets**

| Instruction                                                | Offset range                                         |
|------------------------------------------------------------|------------------------------------------------------|
| A32 ADR                                                    | See <a href="#">Syntax of Operand2 as a constant</a> |
| T32 ADR, 32-bit encoding                                   | $\pm 4095$                                           |
| T32 ADR, 16-bit encoding, base register is SP <sup>3</sup> | 0-1020 <sup>4</sup>                                  |

<sup>3</sup> `Rd` must be in the range R0-R7 or SP. If `Rd` is SP, the offset range is -508 to 508 and must be a multiple of 4

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

## ADR in T32

You can use the `.w` width specifier to force `ADR` to generate a 32-bit instruction in T32 code. `ADR` with `.w` always generates a 32-bit instruction, even if the address can be generated in a 16-bit instruction.

For forward references, `ADR` without `.w`, with base register SP, always generates a 16-bit instruction in T32 code, even if that results in failure for an address that could be generated in a 32-bit T32 `ADD` instruction.

### Related information

[Register-relative and PC-relative expressions](#) on page 222

[Syntax of Operand2 as a constant](#) on page 245

[ADRL pseudo-instruction \(A32\)](#) on page 259

[MAP](#) on page 1474

[FIELD](#) on page 1451

[Condition code suffixes](#) on page 138

## 14.12 ADRL pseudo-instruction (A32)

Load a PC-relative or register-relative address into a register.

### Syntax

`ADRL{cond} Rd, label`

where:

**cond**

is an optional condition code.

**Rd**

is the register to load.

**label**

is a PC-relative or register-relative expression.

### Usage

`ADRL` always assembles to two 32-bit instructions. Even if the address can be reached in a single instruction, a second, redundant instruction is produced.

If the assembler cannot construct the address in two instructions, it generates an error message and the assembly fails. You can use the `LDR` pseudo-instruction for loading a wider range of addresses.

`ADRL` is similar to the `ADR` instruction, except `ADRL` can load a wider range of addresses because it generates two data processing instructions.

---

<sup>4</sup> Must be a multiple of 4.

`ADRL` produces position-independent code, because the address is PC-relative or register-relative.

If `label` is PC-relative, it must evaluate to an address in the same assembler area as the `ADRL` pseudo-instruction.

If you use `ADRL` to generate a target for a `BX` or `B1X` instruction, it is your responsibility to set the T32 bit (bit 0) of the address if the target contains T32 instructions.

## Architectures and range

The available range depends on the instruction set in use:

### A32

The range of the instruction is any value that can be generated by two `ADD` or two `SUB` instructions. That is, any value that can be produced by the addition of two values, each of which is 8 bits rotated right by any even number of bits within a 32-bit word.

### T32, 32-bit encoding

±1MB bytes to a byte, halfword, or word-aligned address.

### T32, 16-bit encoding

`ADRL` is not available.

The given range is relative to a point four bytes (in T32 code) or two words (in A32 code) after the address of the current instruction.



`ADRL` is not available in Arm®v6-M and Armv8-M.baseline.

Note

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Load immediate values](#) on page 106

[Syntax of Operand2 as a constant](#) on page 245

[LDR pseudo-instruction \(A32\)](#) on page 319

[AREA](#) on page 1428

[ADD \(A32\)](#) on page 253

[Condition code suffixes](#) on page 138

[A-Profile Architectures](#)

## 14.13 AND (A32)

Logical AND.

### Syntax

`AND{S}{cond} Rd, Rn, Operand2`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

Page 260 of 1493

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

## Operation

The `AND` instruction performs bitwise AND operations on the values in `Rn` and `operand2`.

In certain circumstances, the assembler can substitute `BIC` for `AND`, or `AND` for `BIC`. Be aware of this when reading disassembly listings.

## Use of PC in T32 instructions

You cannot use PC (`R15`) for `Rd` or any operand with the `AND` instruction.

## Use of PC and SP in A32 instructions

You can use PC and SP with the `AND` A32 instruction but this is deprecated.

If you use PC as `Rn`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc, lr` instruction.

You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

## Condition flags

If s is specified, the `AND` instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of `operand2`.
- Does not affect the V flag.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

**ANDS Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used outside an IT block.

**AND{cond} Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify AND{S} *Rd, Rm, Rd*. The instruction is the same.

**Examples**

```
AND      r9, r2, #0xFF00
ANDS    r9, r8, #0x19
```

**Related information**

[Flexible second operand \(Operand2\)](#) on page 244

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

## 14.14 ASR (A32)

Arithmetic Shift Right. This instruction is a preferred synonym for MOV instructions with shifted register operands.

**Syntax**

ASR{S}{cond} Rd, Rm, Rs

ASR{S}{cond} Rd, Rm, #sh

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rm**

is the register holding the first operand. This operand is shifted right.

**Rs**

is a register holding a shift value to apply to the value in *Rm*. Only the least significant byte is used.

**sh**

is a constant shift. The range of values permitted is 1-32.

## Operation

`ASR` provides the signed value of the contents of a register divided by a power of two. It copies the sign bit into vacated bit positions on the left.

## Restrictions in T32 code

T32 instructions must not use PC or SP.

## Use of SP and PC in A32 instructions

You can use SP in the `ASR` A32 instruction but this is deprecated.

You cannot use PC in instructions with the `ASR{S}{cond} Rd, Rm, Rs` syntax. You can use PC for `Rd` and `Rm` in the other syntax, but this is deprecated.

If you use PC as `Rm`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, the SPSR of the current mode is copied to the CPSR. You can use this to return from exceptions.



The A32 instruction `ASRS{cond} pc, Rm, #sh` always disassembles to the preferred form `MOVS{cond} pc, Rm{, shift}`.



Do not use the s suffix when using PC as `Rd` in User mode or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

You cannot use PC for `Rd` or any operand in the `ASR` instruction if it has a register-controlled shift.

## Condition flags

If s is specified, the `ASR` instruction updates the N and Z flags according to the result.

The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.

## 16-bit instructions

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

### **ASRS Rd, Rm, #sh**

`Rd` and `Rm` must both be Lo registers. This form can only be used outside an IT block.

### **ASR{cond} Rd, Rm, #sh**

`Rd` and `Rm` must both be Lo registers. This form can only be used inside an IT block.

**ASRS Rd, Rd, Rs**

*Rd* and *Rs* must both be Lo registers. This form can only be used outside an IT block.

**ASR{cond} Rd, Rd, Rs**

*Rd* and *Rs* must both be Lo registers. This form can only be used inside an IT block.

**Architectures**

This instruction is available in A32 and T32.

**Example**

```
ASR      r7, r8, r9
```

**Related information**

[MOV \(A32\)](#) on page 333

[Condition code suffixes](#) on page 138

## 14.15 B (A32)

Branch.

**Syntax**

```
B{cond} {.W} label
```

where:

**cond**

is an optional condition code.

**.W**

is an optional instruction width specifier to force the use of a 32-bit **B** instruction in T32.

**label**

is a PC-relative expression.

**Operation**

The **B** instruction causes a branch to *label*.

**Instruction availability and branch ranges**

The following table shows the branch ranges that are available in A32 and T32 code. Instructions that are not shown in this table are not available.

**Table 14-4: B instruction availability and range**

| Instruction | A32   | T32, 16-bit encoding | T32, 32-bit encoding |
|-------------|-------|----------------------|----------------------|
| B label     | ±32MB | ±2KB                 | ±16MB <sup>5</sup>   |

<sup>5</sup> Use **.W** to instruct the assembler to use this 32-bit instruction.

| Instruction          | A32   | T32, 16-bit encoding | T32, 32-bit encoding |
|----------------------|-------|----------------------|----------------------|
| B{cond} <i>label</i> | ±32MB | -252 to +258         | ±1MB <sup>5</sup>    |

## Extending branch ranges

Machine-level B instructions have restricted ranges from the address of the current instruction. However, you can use these instructions even if *label* is out of range. Often you do not know where the linker places *label*. When necessary, the linker adds code to enable longer branches. The added code is called a veneer.

## B in T32

You can use the .w width specifier to force B to generate a 32-bit instruction in T32 code.

B.w always generates a 32-bit instruction, even if the target could be reached using a 16-bit instruction.

For forward references, B without .w always generates a 16-bit instruction in T32 code, even if that results in failure for a target that could be reached using a 32-bit T32 instruction.

## Condition flags

The B instruction does not change the flags.

## Architectures

See the earlier table for details of availability of the B instruction.

## Example

```
B      loopA
```

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Information about image structure and generation](#)

## 14.16 BFC (A32)

Bit Field Clear.

### Syntax

```
BFC{cond} Rd, #lsb, #width
```

where:

**cond**

is an optional condition code.

***Rd***

is the destination register.

***lsb***

is the least significant bit that is to be cleared.

***width***

is the number of bits to be cleared. *width* must not be 0, and (*width* + *lsb*) must be less than or equal to 32.

## Operation

Clears adjacent bits in a register. *width* bits in *Rd* are cleared, starting at *lsb*. Other bits in *Rd* are unchanged.

## Register restrictions

You cannot use PC for any register.

You can use SP in the `BFC` A32 instruction but this is deprecated. You cannot use SP in the `BFC` T32 instruction.

## Condition flags

The `BFC` instruction does not change the flags.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.17 BFI (A32)

Bit Field Insert.

### Syntax

`BFI{cond} Rd, Rn, #lsb, #width`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

**Rn**

is the source register.

**lsb**

is the least significant bit that is to be copied.

**width**

is the number of bits to be copied. *width* must not be 0, and (*width* + *lsb*) must be less than or equal to 32.

## Operation

Inserts adjacent bits from one register into another. *width* bits in *Rd*, starting at *lsb*, are replaced by *width* bits from *Rn*, starting at bit[0]. Other bits in *Rd* are unchanged.

## Register restrictions

You cannot use PC for any register.

You can use SP in the `BFI` A32 instruction but this is deprecated. You cannot use SP in the `BFI` T32 instruction.

## Condition flags

The `BFI` instruction does not change the flags.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.18 BIC (A32)

Bit Clear.

### Syntax

`BIC{S}{cond} Rd, Rn, Operand2`

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

## Operation

The **BIC** (Bit Clear) instruction performs an AND operation on the bits in *Rn* with the complements of the corresponding bits in the value of *operand2*.

In certain circumstances, the assembler can substitute **BIC** for **AND**, or **AND** for **BIC**. Be aware of this when reading disassembly listings.

## Use of PC in T32 instructions

You cannot use PC (*r15*) for *Rd* or any operand in a **BIC** instruction.

## Use of PC and SP in A32 instructions

You can use PC and SP with the **BIC** instruction but they are deprecated.

If you use PC as *Rn*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the **SUBS pc, lr** instruction.

You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

## Condition flags

If s is specified, the **BIC** instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of *operand2*.
- Does not affect the V flag.

## 16-bit instructions

The following forms of the **BIC** instruction are available in T32 code, and are 16-bit instructions:

### **BICS Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used outside an IT block.

### **BIC{cond} Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used inside an IT block.

## Example

```
BIC      r0, r1, #0xab
```

## Related information

[Flexible second operand \(Operand2\) on page 244](#)

[SUBS pc, lr \(A32\) on page 444](#)

[Condition code suffixes on page 138](#)

## 14.19 BKPT (A32)

Breakpoint.

### Syntax

```
BKPT #imm
```

where:

*imm*

is an expression evaluating to an integer in the range:

- 0-65535 (a 16-bit value) in an A32 instruction.
- 0-255 (an 8-bit value) in a 16-bit T32 instruction.

### Usage

The `BKPT` instruction causes the processor to enter Debug state. Debug tools can use this to investigate system state when the instruction at a particular address is reached.

In both A32 state and T32 state, *imm* is ignored by the Arm® hardware. However, a debugger can use it to store extra information about the breakpoint.

`BKPT` is an unconditional instruction. It must not have a condition code in A32 code. In T32 code, the `BKPT` instruction does not require a condition code suffix because `BKPT` always executes irrespective of its condition code suffix.

### Architectures

This instruction is available in A32 and T32.

In T32, it is only available as a 16-bit instruction.

## 14.20 BL (A32)

Branch with Link.

### Syntax

`BL{cond}{.W} label`

where:

**cond**

is an optional condition code. `cond` is not available on all forms of this instruction.

**.W**

is an optional instruction width specifier to force the use of a 32-bit `BL` instruction in T32.

**label**

is a PC-relative expression.

### Operation

The `BL` instruction causes a branch to `label`, and copies the address of the next instruction into LR (`R14`, the link register).

### Instruction availability and branch ranges

The following table shows the `BL` instructions that are available in A32 and T32 state. Instructions that are not shown in this table are not available.

**Table 14-5: BL instruction availability and range**

| Instruction                  | A32   | T32, 16-bit encoding | T32, 32-bit encoding |
|------------------------------|-------|----------------------|----------------------|
| <code>BL label</code>        | ±32MB | ±4MB <sup>6</sup>    | ±16MB                |
| <code>BL{cond}{label}</code> | ±32MB | -                    | -                    |

### Extending branch ranges

Machine-level `BL` instructions have restricted ranges from the address of the current instruction. However, you can use these instructions even if `label` is out of range. Often you do not know where the linker places `label`. When necessary, the linker adds code to enable longer branches. The added code is called a veneer.

### Condition flags

The `BL` instruction does not change the flags.

### Availability

See the preceding table for details of availability of the `BL` instruction in both instruction sets.

### Examples

```
BLE    ng+8
```

---

<sup>6</sup> `BL label` and `BLX label` are an instruction pair.

|      |      |
|------|------|
| BL   | subC |
| BLLT | rtX  |

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Information about image structure and generation](#)

## 14.21 BLX, BLXNS (A32)

Branch with Link and exchange instruction set and Branch with Link and Exchange (Non-secure).

### Syntax

`BLX{cond} {q} label`

`BLX{cond} {q} Rm`

`BLXNS{cond} {q} Rm` (Arm®v8-M only)

Where:

**cond**

Is an optional condition code. `cond` is not available on all forms of this instruction.

**q**

Is an optional instruction width specifier. Must be set to `.w` when `label` is used.

**label**

Is a PC-relative expression.

**Rm**

Is a register containing an address to branch to.

### Operation

The `BLX` instruction causes a branch to `label`, or to the address contained in `Rm`. In addition:

- The `BLX` instruction copies the address of the next instruction into LR (R14, the link register).
- The `BLX` instruction can change the instruction set.

`BLX label` always changes the instruction set. It changes a processor in A32 state to T32 state, or a processor in T32 state to A32 state.

`BLX Rm` derives the target instruction set from bit[0] of `Rm`:

- If bit[0] of `Rm` is 0, the processor changes to, or remains in, A32 state.
- If bit[0] of `Rm` is 1, the processor changes to, or remains in, T32 state.



- There are no equivalent instructions to `BLX` to change between AArch32 and AArch64 state. The only way to change execution state is by a change of exception level.
- Armv8-M, Armv7-M, and Armv6-M only support the T32 instruction set. An attempt to change the instruction execution state causes the processor to take an exception on the instruction at the target address.

The `BLXNS` instruction calls a subroutine at an address and instruction set specified by a register, and causes a transition from the Secure to the Non-secure domain. This variant of the instruction must only be used when additional steps required to make such a transition safe are taken.

### Instruction availability and branch ranges

The following table shows the instructions that are available in A32 and T32 state. Instructions that are not shown in this table are not available.

**Table 14-6: BLX instruction availability and range**

| Instruction               | A32        | T32, 16-bit encoding   | T32, 32-bit encoding |
|---------------------------|------------|------------------------|----------------------|
| <code>BLX label</code>    | $\pm 32MB$ | $\pm 4MB$ <sup>7</sup> | $\pm 16MB$           |
| <code>BLX Rm</code>       | Available  | Available              | Use 16-bit           |
| <code>BLX{cond} Rm</code> | Available  | -                      | -                    |
| <code>BLXNS</code>        | -          | Available              | -                    |

### Register restrictions

You can use PC for `Rm` in the A32 `BLX` instruction, but this is deprecated. You cannot use PC in other A32 instructions.

You can use PC for `Rm` in the T32 `BLX` instruction. You cannot use PC in other T32 instructions.

You can use SP for `Rm` in this A32 instruction but this is deprecated.

You can use SP for `Rm` in the T32 `BLX` and `BLXNS` instructions, but this is deprecated. You cannot use SP in the other T32 instructions.

### Condition flags

These instructions do not change the flags.

### Availability

See the preceding table for details of availability of the `BLX` and `BLXNS` instructions in both instruction sets.

### Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

<sup>7</sup> `BLX label` and `BLI label` are an instruction pair.

[Instruction width specifiers on page 244](#)  
[Information about image structure and generation](#)

## 14.22 BX, BXNS (A32)

Branch and exchange instruction set and Branch and Exchange Non-secure.

### Syntax

`BX{cond}{q} Rm`

`BXNS{cond}{q} Rm` (Arm®v8-M only)

Where:

**`cond`**

Is an optional condition code. `cond` is not available on all forms of this instruction.

**`q`**

Is an optional instruction width specifier.

**`Rm`**

Is a register containing an address to branch to.

### Operation

The `BX` instruction causes a branch to the address contained in `Rm` and exchanges the instruction set, if necessary. The `BX` instruction can change the instruction set.

`BX Rm` derives the target instruction set from bit[0] of `Rm`:

- If bit[0] of `Rm` is 0, the processor changes to, or remains in, A32 state.
- If bit[0] of `Rm` is 1, the processor changes to, or remains in, T32 state.



- 
- There are no equivalent instructions to `BX` to change between AArch32 and AArch64 state. The only way to change execution state is by a change of exception level.
  - Armv8-M, Armv7-M, and Armv6-M only support the T32 instruction set. An attempt to change the instruction execution state causes the processor to take an exception on the instruction at the target address.
- 

`BX` can also be used for an exception return.

The `BXNS` instruction causes a branch to an address and instruction set specified by a register, and causes a transition from the Secure to the Non-secure domain. This variant of the instruction must only be used when additional steps required to make such a transition safe are taken.

## Instruction availability and branch ranges

The following table shows the instructions that are available in A32 and T32 state. Instructions that are not shown in this table are not available.

**Table 14-7: BX instruction availability and range**

| Instruction        | A32       | T32, 16-bit encoding | T32, 32-bit encoding |
|--------------------|-----------|----------------------|----------------------|
| BX <i>Rm</i>       | Available | Available            | Use 16-bit           |
| BX{cond} <i>Rm</i> | Available | -                    | -                    |
| BXNS               | -         | Available            | -                    |

### Register restrictions

You can use PC for *Rm* in the A32 `BX` instruction, but this is deprecated. You cannot use PC in other A32 instructions.

You can use PC for *Rm* in the T32 `BX` and `BXNS` instructions. You cannot use PC in other T32 instructions.

You can use SP for *Rm* in the A32 `BX` instruction but this is deprecated.

You can use SP for *Rm* in the T32 `BX` and `BXNS` instructions, but this is deprecated.

### Condition flags

These instructions do not change the flags.

### Availability

See the preceding table for details of availability of the `BX` and `BXNS` instructions in both instruction sets.

### Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Instruction width specifiers](#) on page 244

[Information about image structure and generation](#)

## 14.23 BXJ (A32)

Branch and change to Jazelle state.

### Syntax

`BXJ{cond} Rm`

where:

#### *cond*

is an optional condition code. *cond* is not available on all forms of this instruction.

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

**Rm**

is a register containing an address to branch to.

## Operation

The `BXJ` instruction causes a branch to the address contained in `Rm` and changes the instruction set state to Jazelle®.



In Arm®v8, `BXJ` behaves as a `BX` instruction. This means it causes a branch to an address and instruction set specified by a register.

Note

## Instruction availability and branch ranges

The following table shows the `BXJ` instructions that are available in A32 and T32 state. Instructions that are not shown in this table are not available.

**Table 14-8: BXJ instruction availability and range**

| Instruction               | A32       | T32, 16-bit encoding | T32, 32-bit encoding |
|---------------------------|-----------|----------------------|----------------------|
| <code>BXJ Rm</code>       | Available | -                    | Available            |
| <code>BXJ{cond} Rm</code> | Available | -                    | -                    |

## Register restrictions

You can use SP for `Rm` in the `BXJ` A32 instruction but this is deprecated.

You cannot use SP in the `BXJ` T32 instruction.

## Condition flags

The `BXJ` instruction does not change the flags.

## Availability

See the preceding table for details of availability of the `BXJ` instruction in both instruction sets.

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Information about image structure and generation](#)

## 14.24 CBZ and CBNZ (A32)

Compare and Branch on Zero, Compare and Branch on Non-Zero.

### Syntax

`CBZ{q} Rn, label`

`CBNZ{q} Rn, label`

where:

**q**

Is an optional instruction width specifier.

**Rn**

Is the register holding the operand.

**label**

Is the branch destination.

## Usage

You can use the `CBZ` or `CBNZ` instructions to avoid changing the condition flags and to reduce the number of instructions.

Except that it does not change the condition flags, `CBZ Rn, label` is equivalent to:

```
CMP      Rn, #0
BEQ      label
```

Except that it does not change the condition flags, `CBNZ Rn, label` is equivalent to:

```
CMP      Rn, #0
BNE      label
```

## Restrictions

The branch destination must be a multiple of 2 in the range 0 to 126 bytes after the instruction and in the same execution state.

These instructions must not be used inside an IT block.

## Condition flags

These instructions do not change the flags.

## Architectures

These 16-bit instructions are available in Arm®v7-A T32, Armv8-A T32, and Armv8-M only.

There are no Armv7-A A32, or Armv8-A A32 or 32-bit T32 encodings of these instructions.

## Related information

[B \(A32\)](#) on page 264

[CMP and CMN \(A32\)](#) on page 279

[Instruction width specifiers](#) on page 244

## 14.25 CDP and CDP2 (A32)

Coprocessor data operations.



CDP and CDP2 are not supported in Arm®v8.

**Note**

### Syntax

`CDP{cond} coproc, #opcode1, CRd, CRn, CRM{}, #opcode2}`

`CDP2{cond} coproc, #opcode1, CRd, CRn, CRM{}, #opcode2}`

where:

**cond**

is an optional condition code.

In A32 code, **cond** is not permitted for CDP2.

**coproc**

is the name of the coprocessor the instruction is for. The standard name is  $p_n$ , where  $n$  is an integer in the range 0-15.

**opcode1**

is a 4-bit coprocessor-specific opcode.

**opcode2**

is an optional 3-bit coprocessor-specific opcode.

**CRd, CRn, CRM**

are coprocessor registers.

### Usage

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

### Architectures

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.26 CLREX (A32)

Clear Exclusive.

### Syntax

`CLREX{ cond}`

where:

**`cond`**

is an optional condition code.



`cond` is permitted only in T32 code, using a preceding `IT` instruction, but this is deprecated in Arm®v8. This is an unconditional instruction in A32.

### Usage

Use the `CLREX` instruction to clear the local record of the executing processor that an address has had a request for an exclusive access.

`CLREX` returns a closely-coupled exclusive access monitor to its open-access state. This removes the requirement for a dummy store to memory.

It is implementation defined whether `CLREX` also clears the global record of the executing processor that an address has had a request for an exclusive access.

### Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit `CLREX` instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

[A-Profile Architectures](#)

## 14.27 CLZ (A32)

Count Leading Zeros.

### Syntax

`CLZ{ cond} Rd, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm***

is the operand register.

## Operation

The `CLZ` instruction counts the number of leading zeros in the value in *Rm* and returns the result in *Rd*. The result value is 32 if no bits are set in the source register, and zero if bit 31 is set.

## Register restrictions

You cannot use PC for any operand.

You can use SP in these A32 instructions but this is deprecated.

You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Examples

```
CLZ      r4, r9
CLZNE   r2, r3
```

Use the `CLZ` T32 instruction followed by a left shift of *Rm* by the resulting *Rd* value to normalize the value of register *Rm*. Use `MOVS`, rather than `Mov`, to flag the case where *Rm* is zero:

```
CLZ r5, r9
MOVS r9, r9, LSL r5
```

## Related information

[Condition code suffixes](#) on page 138

## 14.28 CMP and CMN (A32)

Compare and Compare Negative.

### Syntax

`CMP{cond} Rn, Operand2`

`CMN{cond} Rn, Operand2`

where:

**cond**

is an optional condition code.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Operation

These instructions compare the value in a register with *Operand2*. They update the condition flags on the result, but do not place the result in any register.

The `CMP` instruction subtracts the value of *Operand2* from the value in *Rn*. This is the same as a `SUBS` instruction, except that the result is discarded.

The `CMN` instruction adds the value of *Operand2* to the value in *Rn*. This is the same as an `ADDS` instruction, except that the result is discarded.

In certain circumstances, the assembler can substitute `CMN` for `CMP`, or `CMP` for `CMN`. Be aware of this when reading disassembly listings.

### Use of PC in A32 and T32 instructions

You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

You can use PC (`R15`) in these A32 instructions without register controlled shift but this is deprecated.

If you use PC as *Rn* in A32 instructions, the value used is the address of the instruction plus 8.

You cannot use PC for any operand in these T32 instructions.

### Use of SP in A32 and T32 instructions

You can use SP for *Rn* in A32 and T32 instructions.

You can use SP for *Rm* in A32 instructions but this is deprecated.

You can use SP for  $Rm$  in a 16-bit T32 `CMP Rn, Rm` instruction but this is deprecated. Other uses of SP for  $Rm$  are not permitted in T32.

## Condition flags

These instructions update the N, Z, C and V flags according to the result.

## 16-bit instructions

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

### `CMP Rn, Rm`

Lo register restriction does not apply.

### `CMN Rn, Rm`

$Rn$  and  $Rm$  must both be Lo registers.

### `CMP Rn, #imm`

$Rn$  must be a Lo register.  $imm$  range 0-255.

## Correct examples

```
CMP      r2, r9
CMN      r0, #6400
CMPGT    sp, r7, LSL #2
```

## Incorrect example

```
CMP      r2, pc, ASR r0 ; PC not permitted with register-controlled
                           ; shift.
```

## Related information

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

## 14.29 CPS (A32)

Change Processor State.

### Syntax

```
CPSeffect iflags{, #mode}
```

```
CPS #mode
```

where:

#### **effect**

is one of:

#### **IE**

Interrupt or abort enable.

**ID**

Interrupt or abort disable.

***iflags***

is a sequence of one or more of:

**a**

Enables or disables imprecise aborts.

**i**

Enables or disables IRQ interrupts.

**f**

Enables or disables FIQ interrupts.

**mode**

specifies the number of the mode to change to.

**Usage**

Changes one or more of the mode, A, I, and F bits in the CPSR, without changing the other CPSR bits.

`cps` is only permitted in privileged software execution, and has no effect in User mode.

`cps` cannot be conditional, and is not permitted in an IT block.

**Condition flags**

This instruction does not change the condition flags.

**16-bit instructions**

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

- `CPSIE iflags.`
- `CPSID iflags.`

You cannot specify a mode change in a 16-bit T32 instruction.

**Architectures**

This instruction is available in A32 and T32.

In T32, 16-bit and 32-bit versions of this instruction are available.

**Examples**

```
CPSIE if      ; Enable IRQ and FIQ interrupts.
CPSID A      ; Disable imprecise aborts.
CPSID ai, #17 ; Disable imprecise aborts and interrupts, and enter
               ; FIQ mode.
CPS #16       ; Enter User mode.
```

## Related information

[Processor modes, and privileged and unprivileged software execution](#) on page 76

# 14.30 CPY pseudo-instruction

Copy a value from one register to another.

## Syntax

`CPY{cond} Rd, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register holding the value to be copied.

## Operation

The `CPY` pseudo-instruction copies a value from one register to another, without changing the condition flags.

`CPY Rd, Rm` assembles to `MOV Rd, Rm`.

## Architectures

This pseudo-instruction is available in A32 code and in T32 code.

## Register restrictions

Using SP or PC for both `Rd` and `Rm` is deprecated.

## Condition flags

This instruction does not change the condition flags.

## Related information

[MOV \(A32\)](#) on page 333

## 14.31 CRC32 (A32)

CRC32 performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.

### Syntax

`CRC32B{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<7:0>])`

`CRC32HB{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<15:0>])`

`CRC32WB{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<31:0>])`

`CRC32BB{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<7:0>])`

`CRC32HB{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<15:0>])`

`CRC32WB{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<31:0>])`

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#). A `crc32` instruction must be unconditional.

**Rd**

Is the general-purpose accumulator output register.

**Rn**

Is the general-purpose accumulator input register.

**Rm**

Is the general-purpose data source register.

### Architectures supported

Supported in architecture Arm®v8.1 and later. Optionally supported in the Armv8-A architecture.

### Usage

`crc32` takes an input CRC value in the first source operand, performs a CRC on the input value in the second source operand, and returns the output CRC value. The second source operand can be 8, 16, or 32 bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the polynomial `0x04C11DB7` is used for the CRC calculation.



`ID_ISAR5.CRC32` indicates whether this instruction is supported in the T32 and A32 instruction sets.



**Note** For more information about the **CONstrained UNPREDICTABLE** behavior, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[CRC32C \(A32\)](#) on page 285

[A32 and T32 instruction summary](#) on page 240

## 14.32 CRC32C (A32)

CRC32C performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.

### Syntax

`CRC32CB{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<7:0>])`

`CRC32CH{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<15:0>])`

`CRC32CW{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<31:0>])`

`CRC32CB{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<7:0>])`

`CRC32CH{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<15:0>])`

`CRC32CW{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<31:0>])`

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#). A `crc32c` instruction must be unconditional.

**Rd**

Is the general-purpose accumulator output register.

**Rn**

Is the general-purpose accumulator input register.

**Rm**

Is the general-purpose data source register.

### Architectures supported

Supported in architecture Arm®v8.1 and later. Optionally supported in the Armv8-A architecture.

## Usage

CRC32C takes an input CRC value in the first source operand, performs a CRC on the input value in the second source operand, and returns the output CRC value. The second source operand can be 8, 16, or 32 bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the polynomial `0x1EDC6F41` is used for the CRC calculation.



*ID\_ISAR5.CRC32* indicates whether this instruction is supported in the T32 and A32 instruction sets.



For more information about the **CONSTRAINED UNPREDICTABLE** behavior, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[CRC32 \(A32\)](#) on page 283

[A32 and T32 instruction summary](#) on page 240

## 14.33 DBG (A32)

Debug.

### Syntax

`DBG{cond} {option}`

where:

**cond**

is an optional condition code.

**option**

is an optional limitation on the operation of the hint. The range is 0-15.

### Usage

`DBG` is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it behaves as a `NOP`. The assembler produces a diagnostic message if the instruction executes as `NOP` on the target.

Debug hint provides a hint to a debugger and related tools. See your debugger and related tools documentation to determine the use, if any, of this instruction.

### Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

### Related information

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

## 14.34 DCPS1 (T32 instruction)

Debug switch to exception level 1 (EL1).



This instruction is supported only in Arm®v8.

Note

### Syntax

`DCPS1`

### Usage

This instruction is valid in Debug state only, and is always **UNDEFINED** in Non-debug state.

`DCPS1` targets EL1 and:

- If EL1 is using AArch32, the processing element (PE) enters SVC mode. If EL3 is using AArch32, Secure SVC is an EL3 mode. This means `DCPS1` causes the PE to enter EL3.
- If EL1 is using AArch64, the PE enters EL1h, and executes future instructions as A64 instructions.

In Non-debug state, use the `svc` instruction to generate a trap to EL1.

### Availability

This 32-bit instruction is available in T32 only.

There is no 16-bit version of this instruction in T32.

### Related information

[SVC \(A32\)](#) on page 446

[DCPS2 \(T32 instruction\)](#) on page 287

[DCPS3 \(T32 instruction\)](#) on page 288

[A-Profile Architectures](#)

## 14.35 DCPS2 (T32 instruction)

Debug switch to exception level 2.



This instruction is supported only in Arm®v8.

Note

### Syntax

DCPS2

### Usage

This instruction is valid in Debug state only, and is always **UNDEFINED** in Non-debug state.

DCPS2 targets EL2 and:

- If EL2 is using AArch32, the PE enters Hyp mode.
- If EL2 is using AArch64, the PE enters EL2h, and executes future instructions as A64 instructions.

In Non-debug state, use the HVC instruction to generate a trap to EL2.

### Availability

This 32-bit instruction is available in T32 only.

There is no 16-bit version of this instruction in T32.

### Related information

[HVC \(A32\)](#) on page 298

[DCPS1 \(T32 instruction\)](#) on page 287

[DCPS3 \(T32 instruction\)](#) on page 288

[A-Profile Architectures](#)

## 14.36 DCPS3 (T32 instruction)

Debug switch to exception level 3.



This instruction is supported only in Arm®v8.

Note

## Syntax

`DCPS3`

## Usage

This instruction is valid in Debug state only, and is always **UNDEFINED** in Non-debug state.

`DCPS3` targets EL3 and:

- If EL3 is using AArch32, the PE enters Monitor mode.
- If EL3 is using AArch64, the PE enters EL3h, and executes future instructions as A64 instructions.

In Non-debug state, use the `smc` instruction to generate a trap to EL3.

## Availability

This 32-bit instruction is available in T32 only.

There is no 16-bit version of this instruction in T32.

## Related information

[SMC \(A32\)](#) on page 399

[DCPS2 \(T32 instruction\)](#) on page 287

[DCPS1 \(T32 instruction\)](#) on page 287

[A-Profile Architectures](#)

## 14.37 DMB (A32)

The Data Memory Barrier instruction `DMB` is a memory barrier that ensures the ordering of observations of memory accesses.

### Syntax

`DMB{cond} {option}`

where:

#### `cond`

is an optional condition code.



`cond` is permitted only in T32 code. This instruction is unconditional in A32 code.

#### `option`

is an optional limitation on the operation of the hint. Permitted values are:

**SY**

Full system barrier operation. Reads and writes are the required access types, both before and after the barrier instruction. This option is the default and can be omitted.

**LD**

Full system barrier operation. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**ST**

Full system barrier operation. Writes are the required access type, both before and after the barrier instruction.

**ISH**

Barrier operation only to the inner shareable domain. Reads and writes are the required access types, both before and after the barrier instruction.

**ISHLD**

Barrier operation that waits only for loads to complete, and only applies to the inner shareable domain. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**ISHST**

Barrier operation that waits only for stores to complete, and only to the inner shareable domain. Writes are the required access type, both before and after the barrier instruction.

**NSH**

Barrier operation only out to the point of unification. Reads and writes are the required access, both before and after the barrier instruction.

**NSHLD**

Barrier operation that waits only for loads to complete and only applies out to the point of unification. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**NSHST**

Barrier operation that waits only for stores to complete and only out to the point of unification. Writes are the required access type both before and after the barrier instruction.

**OSH**

Barrier operation only to the outer shareable domain. Reads and writes are the required access types, both before and after the barrier instruction.

**OSHLD**

DMB operation that waits only for loads to complete, and only applies to the outer shareable domain. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**OSHST**

Barrier operation that waits only for stores to complete, and only to the outer shareable domain. Writes are the required access type, both before and after the barrier instruction.



The options `LD`, `ISHLD`, `NSHLD`, and `OSHLD` are supported only in the Arm®v8-A and Armv8-R architectures.

**Note**

**Operation**

The Data Memory Barrier `DMB` instruction is a memory barrier instruction that ensures the relative order of memory accesses before the barrier with memory accesses after the barrier. The `DMB` instruction does not ensure the completion of any of the memory accesses for which it ensures relative order.

**Alias**

The following alternative values of `option` are supported, but Arm recommends that you do not use them:

- `SH` is an alias for `ISH`.
- `SHST` is an alias for `ISHST`.
- `UN` is an alias for `NSH`.
- `UNST` is an alias for `NSHST`.

**Architectures**

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

**Related information**

[Condition code suffixes](#) on page 138

[Learn the architecture - Memory Systems, Ordering, and Barriers](#)

## 14.38 DSB (A32)

Data Synchronization Barrier `DSB` is a memory barrier that ensures the completion of memory accesses.

**Syntax**

`DSB{cond} {option}`

where:

***cond***

is an optional condition code.



*cond* is permitted only in T32 code. This instruction is unconditional in A32 code.

***option***

is an optional limitation on the operation of the hint. Permitted values are:

**SY**

Full system barrier operation. Reads and writes are the required access types, both before and after the barrier instruction. This option is the default and can be omitted.

**LD**

Barrier operation that waits only for loads to complete. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**ST**

Barrier operation that waits only for stores to complete. Writes are the required access type, both before and after the barrier instruction.

**ISH**

Barrier operation only to the inner shareable domain. Reads and writes are the required access types, both before and after the barrier instruction.

**ISHLD**

Barrier operation that waits only for loads to complete, and only applies to the inner shareable domain. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**ISHST**

Barrier operation that waits only for stores to complete, and only to the inner shareable domain. Writes are the required access type, both before and after the barrier instruction.

**NSH**

Barrier operation only out to the point of unification. Reads and writes are the required access, both before and after the barrier instruction.

**NSHLD**

Barrier operation that waits only for loads to complete and only applies out to the point of unification. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.

**NSHST**

Barrier operation that waits only for stores to complete and only out to the point of unification. Writes are the required access type both before and after the barrier instruction.

**OSH**

Barrier operation only to the outer shareable domain. Reads and writes are the required access types, both before and after the barrier instruction.

**OSHLD**

`DMB` operation that waits only for loads to complete, and only applies to the outer shareable domain. Writes are the required access type, both before and after the barrier instruction.

**OSHST**

Barrier operation that waits only for stores to complete, and only to the outer shareable domain. Reads are the required access type before the barrier instruction, and reads and writes are the required access types after the barrier instruction.



The options `LD`, `ISHLD`, `NSHLD`, and `OSHLD` are supported only in the Arm®v8-A and Armv8-R architectures.

Note

## Operation

Data Synchronization Barrier acts as a special kind of memory barrier. No instruction in program order after this instruction executes until this instruction completes. This instruction completes when:

- All explicit memory accesses before this instruction complete.
- All Cache, Branch predictor and TLB maintenance operations before this instruction complete.

## Alias

The following alternative values of *option* are supported for `DSB`, but Arm recommends that you do not use them:

- `SH` is an alias for `ISH`.
- `SHST` is an alias for `ISHST`.
- `UN` is an alias for `NSH`.
- `UNST` is an alias for `NSHST`.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

[Learn the architecture - Memory Systems, Ordering, and Barriers](#)

## 14.39 EOR (A32)

Logical Exclusive OR.

### Syntax

`EOR{S}{cond} Rd, Rn, Operand2`

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Operation

The `EOR` instruction performs bitwise Exclusive OR operations on the values in `Rn` and `operand2`.

### Use of PC in T32 instructions

You cannot use PC (`R15`) for `Rd` or any operand in an `EOR` instruction.

### Use of PC and SP in A32 instructions

You can use PC and SP with the `EOR` instruction but they are deprecated.

If you use PC as `Rn`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc, lr` instruction.

You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

### Condition flags

If s is specified, the `EOR` instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of `operand2`.

- Does not affect the V flag.

## 16-bit instructions

The following forms of the `EOR` instruction are available in T32 code, and are 16-bit instructions:

**EORS Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used outside an IT block.

**EOR{cond} Rd, Rd, Rm**

*Rd* and *Rm* must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify `EOR{S} Rd, Rm, Rd`. The instruction is the same.

### Correct examples

```
EORS    r0,r0,r3,ROR r6
EORS    r7, r11, #0x18181818
```

### Incorrect example

```
EORS    r0,pc,r3,ROR r6      ; PC not permitted with register
                                ; controlled shift
```

### Related information

[Flexible second operand \(Operand2\)](#) on page 244

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

## 14.40 ERET (A32)

Exception Return.

### Syntax

`ERET{cond}`

where:

**cond**

is an optional condition code.

### Usage

In a processor that implements the Virtualization Extensions, you can use `ERET` to perform a return from an exception taken to Hyp mode.

### Operation

When executed in Hyp mode, `ERET` loads the PC from ELR\_hyp and loads the CPSR from SPSR\_hyp. When executed in any other mode, apart from User or System, it behaves as:

- `MOVS PC, LR` in the A32 instruction set.
- `SUBS PC, LR, #0` in the T32 instruction set.

## Notes

You must not use `ERET` in User or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

`ERET` is the preferred synonym for `SUBS PC, LR, #0` in the T32 instruction set.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Processor modes, and privileged and unprivileged software execution](#) on page 76

[MOV \(A32\)](#) on page 333

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

[HVC \(A32\)](#) on page 298

## 14.41 ESB (A32)

Error Synchronization Barrier.

### Syntax

`ESB{c}{q} ; A1 general registers (A32)`

`ESB{c}.W ; T1 general registers (T32)`

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**c**

Is an optional instruction condition code. See [Condition Codes](#).

### Architectures supported

Supported in the Arm®v8-A and Armv8-R architectures.

### Usage

Error Synchronization Barrier.

## Related information

[A32 and T32 instruction summary](#) on page 240

## 14.42 HLT (A32)

Halting breakpoint.



This instruction is supported only in the Arm®v8 architecture.

Note

### Syntax

`HLT{Q} #imm`

Where:

**Q**

is an optional suffix. It only has an effect when Halting debug-mode is disabled. In this case, if Q is specified, the instruction behaves as a `NOP`. If Q is not specified, the instruction is **UNDEFINED**.

**imm**

is an expression evaluating to an integer in the range:

- 0-65535 (a 16-bit value) in an A32 instruction.
- 0-63 (a 6-bit value) in a 16-bit T32 instruction.

### Usage

The `HLT` instruction causes the processor to enter Debug state if Halting debug-mode is enabled.

In both A32 state and T32 state, `imm` is ignored by the Arm hardware. However, a debugger can use it to store additional information about the breakpoint.

`HLT` is an unconditional instruction. It must not have a condition code in A32 code. In T32 code, the `HLT` instruction does not require a condition code suffix because it always executes irrespective of its condition code suffix.

### Availability

This instruction is available in A32 and T32.

In T32, it is only available as a 16-bit instruction.

## Related information

[NOP \(A32\)](#) on page 350

## 14.43 HVC (A32)

Hypervisor Call.

### Syntax

`HVC #imm`

where:

***imm***

is an expression evaluating to an integer in the range 0-65535.

### Operation

In a processor that implements the Virtualization Extensions, the `HVC` instruction causes a Hypervisor Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp mode SPSR, and execution branches to the HVC vector.

`HVC` must not be used if the processor is in Secure state, or in User mode in Non-secure state.

*imm* is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.

`HVC` cannot be conditional, and is not permitted in an IT block.

### Notes

The `ERET` instruction performs an exception return from Hyp mode.

### Architectures

This 32-bit instruction is available in A32 and T32. It is available in Arm®v7 architectures that include the Virtualization Extensions.

There is no 16-bit version of this instruction in T32.

### Related information

[Processor modes, and privileged and unprivileged software execution](#) on page 76  
[ERET \(A32\)](#) on page 295

## 14.44 ISB (A32)

The Instruction Synchronization Barrier `ISB` instruction ensures that all instructions that come after the `ISB` instruction in program order are fetched from the cache or memory after the `ISB` instruction has completed.

### Syntax

```
ISB{cond} {option}
```

where:

**cond**

is an optional condition code.



`cond` is permitted only in T32 code. This instruction is unconditional in A32 code.

**option**

is an optional limitation on the operation of the hint. The permitted value is:

**SY**

Full system barrier operation. This option is the default and can be omitted.

### Operation

Instruction Synchronization Barrier flushes the pipeline in the processor, so that all instructions following the `ISB` instruction are fetched from cache or memory, after the instruction has been completed. It ensures that the effects of context altering operations, such as changing the ASID, or completed TLB maintenance operations, or branch predictor maintenance operations, in addition to all changes to the CP15 registers, executed before the `ISB` instruction are visible to the instructions fetched after the `ISB` instruction.

In addition, the `ISB` instruction ensures that any branches that appear in program order after it are always written into the branch prediction logic with the context that is visible after the `ISB` instruction. This is required to ensure correct execution of the instruction stream.



When the target architecture is Arm®v7-M, you cannot use an `ISB` instruction in an IT block, unless it is the last instruction in the block.

### Architectures

This 32-bit instructions are available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

[Learn the architecture - Memory Systems, Ordering, and Barriers](#)

## 14.45 IT (A32)

The `IT` (If-Then) instruction makes a single following instruction (the IT block) conditional. The conditional instruction must be from a restricted set of 16-bit instructions.

### Syntax

`IT cond`

where:

**`cond`**

specifies the condition for the following instruction.

### Deprecated syntax

`IT{x{y{z}}}{cond}`

where:

**`x`**

specifies the condition switch for the second instruction in the IT block.

**`y`**

specifies the condition switch for the third instruction in the IT block.

**`z`**

specifies the condition switch for the fourth instruction in the IT block.

**`cond`**

specifies the condition for the first instruction in the IT block.

The condition switches for the second, third, and fourth instructions in the IT block can be either:

**T**

Then. Applies the condition `{cond}` to the instruction.

**E**

Else. Applies the inverse condition of `{cond}` to the instruction.

### Usage

The IT block can contain between two and four conditional instructions, where the conditions can be all the same, or some of them can be the logical inverse of the others, but this is deprecated in Arm®v8.

The conditional instruction (including branches, but excluding the `BKPT` instruction) must specify the condition in the `{cond}` part of its syntax.

You are not required to write `IT` instructions in your code, because the assembler generates them for you automatically according to the conditions specified on the following instructions. However, if you do write `IT` instructions, the assembler validates the conditions specified in the `IT` instructions against the conditions specified in the following instructions.

Writing the `IT` instructions ensures that you consider the placing of conditional instructions, and the choice of conditions, in the design of your code.

When assembling to A32 code, the assembler performs the same checks, but does not generate any `IT` instructions.

Except for `CMP`, `CMN`, and `TST`, the 16-bit instructions that normally affect the condition flags, do not affect them when used inside an IT block.

A `BKPT` instruction in an IT block is always executed, so it does not require a condition in the `{cond}` part of its syntax. The IT block continues from the next instruction. Using a `BKPT` or `HLT` instruction inside an IT block is deprecated.



You can use an IT block for unconditional instructions by using the `AL` condition.

**Note**

---

Conditional branches inside an IT block have a longer branch range than those outside the IT block.

## Restrictions

The following instructions are not permitted in an IT block:

- `IT`.
- `CBZ` and `CBNZ`.
- `TBB` and `TBH`.
- `CPS`, `CPSID` and `CPSIE`.
- `SETEND`.

Other restrictions when using an IT block are:

- A branch or any instruction that modifies the PC is only permitted in an IT block if it is the last instruction in the block.
- You cannot branch to any instruction in an IT block, unless when returning from an exception handler.
- You cannot use any assembler directives in an IT block.



`armasm` shows a diagnostic message when any of these instructions are used in an IT block.

**Note**

Using any instruction not listed in the following table in an IT block is deprecated. Also, any explicit reference to R15 (the PC) in the IT block is deprecated.

**Table 14-9: Permitted instructions inside an IT block**

| 16-bit instruction            | When deprecated                                                              |
|-------------------------------|------------------------------------------------------------------------------|
| MOV, MVN                      | When $Rm$ or $Rd$ is the PC                                                  |
| LDR, LDRB, LDRH, LDRSB, LDRSH | For PC-relative forms                                                        |
| STR, STRB, STRH               | -                                                                            |
| ADD, ADC, RSB, SBC, SUB       | ADD SP, SP, #imm or SUB SP, SP, #imm or when $Rm$ , $Rdn$ or $Rdm$ is the PC |
| CMP, CMN                      | When $Rm$ or $Rn$ is the PC                                                  |
| MUL                           | -                                                                            |
| ASR, LSL, LSR, ROR            | -                                                                            |
| AND, BIC, EOR, ORR, TST       | -                                                                            |
| BX, BLX                       | When $Rm$ is the PC                                                          |

## Condition flags

This instruction does not change the flags.

## Exceptions

Exceptions can occur between an `IT` instruction and the corresponding IT block, or within an IT block. This exception results in entry to the appropriate exception handler, with suitable return information in LR and SPSR.

Instructions designed for use as exception returns can be used as normal to return from the exception, and execution of the IT block resumes correctly. This is the only way that a PC-modifying instruction can branch to an instruction in an IT block.

## Availability

This 16-bit instruction is available in T32 only.

In A32 code, `IT` is a pseudo-instruction that does not generate any code.

There is no 32-bit version of this instruction.

## Correct examples

```
IT      GT
LDRGT  r0, [r1,#4]

IT      EQ
ADDEQ  r0, r1, r2
```

## Incorrect examples

```
IT      NE
ADD   r0,r0,r1 ; syntax error: no condition code used in IT block

ITT    EQ
```

```

MOVEQ  r0,r1
ADDEQ  r0,r0,#1 ; IT block covering more than one instruction is deprecated

IT      GT
LDRGT  r0,label ; LDR (PC-relative) is deprecated in an IT block

IT      EQ
ADDEQ  PC,r0      ; ADD is deprecated when Rdn is the PC

```

## 14.46 LDA (A32)

Load-Acquire Register.



**Note**

This instruction is supported only in Arm®v8.

### Syntax

LDA{cond} Rt, [Rn]

LDAB{cond} Rt, [Rn]

LDAH{cond} Rt, [Rn]

where:

**cond**

is an optional condition code.

**Rt**

is the register to load.

**Rn**

is the register on which the memory address is based.

### Operation

LDA loads data from memory. If any loads or stores appear after a load-acquire in program order, then all observers are guaranteed to observe the load-acquire before observing the loads and stores. Loads and stores appearing before a load-acquire are unaffected.

If a store-release follows a load-acquire, each observer is guaranteed to observe them in program order.

There is no requirement that a load-acquire be paired with a store-release.

### Restrictions

The address specified must be naturally aligned, or an alignment fault is generated.

The PC must not be used for *Rt* or *Rn*.

## Availability

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction.

## Related information

[LDAEX \(A32\)](#) on page 304

[STL \(A32\)](#) on page 428

[STLEX \(A32\)](#) on page 429

[Condition code suffixes](#) on page 138

## 14.47 LDAEX (A32)

Load-Acquire Register Exclusive.



This instruction is supported only in Arm®v8.

Note

### Syntax

`LDAEX{cond} Rt, [Rn]`

`LDAEXB{cond} Rt, [Rn]`

`LDAEXH{cond} Rt, [Rn]`

`LDAEXD{cond} Rt, Rt2, [Rn]`

where:

**cond**

is an optional condition code.

**Rt**

is the register to load.

**Rt2**

is the second register for doubleword loads.

**Rn**

is the register on which the memory address is based.

### Operation

`LDAEX` loads data from memory.

- If the physical address has the Shared TLB attribute, `LDAEX` tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address.
- Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.
- If any loads or stores appear after `LDAEX` in program order, then all observers are guaranteed to observe the `LDAEX` before observing the loads and stores. Loads and stores appearing before `LDAEX` are unaffected.

## Restrictions

The PC must not be used for any of  $Rt$ ,  $Rt2$ , or  $Rn$ .

For A32 instructions:

- SP can be used but use of SP for any of  $Rt$ , or  $Rt2$  is deprecated.
- For `LDAEXD`,  $Rt$  must be an even numbered register, and not LR.
- $Rt2$  must be  $R(t+1)$ .

For T32 instructions:

- SP can be used for  $Rn$ , but must not be used for any of  $Rt$ , or  $Rt2$ .
- For `LDAEXD`,  $Rt$  and  $Rt2$  must not be the same register.

## Usage

Use `LDAEX` and `STLEX` to implement interprocess communication in multiple-processor and shared-memory systems.

For reasons of performance, keep the number of instructions between corresponding `LDAEX` and `STLEX` instructions to a minimum.



The address used in a `STLEX` instruction must be the same as the address in the most recently executed `LDAEX` instruction.

**Note**

## Availability

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions.

## Related information

[STL \(A32\)](#) on page 428

[LDA \(A32\)](#) on page 303

[STLEX \(A32\)](#) on page 429

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.48 LDC and LDC2 (A32)

Transfer Data from memory to Coprocessor.



LDC2 is not supported in Arm®v8.

**Note**

### Syntax

```
{op} {L}{cond} coproc, CRd, [Rn]
{op} {L}{cond} coproc, CRd, [Rn, #{-}offset] ; offset addressing
{op} {L}{cond} coproc, CRd, [Rn, #{-}offset]! ; pre-index addressing
{op} {L}{cond} coproc, CRd, [Rn], #{-}offset ; post-index addressing
{op} {L}{cond} coproc, CRd, label
{op} {L}{cond} coproc, CRd, [Rn], {option}
```

where:

**op**

is LDC or LDC2.

**cond**

is an optional condition code.

In A32 code, *cond* is not permitted for LDC2.

**L**

is an optional suffix specifying a long transfer.

**coproc**

is the name of the coprocessor the instruction is for. The standard name is  $p_n$ , where  $n$  is an integer that must be:

- In the range 0 to 15 in Armv7 and earlier.
- 14 in Armv8.

**CRd**

is the coprocessor register to load.

**Rn**

is the register on which the memory address is based. If PC is specified, the value used is the address of the current instruction plus eight.

– is an optional minus sign. If – is present, the offset is subtracted from  $Rn$ . Otherwise, the offset is added to  $Rn$ .

***offset***

is an expression evaluating to a multiple of 4, in the range 0 to 1020.

!

is an optional suffix. If ! is present, the address including the offset is written back into  $Rn$ .

***label***

is a word-aligned PC-relative expression.

*label* must be within 1020 bytes of the current instruction.

***option***

is a coprocessor option in the range 0-255, enclosed in braces.

## Usage

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

## Architectures

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

## Register restrictions

You cannot use PC for  $Rn$  in the pre-index and post-index instructions. These are the forms that write back to  $Rn$ .

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.49 LDM (A32)

Load Multiple registers.

## Syntax

`LDM{addr_mode}{cond} Rn{!}, reglist{^}`

where:

***addr\_mode***

is any one of the following:

**IA**

Increment address After each transfer. This is the default, and can be omitted.

**IB**

Increment address Before each transfer (A32 only).

**DA**

Decrement address After each transfer (A32 only).

**DB**

Decrement address Before each transfer.

You can also use the stack oriented addressing mode suffixes, for example, when implementing stacks.

***cond***

is an optional condition code.

***Rn***

is the base register, the AArch32 register holding the initial address for the transfer. *Rn* must not be PC.

**!**

is an optional suffix. If ! is present, the final address is written back into *Rn*.

***reglist***

is a list of one or more registers to be loaded, enclosed in braces. It can contain register ranges. It must be comma separated if it contains more than one register or register range. Any combination of registers R0 to R15 (PC) can be transferred in A32 state, but there are some restrictions in T32 state.

**^**

is an optional suffix, available in A32 state only. You must not use it in User mode or System mode. It has the following purposes:

- If *reglist* contains the PC (R15), in addition to the normal multiple register transfer, the SPSR is copied into the CPSR. This is for returning from exception handlers. Use this only from exception modes.
- Otherwise, data is transferred into or out of the User mode registers instead of the current mode registers.

## Restrictions on reglist in 32-bit T32 instructions

In 32-bit T32 instructions:

- The SP cannot be in the list.
- The PC and LR cannot both be in the list.
- There must be two or more registers in the list.

If you write an LDM instruction with only one register in reglist, the assembler automatically substitutes the equivalent LDR instruction. Be aware of this when comparing disassembly listings with source code.

You can use the `--diag_warning 1645` assembler command line option to check when an instruction substitution occurs.

## Restrictions on reglist in A32 instructions

A32 load instructions can have SP and PC in the *reglist* but these instructions that include SP in the *reglist* or both PC and LR in the *reglist* are deprecated.

## 16-bit instructions

16-bit versions of a subset of these instructions are available in T32 code.

The following restrictions apply to the 16-bit instructions:

- All registers in *reglist* must be Lo registers.
- *Rn* must be a Lo register.
- *addr\_mode* must be omitted (or `IA`), meaning increment address after each transfer.
- Writeback must be specified for `LDM` instructions where *Rn* is not in the *reglist*.

In addition, the `PUSH` and `POP` instructions are subsets of the `STM` and `LDM` instructions and can therefore be expressed using the `STM` and `LDM` instructions. Some forms of `PUSH` and `POP` are also 16-bit instructions.

## Loading to the PC

A load to the PC causes a branch to the instruction at the address loaded.

Also:

- Bits[1:0] must not be 0b10.
- If bit[0] is 1, execution continues in T32 state.
- If bit[0] is 0, execution continues in A32 state.

## Loading or storing the base register, with writeback

In A32 or 16-bit T32 instructions, if *Rn* is in *reglist*, and writeback is specified with the `!` suffix:

- If the instruction is `STM{addr_mode}{cond} Rn` and *Rn* is the lowest-numbered register in *reglist*, the initial value of *Rn* is stored. These instructions are deprecated.
- Otherwise, the loaded or stored value of *Rn* cannot be relied on, so these instructions are not permitted.

32-bit T32 instructions are not permitted if *Rn* is in *reglist*, and writeback is specified with the `!` suffix.

## Correct example

```
LDM      r8,{r0,r2,r9}      ; LDMIA is a synonym for LDM
```

## Incorrect example

```
LDMDA    r2, {}           ; must be at least one register in list
```

## Related information

[Stack implementation using LDM and STM](#) on page 119  
[POP \(A32\)](#) on page 357  
[Condition code suffixes](#) on page 138  
[Address alignment in A32/T32 code](#) on page 157

## 14.50 LDR (immediate offset) (A32)

Load with immediate offset, pre-indexed immediate offset, or post-indexed immediate offset.

### Syntax

```
LDR{type}{cond} Rt, [Rn {, #offset}] ; immediate offset
LDR{type}{cond} Rt, [Rn, #offset]! ; pre-indexed
LDR{type}{cond} Rt, [Rn], #offset ; post-indexed
LDRD{cond} Rt, Rt2, [Rn {, #offset}] ; immediate offset, doubleword
LDRD{cond} Rt, Rt2, [Rn, #offset]! ; pre-indexed, doubleword
LDRD{cond} Rt, Rt2, [Rn], #offset ; post-indexed, doubleword
```

where:

***type***

can be any one of:

**B**

unsigned Byte (Zero extend to 32 bits on loads.)

**SB**

signed Byte (LDR only. Sign extend to 32 bits.)

**H**

unsigned Halfword (Zero extend to 32 bits on loads.)

**SH**

signed Halfword (LDR only. Sign extend to 32 bits.)

**-**

omitted, for Word.

***cond***

is an optional condition code.

**Rt**

is the register to load.

**Rn**

is the register on which the memory address is based.

**offset**

is an offset. If *offset* is omitted, the address is the contents of *Rn*.

**Rt2**

is the additional register to load for doubleword operations.

Not all options are available in every instruction set and architecture.

## Offset ranges and architectures

The following table shows the ranges of offsets and availability of these instructions:

**Table 14-10: Offsets and architectures, LDR, word, halfword, and byte**

| Instruction                                                                             | Immediate offset           | Pre-indexed                | Post-indexed               |
|-----------------------------------------------------------------------------------------|----------------------------|----------------------------|----------------------------|
| A32, word or byte <sup>8</sup>                                                          | -4095 to 4095              | -4095 to 4095              | -4095 to 4095              |
| A32, signed byte, halfword, or signed halfword                                          | -255 to 255                | -255 to 255                | -255 to 255                |
| A32, doubleword                                                                         | -255 to 255                | -255 to 255                | -255 to 255                |
| T32 32-bit encoding, word, halfword, signed halfword, byte, or signed byte <sup>8</sup> | -255 to 4095               | -255 to 255                | -255 to 255                |
| T32 32-bit encoding, doubleword                                                         | -1020 to 1020 <sup>9</sup> | -1020 to 1020 <sup>9</sup> | -1020 to 1020 <sup>9</sup> |
| T32 16-bit encoding, word <sup>10</sup>                                                 | 0 to 124 <sup>9</sup>      | Not available              | Not available              |
| T32 16-bit encoding, unsigned halfword <sup>10</sup>                                    | 0 to 62 <sup>11</sup>      | Not available              | Not available              |
| T32 16-bit encoding, unsigned byte <sup>10</sup>                                        | 0 to 31                    | Not available              | Not available              |
| T32 16-bit encoding, word, Rn is SP <sup>12</sup>                                       | 0 to 1020 <sup>9</sup>     | Not available              | Not available              |

## Register restrictions

*Rn* must be different from *Rt* in the pre-index and post-index forms.

<sup>8</sup> For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In Arm®v4, bits[1:0] of the address loaded must be 0b00. In Armv5T and above, bits[1:0] must not be 0b10, and if bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.

<sup>9</sup> Must be divisible by 4.

<sup>10</sup> Rt and Rn must be in the range R0-R7.

<sup>11</sup> Must be divisible by 2.

<sup>12</sup> Rt must be in the range R0-R7.

## Doubleword register restrictions

$R_n$  must be different from  $R_{t2}$  in the pre-index and post-index forms.

For T32 instructions, you must not specify SP or PC for either  $R_t$  or  $R_{t2}$ .

For A32 instructions:

- $R_t$  must be an even-numbered register.
- $R_t$  must not be LR.
- Arm strongly recommends that you do not use R12 for  $R_t$ .
- $R_{t2}$  must be  $R(t + 1)$ .

## Use of PC

In A32 code you can use PC for  $R_t$  in LDR word instructions and PC for  $R_n$  in LDR instructions.

Other uses of PC are not permitted in these A32 instructions.

In T32 code you can use PC for  $R_t$  in LDR word instructions and PC for  $R_n$  in LDR instructions. Other uses of PC in these T32 instructions are not permitted.

## Use of SP

You can use SP for  $R_n$ .

In A32 code, you can use SP for  $R_n$  in word instructions. You can use SP for  $R_n$  in non-word instructions in A32 code but this is deprecated.

In T32 code, you can use SP for  $R_n$  in word instructions only. All other use of SP for  $R_n$  in these instructions are not permitted in T32 code.

## Examples

```
LDR    r8,[r10]      ; loads R8 from the address in R10.
LDRNE  r2,[r5,#960]! ; (conditionally) loads R2 from a word
                      ; 960 bytes above the address in R5, and
                      ; increments R5 by 960.
```

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.51 LDR (PC-relative) (A32)

Load register. The address is an offset from the PC.

### Syntax

```
LDR{type}{cond}{.W} Rt, label
```

`LDRD{cond} Rt, Rt2, label ; Doubleword`

where:

**type**

can be any one of:

**B**

unsigned Byte (Zero extend to 32 bits on loads.)

**SB**

signed Byte (`LDR` only. Sign extend to 32 bits.)

**H**

unsigned Halfword (Zero extend to 32 bits on loads.)

**SH**

signed Halfword (`LDR` only. Sign extend to 32 bits.)

**-**

omitted, for Word.

**cond**

is an optional condition code.

**.W**

is an optional instruction width specifier.

**Rt**

is the register to load or store.

**Rt2**

is the second register to load or store.

**label**

is a PC-relative expression.

`label` must be within a limited distance of the current instruction.



Equivalent syntaxes are available for the `STR` instruction in A32 code but they are deprecated.

Note

## Offset range and architectures

The assembler calculates the offset from the PC for you. The assembler generates an error if `label` is out of range.

The following table shows the possible offsets between the label and the current instruction:

**Table 14-11: PC-relative offsets**

| Instruction                                            | Offset range         |
|--------------------------------------------------------|----------------------|
| A32 LDR, LDRB, LDRSB, LDRH, LDRSH <sup>13</sup>        | ±4095                |
| A32 LDRD                                               | ±255                 |
| 32-bit T32 LDR, LDRB, LDRSB, LDRH, LDRSH <sup>13</sup> | ±4095                |
| 32-bit T32 LDRD <sup>14</sup>                          | ±1020 <sup>15</sup>  |
| 16-bit T32 LDR <sup>16</sup>                           | 0-1020 <sup>15</sup> |

## LDR (PC-relative) in T32

You can use the `.w` width specifier to force `LDR` to generate a 32-bit instruction in T32 code. `LDR.w` always generates a 32-bit instruction, even if the target could be reached using a 16-bit `LDR`.

For forward references, `LDR` without `.w` always generates a 16-bit instruction in T32 code, even if that results in failure for a target that could be reached using a 32-bit T32 `LDR` instruction.

## Doubleword register restrictions

For 32-bit T32 instructions, you must not specify SP or PC for either `Rt` or `Rt2`.

For A32 instructions:

- `Rt` must be an even-numbered register.
- `Rt` must not be LR.
- Arm strongly recommends that you do not use R12 for `Rt`.
- `Rt2` must be `R(t + 1)`.

## Use of SP

In A32 code, you can use SP for `Rt` in `LDR` word instructions. You can use SP for `Rt` in `LDR` non-word A32 instructions but this is deprecated.

In T32 code, you can use SP for `Rt` in `LDR` word instructions only. All other uses of SP in these instructions are not permitted in T32 code.

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

<sup>13</sup> For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In Arm®v4, bits[1:0] of the address loaded must be 0b00. In Armv5T and above, bits[1:0] must not be 0b10, and if bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.

<sup>14</sup> In Armv7-M, `LDRD` (PC-relative) instructions must be on a word-aligned address.

<sup>15</sup> Must be a multiple of 4.

<sup>16</sup> Rt must be in the range R0-R7. There are no byte, halfword, or doubleword 16-bit instructions.

## 14.52 LDR (register offset) (A32)

Load with register offset, pre-indexed register offset, or post-indexed register offset.

### Syntax

```
LDR{type}{cond} Rt, [Rn, ±Rm {, shift}] ; register offset
LDR{type}{cond} Rt, [Rn, ±Rm {, shift}]! ; pre-indexed ; A32 only
LDR{type}{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed ; A32 only
LDRD{cond} Rt, Rt2, [Rn, ±Rm] ; register offset, doubleword ; A32 only
LDRD{cond} Rt, Rt2, [Rn, ±Rm]! ; pre-indexed, doubleword ; A32 only
LDRD{cond} Rt, Rt2, [Rn], ±Rm ; post-indexed, doubleword ; A32 only
```

where:

#### **type**

can be any one of:

#### **B**

unsigned Byte (Zero extend to 32 bits on loads.)

#### **SB**

signed Byte (LDR only. Sign extend to 32 bits.)

#### **H**

unsigned Halfword (Zero extend to 32 bits on loads.)

#### **SH**

signed Halfword (LDR only. Sign extend to 32 bits.)

#### **-**

omitted, for Word.

#### **cond**

is an optional condition code.

#### **Rt**

is the register to load.

#### **Rn**

is the register on which the memory address is based.

#### **Rm**

is a register containing a value to be used as the offset.  $-Rm$  is not permitted in T32 code.

#### **shift**

is an optional shift.

**Rt2**

is the additional register to load for doubleword operations.

Not all options are available in every instruction set and architecture.

### Offset register and shift options

The following table shows the ranges of offsets and availability of these instructions:

| Instruction                                                                | Where $\pm Rm$ is shown, you can use $-Rm$ , $+Rm$ , or $Rm$ . Where $+Rm$ is shown, you cannot use $-Rm$ .                                                                                                                                                                                                       | $\pm Rn$ | shift        |               |     |
|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|--------------|---------------|-----|
| A32, word or byte                                                          | For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In Armv4, bits[1:0] of the address loaded must be 0b00. In Armv5T and above, bits[1:0] must not be 0b10, and if bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state. <sup>2</sup> | $\pm Rm$ | LSL<br>#0-31 | LSR<br>#1-32  |     |
|                                                                            |                                                                                                                                                                                                                                                                                                                   |          | ASR<br>#1-32 | ROR<br>#1-31  | RRX |
| A32, signed byte, halfword, or signed halfword                             |                                                                                                                                                                                                                                                                                                                   | $\pm Rm$ |              | Not available |     |
| A32, doubleword                                                            |                                                                                                                                                                                                                                                                                                                   | $\pm Rm$ |              | Not available |     |
| T32 32-bit encoding, word, halfword, signed halfword, byte, or signed byte | Rt, Rn, and Rm must all be in the range R0-R7. <sup>2</sup>                                                                                                                                                                                                                                                       | $+Rm$    | LSL<br>#0-3  |               |     |
| T32 16-bit encoding, all except doubleword 3                               |                                                                                                                                                                                                                                                                                                                   | $+Rm$    |              | Not available |     |

### Register restrictions

In the pre-index and post-index forms,  $Rn$  must be different from  $Rt$ .

### Doubleword register restrictions

For A32 instructions:

- $Rt$  must be an even-numbered register.
- $Rt$  must not be LR.
- Arm strongly recommends that you do not use R12 for  $Rt$ .
- $Rt2$  must be  $R(t + 1)$ .
- $Rm$  must be different from  $Rt$  and  $Rt2$  in LDRD instructions.
- $Rn$  must be different from  $Rt2$  in the pre-index and post-index forms.

### Use of PC

In A32 instructions you can use PC for  $Rt$  in LDR word instructions, and you can use PC for  $Rn$  in LDR instructions with register offset syntax (that is the forms that do not writeback to the  $Rn$ ).

Other uses of PC are not permitted in A32 instructions.

In T32 instructions you can use PC for  $Rt$  in LDR word instructions. Other uses of PC in these T32 instructions are not permitted.

### Use of SP

You can use SP for  $Rn$ .

In A32 code, you can use SP for `Rt` in word instructions. You can use SP for `Rt` in non-word A32 instructions but this is deprecated.

You can use SP for `Rm` in A32 instructions but this is deprecated.

In T32 code, you can use SP for `Rt` in word instructions only. All other use of SP for `Rt` in these instructions are not permitted in T32 code.

Use of SP for `Rm` is not permitted in T32 state.

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.53 LDR (register-relative) (A32)

Load register. The address is an offset from a base register.

### Syntax

`LDR{type}{cond}{.W} Rt, label`

`LDRD{cond} Rt, Rt2, label ; Doubleword`

where:

**`type`**

can be any one of:

**B**

unsigned Byte (Zero extend to 32 bits on loads.)

**SB**

signed Byte (`LDR` only. Sign extend to 32 bits.)

**H**

unsigned Halfword (Zero extend to 32 bits on loads.)

**SH**

signed Halfword (`LDR` only. Sign extend to 32 bits.)

**-**

omitted, for Word.

**`cond`**

is an optional condition code.

**.W**

is an optional instruction width specifier.

**Rt**

is the register to load or store.

**Rt2**

is the second register to load or store.

**label**

is a symbol defined by the `FIELD` directive. `label` specifies an offset from the base register which is defined using the `MAP` directive.

`label` must be within a limited distance of the value in the base register.

## Offset range and architectures

The assembler calculates the offset from the base register for you. The assembler generates an error if `label` is out of range.

The following table shows the possible offsets between the label and the current instruction:

**Table 14-13: Register-relative offsets**

| Instruction                                                                                                                  | Offset range             |
|------------------------------------------------------------------------------------------------------------------------------|--------------------------|
| A32 <code>LDR</code> , <code>LDRB</code> <sup>17</sup>                                                                       | $\pm 4095$               |
| A32 <code>LDRSB</code> , <code>LDRH</code> , <code>LDRSH</code>                                                              | $\pm 255$                |
| A32 <code>LDRD</code>                                                                                                        | $\pm 255$                |
| T32, 32-bit <code>LDR</code> , <code>LDRB</code> , <code>LDRSB</code> , <code>LDRH</code> , <code>LDRSH</code> <sup>17</sup> | -255 to 4095             |
| T32, 32-bit <code>LDRD</code>                                                                                                | $\pm 1020$ <sup>18</sup> |
| T32, 16-bit <code>LDR</code> <sup>19</sup>                                                                                   | 0 to 124 <sup>18</sup>   |
| T32, 16-bit <code>LDRH</code> <sup>19</sup>                                                                                  | 0 to 62 <sup>20</sup>    |
| T32, 16-bit <code>LDRB</code> <sup>19</sup>                                                                                  | 0 to 31                  |
| T32, 16-bit <code>LDR</code> , base register is <code>SP</code> <sup>21</sup>                                                | 0 to 1020 <sup>18</sup>  |

## LDR (register-relative) in T32

You can use the `.w` width specifier to force `LDR` to generate a 32-bit instruction in T32 code. `LDR.w` always generates a 32-bit instruction, even if the target could be reached using a 16-bit `LDR`.

For forward references, `LDR` without `.w` always generates a 16-bit instruction in T32 code, even if that results in failure for a target that could be reached using a 32-bit T32 `LDR` instruction.

<sup>17</sup> For word loads, `Rt` can be the PC. A load to the PC causes a branch to the address loaded. In Arm®v4, bits[1:0] of the address loaded must be 0b00. In Armv5T and above, bits[1:0] must not be 0b10, and if bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.

<sup>18</sup> Must be a multiple of 4.

<sup>19</sup> `Rt` and base register must be in the range R0-R7.

<sup>20</sup> Must be a multiple of 2.

<sup>21</sup> `Rt` must be in the range R0-R7.

## Doubleword register restrictions

For 32-bit T32 instructions, you must not specify SP or PC for either  $Rt$  or  $Rt2$ .

For A32 instructions:

- $Rt$  must be an even-numbered register.
- $Rt$  must not be LR.
- Arm strongly recommends that you do not use R12 for  $Rt$ .
- $Rt2$  must be  $R(t + 1)$ .

## Use of PC

You can use PC for  $Rt$  in word instructions. Other uses of PC are not permitted in these instructions.

## Use of SP

In A32 code, you can use SP for  $Rt$  in word instructions. You can use SP for  $Rt$  in non-word A32 instructions but this is deprecated.

In T32 code, you can use SP for  $Rt$  in word instructions only. All other use of SP for  $Rt$  in these instructions are not permitted in T32 code.

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[FIELD](#) on page 1451

[MAP](#) on page 1474

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.54 LDR pseudo-instruction (A32)

Load a register with either a 32-bit immediate value or an address.



This describes the `LDR` pseudo-instruction only, and not the `LDR` instruction.

**Note**

### Syntax

`LDR{cond}{.W} Rt, =expr`

`LDR{cond}{.W} Rt, =label_expr`

where:

***cond***

is an optional condition code.

**.W**

is an optional instruction width specifier.

**Rt**

is the register to be loaded.

**expr**

evaluates to a numeric value.

**label\_expr**

is a PC-relative or external expression of an address in the form of a label plus or minus a numeric value.

## Usage

When using the `LDR` pseudo-instruction:

- If the value of `expr` can be loaded with a valid `MOV` or `MVN` instruction, the assembler uses that instruction.
- If a valid `MOV` or `MVN` instruction cannot be used, or if the `label_expr` syntax is used, the assembler places the constant in a literal pool and generates a PC-relative `LDR` instruction that reads the constant from the literal pool.



- An address loaded in this way is fixed at link time, so the code is not position-independent.
- The address holding the constant remains valid regardless of where the linker places the ELF section containing the `LDR` instruction.

The assembler places the value of `label_expr` in a literal pool and generates a PC-relative `LDR` instruction that loads the value from the literal pool.

If `label_expr` is an external expression, or is not contained in the current section, the assembler places a linker relocation directive in the object file. The linker generates the address at link time.

If `label_expr` is either a named or numeric local label, the assembler places a linker relocation directive in the object file and generates a symbol for that local label. The address is generated at link time. If the local label references T32 code, the T32 bit (bit 0) of the address is set.

The offset from the PC to the value in the literal pool must be less than ±4KB (in an A32 or 32-bit T32 encoding) or in the range 0 to +1KB (16-bit T32 encoding). You are responsible for ensuring that there is a literal pool within range.

If the label referenced is in T32 code, the `LDR` pseudo-instruction sets the T32 bit (bit 0) of `label_expr`.



In RealView Compilation Tools (RVCT) v2.2, the T32 bit of the address was not set. If you have code that relies on this behavior, use the command line option `--untyped_local_labels` to force the assembler not to set the T32 bit when referencing labels in T32 code.

## LDR in T32 code

You can use the `.w` width specifier to force `LDR` to generate a 32-bit instruction in T32 code. `LDR.w` always generates a 32-bit instruction, even if the immediate value could be loaded in a 16-bit `MOV`, or there is a literal pool within reach of a 16-bit PC-relative load.

If the value to be loaded is not known in the first pass of the assembler, `LDR` without `.w` generates a 16-bit instruction in T32 code, even if that results in a 16-bit PC-relative load for a value that could be generated in a 32-bit `MOV` or `MVN` instruction. However, if the value is known in the first pass, and it can be generated using a 32-bit `MOV` or `MVN` instruction, the `MOV` or `MVN` instruction is used.

In UAL syntax, the `LDR` pseudo-instruction never generates a 16-bit flag-setting `MOV` instruction. Use the `--diag_warning 1727` assembler command line option to check when a 16-bit instruction could have been used.

You can use the `MOV32` pseudo-instruction for generating immediate values or addresses without loading from a literal pool.

## Examples

```

LDR    r3,=0xff0      ; loads 0xff0 into R3
      ; => MOV.W r3,#0xff0
LDR    r1,=0xffff     ; loads 0xffff into R1
      ; => LDR r1,[pc,offset_to_litpool]
      ; ...
      ;     litpool DCD 0xffff
LDR    r2,=place      ; loads the address of
      ; place into R2
      ; => LDR r2,[pc,offset_to_litpool]
      ; ...
      ;     litpool DCD place

```

## Related information

[--untyped\\_local\\_labels](#) on page 215

[Numeric constants](#) on page 220

[Register-relative and PC-relative expressions](#) on page 222

[Numeric local labels](#) on page 225

[MOV32 pseudo-instruction \(A32\)](#) on page 335

[Condition code suffixes](#) on page 138

[LTORG](#) on page 1470

## 14.55 LDR, unprivileged (A32)

Unprivileged load byte, halfword, or word.

### Syntax

```
LDR{type}T{cond} Rt, [Rn {, #offset}] ; immediate offset (32-bit T32 encoding only)
LDR{type}T{cond} Rt, [Rn] {, #offset} ; post-indexed (A32 only)
LDR{type}T{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed (register) (A32 only)
```

where:

#### **type**

can be any one of:

##### **B**

unsigned Byte (Zero extend to 32 bits on loads.)

##### **SB**

signed Byte (Sign extend to 32 bits.)

##### **H**

unsigned Halfword (Zero extend to 32 bits on loads.)

##### **SH**

signed Halfword (Sign extend to 32 bits.)

##### **-**

omitted, for Word.

#### **cond**

is an optional condition code.

#### **Rt**

is the register to load.

#### **Rn**

is the register on which the memory address is based.

#### **offset**

is an offset. If offset is omitted, the address is the value in *Rn*.

#### **Rm**

is a register containing a value to be used as the offset. *Rm* must not be PC.

#### **shift**

is an optional shift.

### Operation

When these instructions are executed by privileged software, they access memory with the same restrictions as they would have if they were executed by unprivileged software.

When executed by unprivileged software these instructions behave in the same way as the corresponding load instruction, for example `LDRSBT` behaves in the same way as `LDRSB`.

## Offset ranges and architectures

The following table shows the ranges of offsets and availability of these instructions.

| Instruction                                                                 | Immediate offset | Post-indexed  | $\pm Rm1$ | shift         |
|-----------------------------------------------------------------------------|------------------|---------------|-----------|---------------|
| A32, word or byte You can use -Rm, +Rm, or Rm.                              | Not available    | -4095 to 4095 | $\pm Rm$  | LSL #0-31     |
|                                                                             |                  |               |           | LSR #1-32     |
|                                                                             |                  |               |           | ASR #1-32     |
|                                                                             |                  |               |           | ROR #1-31     |
|                                                                             |                  |               |           | RRX           |
| A32, signed byte, halfword, or signed halfword                              | Not available    | -255 to 255   | $\pm Rm$  | Not available |
| T32, 32-bit encoding, word, halfword, signed halfword, byte, or signed byte | 0 to 255         | Not available |           | Not available |

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.56 LDREX (A32)

Load Register Exclusive.

### Syntax

`LDREX{cond} Rt, [Rn {, #offset}]`

`LDREXB{cond} Rt, [Rn]`

`LDREXH{cond} Rt, [Rn]`

`LDREXD{cond} Rt, Rt2, [Rn]`

where:

#### ***cond***

is an optional condition code.

#### ***Rt***

is the register to load.

#### ***Rt2***

is the second register for doubleword loads.

#### ***Rn***

is the register on which the memory address is based.

***offset***

is an optional offset applied to the value in  $Rn$ .  $offset$  is permitted only in 32-bit T32 instructions. If  $offset$  is omitted, an offset of zero is assumed.

**Operation**

`LDREX` loads data from memory.

- If the physical address has the Shared TLB attribute, `LDREX` tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address.
- Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.

`LDREXB` and `LDREXH` zero extend the value loaded.

**Restrictions**

PC must not be used for any of  $Rt$ ,  $Rt2$ , or  $Rn$ .

For A32 instructions:

- SP can be used but use of SP for any of  $Rt$ , or  $Rt2$  is deprecated.
- For `LDREXD`,  $Rt$  must be an even numbered register, and not LR.
- $Rt2$  must be  $R(t+1)$ .
- $offset$  is not permitted.

For T32 instructions:

- SP can be used for  $Rn$ , but must not be used for  $Rt$  or  $Rt2$ .
- For `LDREXD`,  $Rt$  and  $Rt2$  must not be the same register.
- The value of  $offset$  can be any multiple of four in the range 0-1020.

**Usage**

Use `LDREX` and `STREX` to implement interprocess communication in multiple-processor and shared-memory systems.

For reasons of performance, keep the number of instructions between corresponding `LDREX` and `STREX` instructions to a minimum.



The address used in a `STREX` instruction must be the same as the address in the most recently executed `LDREX` instruction.

**Architectures**

These 32-bit instructions are available in A32 and T32.

The `LDREXLD` instruction is not available in the Arm®v7-M architecture.

There are no 16-bit versions of these instructions in T32.

## Examples

```
MOV r1, #0x1          ; load the 'lock taken' value
try
  LDREX r0, [LockAddr] ; load the lock value
  CMP r0, #0           ; is the lock free?
  STREXEQ r0, r1, [LockAddr] ; try and claim the lock
  CMPEQ r0, #0          ; did this succeed?
  BNE try              ; no - try again
  ....                 ; yes - we have the lock
```

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.57 LSL (A32)

Logical Shift Left. This instruction is a preferred synonym for MOV instructions with shifted register operands.

### Syntax

`LSL{S}{cond} Rd, Rm, Rs`

`LSL{S}{cond} Rd, Rm, #sh`

where:

**S**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rm**

is the register holding the first operand. This operand is shifted left.

**Rs**

is a register holding a shift value to apply to the value in `Rm`. Only the least significant byte is used.

**sh**

is a constant shift. The range of values permitted is 0-31.

### Operation

`LSL` provides the value of a register multiplied by a power of two, inserting zeros into the vacated bit positions.

## Restrictions in T32 code

T32 instructions must not use PC or SP.

You cannot specify zero for the *sh* value in an `LSL` instruction in an IT block.

## Use of SP and PC in A32 instructions

You can use SP in these A32 instructions but this is deprecated.

You cannot use PC in instructions with the `LSL{S} cond Rd, Rm, Rs` syntax. You can use PC for *Rd* and *Rm* in the other syntax, but this is deprecated.

If you use PC as *Rm*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the *s* suffix, the SPSR of the current mode is copied to the CPSR. You can use this to return from exceptions.



The A32 instruction `LSLS{cond} pc, Rm, #sh` always disassembles to the preferred form `MOVS{cond} pc, Rm{, shift}`.



Do not use the *s* suffix when using PC as *Rd* in User mode or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

You cannot use PC for *Rd* or any operand in the `LSL` instruction if it has a register-controlled shift.

## Condition flags

If *s* is specified, the `LSL` instruction updates the N and Z flags according to the result.

The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.

## 16-bit instructions

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

### **`LSLS Rd, Rm, #sh`**

*Rd* and *Rm* must both be Lo registers. This form can only be used outside an IT block.

### **`LSL{cond} Rd, Rm, #sh`**

*Rd* and *Rm* must both be Lo registers. This form can only be used inside an IT block.

### **`LSLS Rd, Rd, Rs`**

*Rd* and *Rs* must both be Lo registers. This form can only be used outside an IT block.

**LSL{cond} Rd, Rd, Rs**

*Rd* and *Rs* must both be Lo registers. This form can only be used inside an IT block.

**Architectures**

This 32-bit instruction is available in A32 and T32.

This 16-bit T32 instruction is available in T32.

**Example**

```
LSLS      r1, r2, r3
```

**Related information**

[MOV \(A32\)](#) on page 333

[Condition code suffixes](#) on page 138

## 14.58 LSR (A32)

Logical Shift Right. This instruction is a preferred synonym for `mov` instructions with shifted register operands.

**Syntax**

```
LSR{S}{cond} Rd, Rm, Rs
```

```
LSR{S}{cond} Rd, Rm, #sh
```

where:

**S**

is an optional suffix. If *s* is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rm**

is the register holding the first operand. This operand is shifted right.

**Rs**

is a register holding a shift value to apply to the value in *Rm*. Only the least significant byte is used.

**sh**

is a constant shift. The range of values permitted is 1-32.

**Operation**

`LSR` provides the unsigned value of a register divided by a variable power of two, inserting zeros into the vacated bit positions.

## Restrictions in T32 code

T32 instructions must not use PC or SP.

## Use of SP and PC in A32 instructions

You can use SP in these A32 instructions but they are deprecated.

You cannot use PC in instructions with the `LSR{S} cond Rd, Rm, Rs` syntax. You can use PC for `Rd` and `Rm` in the other syntax, but this is deprecated.

If you use PC as `Rm`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, the SPSR of the current mode is copied to the CPSR. You can use this to return from exceptions.



The A32 instruction `LSR{S} cond Rd, Rm, #sh` always disassembles to the preferred form `LSR{S} cond Rd, Rm, shift`.



Do not use the s suffix when using PC as `Rd` in User mode or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

You cannot use PC for `Rd` or any operand in the `LSR` instruction if it has a register-controlled shift.

## Condition flags

If s is specified, the instruction updates the N and Z flags according to the result.

The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.

## 16-bit instructions

The following forms of these instructions are available in T32 code, and are 16-bit instructions:

### `LSRS Rd, Rm, #sh`

`Rd` and `Rm` must both be Lo registers. This form can only be used outside an IT block.

### `LSR{cond} Rd, Rm, #sh`

`Rd` and `Rm` must both be Lo registers. This form can only be used inside an IT block.

### `LSRS Rd, Rd, Rs`

`Rd` and `Rs` must both be Lo registers. This form can only be used outside an IT block.

**LSR{cond} Rd, Rd, Rs**

*Rd* and *Rs* must both be Lo registers. This form can only be used inside an IT block.

**Architectures**

This 32-bit instruction is available in A32 and T32.

This 16-bit T32 instruction is available in T32.

**Example**

```
LSR      r4, r5, r6
```

**Related information**

[MOV \(A32\)](#) on page 333

[Condition code suffixes](#) on page 138

## 14.59 MCR and MCR2 (A32)

Move to Coprocessor from general-purpose register. Depending on the coprocessor, you might be able to specify various additional operations.



MCR2 is not supported in Arm®v8.

**Note**

**Syntax**

```
MCR{cond} coproc, #opcode1, Rt, CRn, CRm{}, #opcode2}
```

```
MCR2{cond} coproc, #opcode1, Rt, CRn, CRm{}, #opcode2}
```

where:

**cond**

is an optional condition code.

In A32 code, *cond* is not permitted for MCR2.

**coproc**

is the name of the coprocessor the instruction is for. The standard name is  $p_n$ , where  $n$  is an integer that must be:

- In the range 0-15 in Armv7 and earlier.
- 14 or 15 in Armv8.

**opcode1**

is a 3-bit coprocessor-specific opcode.

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

Page 329 of 1493

***opcode2***

is an optional 3-bit coprocessor-specific opcode.

***Rt***

is a general-purpose register. *Rt* must not be PC.

***CRn, CRm***

are coprocessor registers.

**Usage**

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

**Architectures**

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

**Related information**

[Condition code suffixes](#) on page 138

## 14.60 MCRR and MCRR2 (A32)

Move to Coprocessor from two general-purpose registers. Depending on the coprocessor, you might be able to specify various additional operations.



MCRR2 is not supported in Arm®v8.

Note

**Syntax**

`MCRR{cond} coproc, #opcode, Rt, Rt2, CRn`

`MCRR2{cond} coproc, #opcode, Rt, Rt2, CRn`

where:

***cond***

is an optional condition code.

In A32 code, *cond* is not permitted for MCRR2.

***coproc***

is the name of the coprocessor the instruction is for. The standard name is `pn`, where *n* is an integer that must be:

- In the range 0-15 in Armv7 and earlier.
- 14 or 15 in Armv8.

***opcode***

is a 4-bit coprocessor-specific opcode.

***Rt, Rt2***

are general-purpose registers. *Rt* and *Rt2* must not be PC.

***CRn***

is a coprocessor register.

**Usage**

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

**Architectures**

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

**Related information**

[Condition code suffixes](#) on page 138

## 14.61 MLA (A32)

Multiply-Accumulate with signed or unsigned 32-bit operands, giving the least significant 32 bits of the result.

**Syntax**

`MLA{S} {cond} Rd, Rn, Rm, Ra`

where:

***cond***

is an optional condition code.

***S***

is an optional suffix. If *s* is specified, the condition flags are updated on the result of the operation.

***Rd***

is the destination register.

***Rn, Rm***

are registers holding the values to be multiplied.

***Ra***

is a register holding the value to be added.

## Operation

The `MLA` instruction multiplies the values from  $Rn$  and  $Rm$ , adds the value from  $Ra$ , and places the least significant 32 bits of the result in  $Rd$ .

## Register restrictions

You cannot use PC for any register.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

If s is specified, the `MLA` instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flag.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Example

```
MLA      r10, r2, r1, r5
```

## Related information

[Condition code suffixes](#) on page 138

## 14.62 MLS (A32)

Multiply-Subtract, with signed or unsigned 32-bit operands, giving the least significant 32 bits of the result.

### Syntax

```
MLS{cond} Rd, Rn, Rm, Ra
```

where:

**cond**

is an optional condition code.

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rn, Rm**

are registers holding the values to be multiplied.

**Ra**

is a register holding the value to be subtracted from.

**Operation**

The `MLS` instruction multiplies the values in `Rn` and `Rm`, subtracts the result from the value in `Ra`, and places the least significant 32 bits of the final result in `Rd`.

**Register restrictions**

You cannot use PC for any register.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**Architectures**

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

**Example**

```
MLS      r4, r5, r6, r7
```

**Related information**

[Condition code suffixes](#) on page 138

## 14.63 MOV (A32)

Move.

**Syntax**

```
MOV{S}{cond} Rd, Operand2
```

```
MOV{cond} Rd, #imm16
```

where:

**S**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

***Operand2***

is a flexible second operand.

***imm16***

is any value in the range 0-65535.

**Operation**

The `MOV` instruction copies the value of *operand2* into *Rd*.

In certain circumstances, the assembler can substitute `MVN` for `MOV`, or `MOV` for `MVN`. Be aware of this when reading disassembly listings.

**Use of PC and SP in 32-bit T32 encodings**

You cannot use PC (*R15*) for *Rd*, or in *operand2*, in 32-bit T32 `MOV` instructions. With the following exceptions, you cannot use SP (*R13*) for *Rd*, or in *operand2*:

- `MOV{cond}.W Rd, SP`, where *Rd* is not SP.
- `MOV{cond}.W SP, Rm`, where *Rm* is not SP.

**Use of PC and SP in 16-bit T32 encodings**

You can use PC or SP in 16-bit T32 `MOV{cond} Rd, Rm` instructions but these instructions in which both *Rd* and *Rm* are SP or PC are deprecated.

You cannot use PC or SP in any other `MOV{s}` 16-bit T32 instructions.

**Use of PC and SP in A32 MOV**

You cannot use PC for *Rd* or any operand in any data processing instruction that has a register-controlled shift.

In instructions without register-controlled shift, the use of PC is deprecated except for the following cases:

- `MOVS PC, LR`.
- `MOV PC, Rm` When *Rm* is not PC or SP.
- `MOV Rd, PC` When *Rd* is not PC or SP.

You can use SP for *Rd* or *Rm*. But this is deprecated except for the following cases:

- `MOV SP, Rm` When *Rm* is not PC or SP.
- `MOV Rd, SP` when *Rd* is not PC or SP.



You cannot use PC for *Rd* in `MOV Rd, #imm16` if the `#imm16` value is not a permitted *operand2* value. You can use PC in forms with *operand2* without register-controlled shift.

If you use PC as *Rm*, the value used is the address of the instruction plus 8.

If you use PC as  $Rd$ :

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc, lr` instruction.

## Condition flags

If s is specified, the instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of *operand2*.
- Does not affect the V flag.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

**MOVS Rd, #imm**

$Rd$  must be a Lo register.  $imm$  range 0-255. This form can only be used outside an IT block.

**MOV{cond} Rd, #imm**

$Rd$  must be a Lo register.  $imm$  range 0-255. This form can only be used inside an IT block.

**MOVS Rd, Rm**

$Rd$  and  $Rm$  must both be Lo registers. This form can only be used outside an IT block.

**MOV{cond} Rd, Rm**

$Rd$  or  $Rm$  can be Lo or Hi registers.

## Availability

These instructions are available in A32 and T32.

In T32, 16-bit and 32-bit versions of these instructions are available.

## Related information

[Flexible second operand \(Operand2\) on page 244](#)

[SUBS pc, lr \(A32\) on page 444](#)

[Condition code suffixes on page 138](#)

[Load immediate values using MOV and MVN on page 106](#)

## 14.64 MOV32 pseudo-instruction (A32)

Load a register with either a 32-bit immediate value or any address.

### Syntax

`MOV32{cond} Rd, expr`

where:

***cond***

is an optional condition code.

***Rd***

is the register to be loaded. *Rd* must not be SP or PC.

***expr***

can be any one of the following:

***symbol1***

A label in this or another program area.

***#constant***

Any 32-bit immediate value.

***symbol1 + constant***

A label plus a 32-bit immediate value.

## Usage

`MOV32` always generates two 32-bit instructions, a `MOV`, `MOVT` pair. This enables you to load any 32-bit immediate, or to access the whole 32-bit address space.

The main purposes of the `MOV32` pseudo-instruction are:

- To generate literal constants when an immediate value cannot be generated in a single instruction.
- To load a PC-relative or external address into a register. The address remains valid regardless of where the linker places the ELF section containing the `MOV32`.



An address loaded in this way is fixed at link time, so the code is not position-independent.

`MOV32` sets the T32 bit (bit 0) of the address if the label referenced is in T32 code.

## Architectures

This pseudo-instruction is available in A32 and T32.

## Examples

```
MOV32 r3, #0xABCD12 ; loads 0xABCD12 into R3
MOV32 r1, Trigger+12 ; loads the address that is 12 bytes
                      ; higher than the address Trigger into R1
```

## Related information

[Condition code suffixes](#) on page 138

## 14.65 MOVT (A32)

Move Top.

### Syntax

```
MOVT {cond} Rd, #imm16
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**imm16**

is a 16-bit immediate value.

### Usage

MOVT writes *imm16* to *Rd*[31:16], without affecting *Rd*[15:0].

You can generate any 32-bit immediate with a `MOV`, `MOVT` instruction pair. The assembler implements the `MOV32` pseudo-instruction for convenient generation of this instruction pair.

### Register restrictions

You cannot use PC in A32 or T32 instructions.

You can use SP for *Rd* in A32 instructions but this is deprecated.

You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

### Related information

[MOV32 pseudo-instruction \(A32\)](#) on page 335

[Condition code suffixes](#) on page 138

## 14.66 MRC and MRC2 (A32)

Move to general-purpose register from Coprocessor. Depending on the coprocessor, you might be able to specify various additional operations.



MRC2 is not supported in Arm®v8.

**Note**

### Syntax

`MRC{cond} coproc, #opcode1, Rt, CRn, CRM{}, #opcode2}`

`MRC2{cond} coproc, #opcode1, Rt, CRn, CRM{}, #opcode2}`

where:

**cond**

is an optional condition code.

In A32 code, `cond` is not permitted for MRC2.

**coproc**

is the name of the coprocessor the instruction is for. The standard name is `pn`, where `n` is an integer that must be:

- In the range 0-15 in Armv7 and earlier.
- 14 or 15 in Armv8.

**opcode1**

is a 3-bit coprocessor-specific opcode.

**opcode2**

is an optional 3-bit coprocessor-specific opcode.

**Rt**

is the general-purpose register. `Rt` must not be PC.

`Rt` can be `APSR_nzcv`. This means that the coprocessor executes an instruction that changes the value of the condition flags in the APSR.

**CRn, CRM**

are coprocessor registers.

### Usage

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

## Architectures

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.67 MRRC and MRRC2 (A32)

Move to two general-purpose registers from coprocessor. Depending on the coprocessor, you might be able to specify various additional operations.



MRRC2 is not supported in Arm®v8.

Note

### Syntax

`MRRC {cond} coproc, #opcode, Rt, Rt2, CRm`

`MRRC2 {cond} coproc, #opcode, Rt, Rt2, CRm`

where:

**`cond`**

is an optional condition code.

In A32 code, `cond` is not permitted for `MRRC2`.

**`coproc`**

is the name of the coprocessor the instruction is for. The standard name is `pn`, where `n` is an integer that must be:

- In the range 0-15 in Armv7 and earlier.
- 14 or 15 in Armv8.

**`opcode`**

is a 4-bit coprocessor-specific opcode.

**`Rt, Rt2`**

are general-purpose registers. `Rt` and `Rt2` must not be PC.

**`CRm`**

is a coprocessor register.

## Usage

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

## Architectures

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.68 MRS (PSR to general-purpose register) (A32)

Move the contents of a PSR to a general-purpose register.

### Syntax

MRS{cond} Rd, psr

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**psr**

is one of:

**APSR**

on any processor, in any mode.

**CPSR**

deprecated synonym for APSR and for use in Debug state, on any processor except Arm®v7-M and Armv6-M.

**SPSR**

on any processor, except Armv6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline, in privileged software execution only.

**Mpsr**

on Armv6-M, Armv7-M, Armv8-M.baseline, and Armv8-M.mainline processors only.

**Mpsr**

can be any of: IPSR, EPSR, IEPSR, IAPSR, EAPSR, MSP, PSP, XPSR, PRIMASK, BASEPRI, BASEPRI\_MAX, FAULTMASK, OR CONTROL.

## Usage

Use `MRS` in combination with `MSR` as part of a read-modify-write sequence for updating a PSR, for example to change processor mode, or to clear the Q flag.

In process swap code, the programmers' model state of the process being swapped out must be saved, including relevant PSR contents. Similarly, the state of the process being swapped in must also be restored. These operations make use of `MRS`/store and load/`MSR` instruction sequences.

## SPSR

You must not attempt to access the SPSR when the processor is in User or System mode. This is your responsibility. The assembler cannot warn you about this, because it has no information about the processor mode at execution time.

## CPSR

Arm deprecates reading the CPSR endianness bit (E) with an `MRS` instruction.

The CPSR execution state bits, other than the E bit, can only be read when the processor is in Debug state, halting debug-mode. Otherwise, the execution state bits in the CPSR read as zero.

The condition flags can be read in any mode on any processor. Use APSR if you are only interested in accessing the condition flags in User mode.

## Register restrictions

You cannot use PC for `Rd` in A32 instructions. You can use SP for `Rd` in A32 instructions but this is deprecated.

You cannot use PC or SP for `Rd` in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[MRS \(system coprocessor register to general-purpose register\) \(A32\)](#) on page 341

[MSR \(general-purpose register to system coprocessor register\) \(A32\)](#) on page 342

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Condition code suffixes](#) on page 138

[Current Program Status Register in AArch32 state](#) on page 84

## 14.69 MRS (system coprocessor register to general-purpose register) (A32)

Move to general-purpose register from system coprocessor register.

### Syntax

`MRS{cond} Rn, coproc_register`

`MRS{cond} APSR_nzcv, special_register`

where:

***cond***

is an optional condition code.

***coproc\_register***

is the name of the coprocessor register.

***special\_register***

is the name of the coprocessor register that can be written to `APSR_nzcv`. This is only possible for the coprocessor register DBGDSCRint.

***Rn***

is the general-purpose register. `Rn` must not be PC.

### Usage

You can use this pseudo-instruction to read CP14 or CP15 coprocessor registers, except for write-only registers. A complete list of the applicable coprocessor register names is in the *Armv7-AR Architecture Reference Manual*. For example:

```
MRS R1, SCTLR ; writes the contents of the CP15 coprocessor
                  ; register SCTLR into R1
```

### Architectures

This pseudo-instruction is available in Arm®v7-A and Armv7-R in A32 and 32-bit T32 code.

There is no 16-bit version of this pseudo-instruction in T32.

### Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to system coprocessor register\) \(A32\)](#) on page 342

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Condition code suffixes](#) on page 138

[A-Profile Architectures](#)

## 14.70 MSR (general-purpose register to system coprocessor register) (A32)

Move to system coprocessor register from general-purpose register.

### Syntax

```
MSR{cond} coproc_register, Rn
```

where:

**cond**

is an optional condition code.

**coproc\_register**

is the name of the coprocessor register.

**Rn**

is the general-purpose register.  $R_n$  must not be PC.

### Usage

You can use this pseudo-instruction to write to any CP14 or CP15 coprocessor writable register. A complete list of the applicable coprocessor register names is in the *Arm Architecture Reference Manual*. For example:

```
MSR SCLTR, R1 ; writes the contents of R1 into the CP15
                  ; coprocessor register SCLTR
```

### Availability

This pseudo-instruction is available in A32 and T32.

This pseudo-instruction is available in Arm®v7-A and Armv7-R in A32 and 32-bit T32 code.

There is no 16-bit version of this pseudo-instruction in T32.

### Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MRS \(system coprocessor register to general-purpose register\) \(A32\)](#) on page 341

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Condition code suffixes](#) on page 138

[SYS \(A32\)](#) on page 456

[A-Profile Architectures](#)

## 14.71 MSR (general-purpose register to PSR) (A32)

Load an immediate value, or the contents of a general-purpose register, into the specified fields of a Program Status Register (PSR).

### Syntax

```
MSR{cond} APSR_flags, Rm
```

where:

**cond**

is an optional condition code.

**flags**

specifies the APSR flags to be moved. *flags* can be one or more of:

**nzcvq**

ALU flags field mask, PSR[31:27] (User mode)

**g**

SIMD GE flags field mask, PSR[19:16] (User mode).

**Rm**

is the general-purpose register. *Rm* must not be PC.

### Syntax on architectures other than Armv6-M, Armv7-M, Armv8-M Baseline, and Armv8-M Mainline

```
MSR{cond} APSR_flags , #constant
```

```
MSR{cond} psr_fields, #constant
```

```
MSR{cond} psr_fields, Rm
```

where:

**cond**

is an optional condition code.

**flags**

specifies the APSR flags to be moved. *flags* can be one or more of:

**nzcvq**

ALU flags field mask, PSR[31:27] (User mode)

**g**

SIMD GE flags field mask, PSR[19:16] (User mode).

**constant**

is an expression evaluating to a numeric value. The value must correspond to an 8-bit pattern rotated by an even number of bits within a 32-bit word. Not available in T32.

**Rm**

is the source register.  $R_m$  must not be PC.

**psr**

is one of:

**CPSR**

for use in Debug state, also deprecated synonym for APSR

**SPSR**

on any processor, in privileged software execution only.

**fields**

specifies the SPSR or CPSR fields to be moved. *fields* can be one or more of:

**c**

control field mask byte, PSR[7:0] (privileged software execution)

**x**

extension field mask byte, PSR[15:8] (privileged software execution)

**s**

status field mask byte, PSR[23:16] (privileged software execution)

**f**

flags field mask byte, PSR[31:24] (privileged software execution).

## Syntax on architectures Armv6-M, Armv7-M, Armv8-M Baseline, and Armv8-M Mainline only

`MSR{cond} psr , Rm`

where:

**cond**

is an optional condition code.

**Rm**

is the source register.  $R_m$  must not be PC.

**psr**

can be any of: APSR, IPSR, EPSR, IEPSR, IAPSR, EAPSR, XPSR, MSP, PSP, PRIMASK, BASEPRI, BASEPRI\_MAX, FAULTMASK, OR CONTROL.

## Usage

In User mode:

- Use APSR to access the condition flags, Q, or GE bits.
- Writes to unallocated, privileged or execution state bits in the CPSR are ignored. This ensures that User mode programs cannot change to privileged software execution.

Arm deprecates using `MSR` to change the endianness bit (E) of the CPSR, in any mode.

You must not attempt to access the SPSR when the processor is in User or System mode.

### Register restrictions

You cannot use PC in A32 instructions. You can use SP for  $Rm$  in A32 instructions but this is deprecated.

You cannot use PC or SP in T32 instructions.

### Condition flags

This instruction updates the flags explicitly if the `APSR_nzcvq` or `CPSR_f` field is specified.

### Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

### Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MRS \(system coprocessor register to general-purpose register\) \(A32\)](#) on page 341

[MSR \(general-purpose register to system coprocessor register\) \(A32\)](#) on page 342

[Condition code suffixes](#) on page 138

## 14.72 MUL (A32)

Multiply with signed or unsigned 32-bit operands, giving the least significant 32 bits of the result.

### Syntax

`MUL{S}{cond} Rd, Rn, Rm`

where:

**cond**

is an optional condition code.

**S**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rn, Rm**

are registers holding the values to be multiplied.

### Operation

The `MUL` instruction multiplies the values from  $Rn$  and  $Rm$ , and places the least significant 32 bits of the result in  $Rd$ .

## Register restrictions

You cannot use PC for any register.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

If s is specified, the `MUL` instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flag.

## 16-bit instructions

The following forms of the `MUL` instruction are available in T32 code, and are 16-bit instructions:

**MULS Rd, Rn, Rd**

`Rd` and `Rn` must both be Lo registers. This form can only be used outside an IT block.

**MUL{cond} Rd, Rn, Rd**

`Rd` and `Rn` must both be Lo registers. This form can only be used inside an IT block.

There are no other T32 multiply instructions that can update the condition flags.

## Availability

This instruction is available in A32 and T32.

The `MULS` instruction is available in T32 in a 16-bit encoding.

## Examples

```
MUL      r10, r2, r5
MULS    r0, r2, r2
MULLT   r2, r3, r2
```

## Related information

[Condition code suffixes](#) on page 138

## 14.73 MVN (A32)

Move Not.

### Syntax

`MVN{S}{cond} Rd, Operand2`

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Operand2***

is a flexible second operand.

## Operation

The `MVN` instruction takes the value of *operand2*, performs a bitwise logical NOT operation on the value, and places the result into *Rd*.

In certain circumstances, the assembler can substitute `MVN` for `Mov`, or `Mov` for `MVN`. Be aware of this when reading disassembly listings.

### Use of PC and SP in 32-bit T32 MVN

You cannot use PC (`r15`) for *Rd*, or in *operand2*, in 32-bit T32 `MVN` instructions. You cannot use SP (`r13`) for *Rd*, or in *operand2*.

### Use of PC and SP in 16-bit T32 instructions

You cannot use PC or SP in any `MVN{S}` 16-bit T32 instructions.

### Use of PC and SP in A32 MVN

You cannot use PC for *Rd* or any operand in any data processing instruction that has a register-controlled shift.

In instructions without register-controlled shift, use of PC is deprecated.

You can use SP for *Rd* or *Rm*, but this is deprecated.



PC and SP in A32 instructions are deprecated.

Note

If you use PC as *Rm*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the `s` suffix, see the `SUBS pc, lr` instruction.

### Condition flags

If `s` is specified, the instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of *operand2*.

- Does not affect the V flag.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

**MVNS Rd, Rm**

Rd and Rm must both be Lo registers. This form can only be used outside an IT block.

**MVN{cond} Rd, Rm**

Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

## Architectures

This instruction is available in A32 and T32.

### Correct example

```
MVNNE    r11, #0xF000000B ; A32 only. This immediate value is not
                           ; available in T32.
```

### Incorrect example

```
MVN      pc,r3,ASR r0      ; PC not permitted with
                           ; register-controlled shift
```

### Related information

[Flexible second operand \(Operand2\)](#) on page 244

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

[Load immediate values using MOV and MVN](#) on page 106

## 14.74 NEG pseudo-instruction (A32)

Negate the value in a register.

### Syntax

NEG{cond} Rd, Rm

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register containing the value that is subtracted from zero.

## Operation

The `NEG` pseudo-instruction negates the value in one register and stores the result in a second register.

`NEG {cond} Rd, Rm` assembles to `RSBS {cond} Rd, Rm, #0`.

## Architectures

The 32-bit encoding of this pseudo-instruction is available in A32 and T32.

There is no 16-bit encoding of this pseudo-instruction available T32.

## Register restrictions

In A32 instructions, using SP or PC for `Rd` or `Rm` is deprecated. In T32 instructions, you cannot use SP or PC for `Rd` or `Rm`.

## Condition flags

This pseudo-instruction updates the condition flags, based on the result.

## Related information

[ADD \(A32\)](#) on page 253

## 14.75 NOP (A32)

No Operation.

### Syntax

`NOP {cond}`

where:

`cond`

is an optional condition code.

### Usage

`NOP` does nothing. If `NOP` is not implemented as a specific instruction on your target architecture, the assembler treats it as a pseudo-instruction and generates an alternative instruction that does nothing, such as `MOV r0, r0` (A32) or `MOV r8, r8` (T32).

`NOP` is not necessarily a time-consuming `NOP`. The processor might remove it from the pipeline before it reaches the execution stage.

You can use `NOP` for padding, for example to place the following instruction on a 64-bit boundary in A32, or a 32-bit boundary in T32.

## Architectures

This instruction is available in A32 and T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.76 ORN (T32 only)

Logical OR NOT.

### Syntax

`ORN{S}{cond} Rd, Rn, Operand2`

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Operation

The `ORN` T32 instruction performs an OR operation on the bits in `Rn` with the complements of the corresponding bits in the value of `operand2`.

In certain circumstances, the assembler can substitute `ORN` for `ORR`, or `ORR` for `ORN`. Be aware of this when reading disassembly listings.

### Use of PC

You cannot use PC (`r15`) for `Rd` or any operand in the `ORN` instruction.

### Condition flags

If s is specified, the `ORN` instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of `operand2`.
- Does not affect the V flag.

### Examples

```
ORN      r7, r11, lr, ROR #4
```

```
ORNS      r7, r11, lr, ASR #32
```

## Architectures

This 32-bit instruction is available in T32.

There is no A32 or 16-bit T32 `ORN` instruction.

## Related information

[Flexible second operand \(Operand2\) on page 244](#)

[SUBS pc, lr \(A32\) on page 444](#)

[Condition code suffixes on page 138](#)

## 14.77 ORR (A32)

Logical OR.

### Syntax

`ORR{S}{cond} Rd, Rn, Operand2`

where:

**S**

is an optional suffix. If S is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Operation

The `ORR` instruction performs bitwise OR operations on the values in `Rn` and `Operand2`.

In certain circumstances, the assembler can substitute `ORN` for `ORR`, or `ORR` for `ORN`. Be aware of this when reading disassembly listings.

### Use of PC in 32-bit T32 instructions

You cannot use PC (`R15`) for `Rd` or any operand with the `ORR` instruction.

## Use of PC and SP in A32 instructions

You can use PC and SP with the `ORR` instruction but this is deprecated.

If you use PC as  $Rn$ , the value used is the address of the instruction plus 8.

If you use PC as  $Rd$ :

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc, lr` instruction.

You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

## Condition flags

If s is specified, the `ORR` instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of *operand2*.
- Does not affect the V flag.

## 16-bit instructions

The following forms of the `ORR` instruction are available in T32 code, and are 16-bit instructions:

### `ORRS Rd, Rd, Rm`

$Rd$  and  $Rm$  must both be Lo registers. This form can only be used outside an IT block.

### `ORR{cond} Rd, Rd, Rm`

$Rd$  and  $Rm$  must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify `ORR{S} Rd, Rm, Rd`. The instruction is the same.

## Example

```
ORREQ    r2,r0,r5
```

## Related information

[Flexible second operand \(Operand2\)](#) on page 244

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

## 14.78 PKHBT and PKHTB (A32)

Halfword Packing instructions that combine a halfword from one register with a halfword from another register. One of the operands can be shifted before extraction of the halfword.

### Syntax

`PKHBT{cond} {Rd}, Rn, Rm{, LSL #leftshift}`

`PKHTB{cond} {Rd}, Rn, Rm{, ASR #rightshift}`

where:

**PKHBT**

Combines bits[15:0] of *Rn* with bits[31:16] of the shifted value from *Rm*.

**PKHTB**

Combines bits[31:16] of *Rn* with bits[15:0] of the shifted value from *Rm*.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Rm**

is the register holding the second operand.

**leftshift**

is in the range 0 to 31.

**rightshift**

is in the range 1 to 32.

### Register restrictions

You cannot use PC for any register.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

These instructions do not change the flags.

### Architectures

These instructions are available in A32.

These 32-bit instructions are available T32. For the Arm®v7-M architecture, they are only available in an Armv7E-M implementation.

There are no 16-bit versions of these instructions in T32.

## Correct examples

```
PKHBT    r0, r3, r5          ; combine the bottom halfword of R3
          ; with the top halfword of R5
PKHBT    r0, r3, r5, LSL #16 ; combine the bottom halfword of R3
          ; with the bottom halfword of R5
PKHTB    r0, r3, r5, ASR #16 ; combine the top halfword of R3
          ; with the top halfword of R5
```

You can also scale the second operand by using different values of shift.

## Incorrect example

```
PKHBTEQ r4, r5, r1, ASR #8 ; ASR not permitted with PKHBT
```

## Related information

[Condition code suffixes](#) on page 138

## 14.79 PLD, PLDW, and PLI (A32)

Preload Data and Preload Instruction allow the processor to signal the memory system that a data or instruction load from an address is likely in the near future.

### Syntax

`PLtype{cond} [Rn {, #offset}]`

`PLtype{cond} [Rn, ±Rm {, shift}]`

`PLtype{cond} label`

where:

#### **type**

can be one of:

#### **D**

Data address.

#### **DW**

Data address with intention to write.

#### **I**

Instruction address.

`type` cannot be `DW` if the syntax specifies `label`.

#### **cond**

is an optional condition code.



*cond* is permitted only in T32 code, using a preceding `IT` instruction, but this is deprecated in the Arm®v8 architecture. This is an unconditional instruction in A32 code and you must not use *cond*.

**Rn**

is the register on which the memory address is based.

**offset**

is an immediate offset. If offset is omitted, the address is the value in *Rn*.

**Rm**

is a register containing a value to be used as the offset.

**shift**

is an optional shift.

**label**

is a PC-relative expression.

## Range of offsets

The offset is applied to the value in *Rn* before the preload takes place. The result is used as the memory address for the preload. The range of offsets permitted is:

- -4095 to +4095 for A32 instructions.
- -255 to +4095 for T32 instructions, when *Rn* is not PC.
- -4095 to +4095 for T32 instructions, when *Rn* is PC.

The assembler calculates the offset from the PC for you. The assembler generates an error if *label* is out of range.

## Register or shifted register offset

In A32 code, the value in *Rm* is added to or subtracted from the value in *Rn*. In T32 code, the value in *Rm* can only be added to the value in *Rn*. The result is used as the memory address for the preload.

The range of shifts permitted is:

- `LSL #0` to `#3` for T32 instructions.
- Any one of the following for A32 instructions:
  - `LSL #0` to `#31`.
  - `LSR #1` to `#32`.
  - `ASR #1` to `#32`.
  - `ROR #1` to `#31`.
  - `RRX`.

## Address alignment for preloads

No alignment checking is performed for preload instructions.

## Register restrictions

$Rm$  must not be PC. For T32 instructions  $Rm$  must also not be SP.

$Rn$  must not be PC for T32 instructions of the syntax `PLtype{cond} [Rn, ±Rm{, #shift}]`.

## Architectures

The `PLD` instruction is available in A32.

The 32-bit encoding of `PLD` is available in T32.

`PLDW` is available only in the Armv7 architecture and above that implement the Multiprocessing Extensions.

`PLI` is available only in the Armv7 architecture and above.

There are no 16-bit encodings of these instructions in T32.

These are hint instructions, and their implementation is optional. If they are not implemented, they execute as `NOPs`.

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

## 14.80 POP (A32)

Pop registers off a full descending stack.

### Syntax

`POP{cond} reglist`

where:

**`cond`**

is an optional condition code.

**`reglist`**

is a non-empty list of registers, enclosed in braces. It can contain register ranges. It must be comma separated if it contains more than one register or register range.

### Operation

`POP` is a synonym for `LDMIA sp! reglist`. `POP` is the preferred mnemonic.



LDM and LDMFD are synonyms of LDMIA.

Note

Registers are stored on the stack in numerical order, with the lowest numbered register at the lowest address.

## POP, with reglist including the PC

This instruction causes a branch to the address popped off the stack into the PC. This is usually a return from a subroutine, where the LR was pushed onto the stack at the start of the subroutine.

Also:

- Bits[1:0] must not be 0b10.
- If bit[0] is 1, execution continues in T32 state.
- If bit[0] is 0, execution continues in A32 state.

## T32 instructions

A subset of this instruction is available in the T32 instruction set.

The following restriction applies to the 16-bit POP instruction:

*reglist* can only include the Lo registers and the PC.

The following restrictions apply to the 32-bit POP instruction:

- *reglist* must not include the SP.
- *reglist* can include either the LR or the PC, but not both.

## Restrictions on reglist in A32 instructions

The A32 POP instruction cannot have SP but can have PC in the *reglist*. The instruction that includes both PC and LR in the *reglist* is deprecated.

## Example

```
POP    {r0,r10,pc} ; no 16-bit version available
```

## Related information

[LDM \(A32\)](#) on page 307

[PUSH \(A32\)](#) on page 358

[Condition code suffixes](#) on page 138

## 14.81 PUSH (A32)

Push registers onto a full descending stack.

### Syntax

```
PUSH{cond} reglist
```

where:

**cond**

is an optional condition code.

**reglist**

is a non-empty list of registers, enclosed in braces. It can contain register ranges. It must be comma separated if it contains more than one register or register range.

### Operation

PUSH is a synonym for STMDB sp!, reglist. PUSH is the preferred mnemonic.



STMFD is a synonym of STMDB.

Note

Registers are stored on the stack in numerical order, with the lowest numbered register at the lowest address.

### T32 instructions

The following restriction applies to the 16-bit PUSH instruction:

*reglist* can only include the Lo registers and the LR.

The following restrictions apply to the 32-bit PUSH instruction:

- *reglist* must not include the SP.
- *reglist* must not include the PC.

### Restrictions on reglist in A32 instructions

The A32 PUSH instruction can have SP and PC in the *reglist* but the instruction that includes SP or PC in the *reglist* is deprecated.

### Examples

```
PUSH    {r0,r4-r7}
PUSH    {r2,lr}
```

### Related information

[LDM \(A32\)](#) on page 307

[POP \(A32\)](#) on page 357  
[Condition code suffixes](#) on page 138

## 14.82 QADD (A32)

Signed saturating addition.

### Syntax

`QADD{cond} {Rd}, Rm, Rn`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

The `QADD` instruction adds the values in `Rm` and `Rn`. It saturates the result to the signed range  $-2^{31} \leq x \leq 2^{31}-1$ .



All values are treated as two's complement signed integers by this instruction.

**Note**

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Example

```
QADD    r0, r1, r9
```

## Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[The Q flag in AArch32 state](#) on page 83

[Condition code suffixes](#) on page 138

## 14.83 QADD8 (A32)

Signed saturating parallel byte-wise addition.

### Syntax

```
QADD8 {cond} {Rd}, Rn, Rm
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs four signed integer additions on the corresponding bytes of the operands and writes the results into the corresponding bytes of the destination. It saturates the results to the signed range  $-2^7 \leq x \leq 2^7 - 1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83  
[Condition code suffixes](#) on page 138

## 14.84 QADD16 (A32)

Signed saturating parallel halfword-wise addition.

### Syntax

`QADD16{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction performs two signed integer additions on the corresponding halfwords of the operands and writes the results into the corresponding halfwords of the destination. It saturates the results to the signed range  $-2^{15} \leq x \leq 2^{15}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83

[Condition code suffixes](#) on page 138

## 14.85 QASX (A32)

Signed saturating parallel add and subtract halfwords with exchange.

### Syntax

`QASX{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. It saturates the results to the signed range  $-2^{15} \leq x \leq 2^{15}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[The Q flag in AArch32 state](#) on page 83

[Condition code suffixes](#) on page 138

## 14.86 QDADD (A32)

Signed saturating Double and Add.

### Syntax

`QDADD{cond} {Rd}, Rm, Rn`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

`QDADD` calculates  $SAT(Rm + SAT(Rn * 2))$ . It saturates the result to the signed range  $-2^{31} \leq x \leq 2^{31}-1$ . Saturation can occur on the doubling operation, on the addition, or on both. If saturation occurs on the doubling but not on the addition, the Q flag is set but the final result is unsaturated.



All values are treated as two's complement signed integers by this instruction.

Note

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[The Q flag in AArch32 state](#) on page 83

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340  
[Condition code suffixes](#) on page 138

## 14.87 QDSUB (A32)

Signed saturating Double and Subtract.

### Syntax

`QDSUB{cond} {Rd}, Rm, Rn`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

`QDSUB` calculates  $SAT(Rm - SAT(Rn * 2))$ . It saturates the result to the signed range  $-2^{31} \leq x \leq 2^{31}-1$ . Saturation can occur on the doubling operation, on the subtraction, or on both. If saturation occurs on the doubling but not on the subtraction, the Q flag is set but the final result is unsaturated.



All values are treated as two's complement signed integers by this instruction.

Note

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Example

```
QDSUBLT r9, r0, r1
```

### Related information

[The Q flag in AArch32 state](#) on page 83

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.88 QSAX (A32)

Signed saturating parallel subtract and add halfwords with exchange.

### Syntax

```
QSAX{cond} {Rd}, Rn, Rm
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. It saturates the results to the signed range  $-2^{15} \leq x \leq 2^{15}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83  
[Condition code suffixes](#) on page 138

## 14.89 QSUB (A32)

Signed saturating Subtract.

### Syntax

`QSUB{cond} {Rd}, Rm, Rn`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

The `QSUB` instruction subtracts the value in `Rn` from the value in `Rm`. It saturates the result to the signed range  $-2^{31} \leq x \leq 2^{31}-1$ .



All values are treated as two's complement signed integers by this instruction.

Note

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.90 QSUB8 (A32)

Signed saturating parallel byte-wise subtraction.

### Syntax

`QSUB8{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand and writes the results into the corresponding bytes of the destination. It saturates the results to the signed range  $-2^7 \leq x \leq 2^7 - 1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83

[Condition code suffixes](#) on page 138

## 14.91 QSUB16 (A32)

Signed saturating parallel halfword-wise subtraction.

### Syntax

`QSUB16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand and writes the results into the corresponding halfwords of the destination. It saturates the results to the signed range  $-2^{15} \leq x \leq 2^{15}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7 E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[The Q flag in AArch32 state](#) on page 83  
[Condition code suffixes](#) on page 138

## 14.92 RBIT (A32)

Reverse the bit order in a 32-bit word.

### Syntax

RBIT{cond} Rd, Rn

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the operand.

### Register restrictions

You cannot use PC for any register.

You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.

### Condition flags

This instruction does not change the flags.

### Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

### Example

```
RBIT    r7, r8
```

## Related information

[Condition code suffixes](#) on page 138

## 14.93 REV (A32)

Reverse the byte order in a word.

### Syntax

`REV{cond} Rd, Rn`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the operand.

### Usage

You can use this instruction to change endianness. `REV` converts 32-bit big-endian data into little-endian data or 32-bit little-endian data into big-endian data.

### Register restrictions

You cannot use PC for any register.

You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.

### Condition flags

This instruction does not change the flags.

### 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**`REV Rd, Rm`**

*Rd* and *Rm* must both be Lo registers.

### Architectures

This instruction is available in A32 and T32.

### Example

```
REV      r3, r7
```

### Related information

[Condition code suffixes](#) on page 138

## 14.94 REV16 (A32)

Reverse the byte order in each halfword independently.

### Syntax

`REV16{cond} Rd, Rn`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the operand.

### Usage

You can use this instruction to change endianness. `REV16` converts 16-bit big-endian data into little-endian data or 16-bit little-endian data into big-endian data.

### Register restrictions

You cannot use PC for any register.

You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.

### Condition flags

This instruction does not change the flags.

### 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**`REV16 Rd, Rm`**

*Rd* and *Rm* must both be Lo registers.

### Architectures

This instruction is available in A32 and T32.

### Example

```
REV16    r0, r0
```

### Related information

[Condition code suffixes](#) on page 138

## 14.95 REVSH (A32)

Reverse the byte order in the bottom halfword, and sign extend to 32 bits.

### Syntax

`REVSH{cond} Rd, Rn`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the operand.

### Usage

You can use this instruction to change endianness. `REVSH` converts either:

- 16-bit signed big-endian data into 32-bit signed little-endian data.
- 16-bit signed little-endian data into 32-bit signed big-endian data.

### Register restrictions

You cannot use PC for any register.

You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.

### Condition flags

This instruction does not change the flags.

### 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**`REVSH Rd, Rm`**

*Rd* and *Rm* must both be Lo registers.

### Architectures

This instruction is available in A32 and T32.

### Example

```
REVSH    r0, r5      ; Reverse Signed Halfword
```

## Related information

[Condition code suffixes](#) on page 138

## 14.96 RFE (A32)

Return From Exception.

### Syntax

`RFE{addr_mode}{cond} Rn{!}`

where:

#### **`addr_mode`**

is any one of the following:

##### **IA**

Increment address After each transfer (Full Descending stack)

##### **IB**

Increment address Before each transfer (A32 only)

##### **DA**

Decrement address After each transfer (A32 only)

##### **DB**

Decrement address Before each transfer.

If `addr_mode` is omitted, it defaults to Increment After.

#### **`cond`**

is an optional condition code.



`cond` is permitted only in T32 code, using a preceding `IT` instruction, but this is deprecated in Arm®v8. This is an unconditional instruction in A32 code.

#### **`Rn`**

specifies the base register. `Rn` must not be PC.

#### **!**

is an optional suffix. If `!` is present, the final address is written back into `Rn`.

### Usage

You can use `RFE` to return from an exception if you previously saved the return state using the `SRS` instruction. `Rn` is usually the `SP` where the return state information was saved.

## Operation

Loads the PC and the CPSR from the address contained in  $Rn$ , and the following address. Optionally updates  $Rn$ .

## Notes

`RFE` writes an address to the PC. The alignment of this address must be correct for the instruction set in use after the exception return:

- For a return to A32, the address written to the PC must be word-aligned.
- For a return to T32, the address written to the PC must be halfword-aligned.
- For a return to Jazelle, there are no alignment restrictions on the address written to the PC.

No special precautions are required in software to follow these rules, if you use the instruction to return after a valid exception entry mechanism.

Where addresses are not word-aligned, `RFE` ignores the least significant two bits of  $Rn$ .

The time order of the accesses to individual words of memory generated by `RFE` is not architecturally defined. Do not use this instruction on memory-mapped I/O locations where access order matters.

Do not use `RFE` in unprivileged software execution.

## Architectures

This instruction is available in A32.

This 32-bit T32 instruction is available, except in the Armv7-M and Armv8-M.mainline architectures.

There is no 16-bit version of this instruction.

## Example

```
RFE sp!
```

## Related information

[Processor modes, and privileged and unprivileged software execution](#) on page 76

[SRS \(A32\)](#) on page 418

[Condition code suffixes](#) on page 138

## 14.97 ROR (A32)

Rotate Right. This instruction is a preferred synonym for `MOV` instructions with shifted register operands.

### Syntax

`ROR{S}{cond} Rd, Rm, Rs`

`ROR{S}{cond} Rd, Rm, #sh`

where:

**s**

is an optional suffix. If `s` is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rm**

is the register holding the first operand. This operand is shifted right.

**Rs**

is a register holding a shift value to apply to the value in `Rm`. Only the least significant byte is used.

**sh**

is a constant shift. The range of values is 1-31.

### Operation

`ROR` provides the value of the contents of a register rotated by a value. The bits that are rotated off the right end are inserted into the vacated bit positions on the left.

### Restrictions in T32 code

T32 instructions must not use PC or SP.

### Use of SP and PC in A32 instructions

You can use SP in these A32 instructions but this is deprecated.

You cannot use PC in instructions with the `ROR{S}{cond} Rd, Rm, Rs` syntax. You can use PC for `Rd` and `Rm` in the other syntax, but this is deprecated.

If you use PC as `Rm`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the `s` suffix, the SPSR of the current mode is copied to the CPSR. You can use this to return from exceptions.



The A32 instruction `RORS{cond} pc, Rm, #sh` always disassembles to the preferred form `MOVS{cond} pc, Rm{, shift}`.



Do not use the s suffix when using PC as  $Rd$  in User mode or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

You cannot use PC for  $Rd$  or any operand in this instruction if it has a register-controlled shift.

## Condition flags

If s is specified, the instruction updates the N and Z flags according to the result.

The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

**RORS Rd, Rd, Rs**

$Rd$  and  $Rs$  must both be Lo registers. This form can only be used outside an IT block.

**ROR{cond} Rd, Rd, Rs**

$Rd$  and  $Rs$  must both be Lo registers. This form can only be used inside an IT block.

## Architectures

This instruction is available in A32 and T32.

## Example

```
ROR      r4, r5, r6
```

## Related information

[MOV \(A32\)](#) on page 333

[Condition code suffixes](#) on page 138

## 14.98 RRX (A32)

Rotate Right with Extend. This instruction is a preferred synonym for `mov` instructions with shifted register operands.

### Syntax

`RRX S{cond} Rd, Rm`

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**Rd**

is the destination register.

**Rm**

is the register holding the first operand. This operand is shifted right.

### Operation

`RRX` provides the value of the contents of a register shifted right one bit. The old carry flag is shifted into bit[31]. If the s suffix is present, the old bit[0] is placed in the carry flag.

### Restrictions in T32 code

T32 instructions must not use PC or SP.

### Use of SP and PC in A32 instructions

You can use SP in this A32 instruction but this is deprecated.

If you use PC as `Rm`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, the SPSR of the current mode is copied to the CPSR. You can use this to return from exceptions.



The A32 instruction `RRXS{cond} pc, Rm` always disassembles to the preferred form `MOVS{cond} pc, Rm{,shift}`.



Do not use the s suffix when using PC as `Rd` in User mode or System mode. The assembler cannot warn you about this because it has no information about what the processor mode is likely to be at execution time.

You cannot use PC for *Rd* or any operand in this instruction if it has a register-controlled shift.

## Condition flags

If *s* is specified, the instruction updates the N and Z flags according to the result.

The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.

## Architectures

The 32-bit instruction is available in A32 and T32.

There is no 16-bit instruction in T32.

## Related information

[MOV \(A32\)](#) on page 333

[Condition code suffixes](#) on page 138

# 14.99 RSB (A32)

Reverse Subtract without carry.

## Syntax

`RSB{S}{cond} Rd, Rn, Operand2`

where:

**s**

is an optional suffix. If *s* is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

## Operation

The RSB instruction subtracts the value in *Rn* from the value of *operand2*. This is useful because of the wide range of options for *operand2*.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

## Use of PC and SP in T32 instructions

You cannot use PC (*R15*) for *Rd* or any operand.

You cannot use SP (*R13*) for *Rd* or any operand.

## Use of PC and SP in A32 instructions

You cannot use PC for *Rn* or any operand in an *RSB* instruction that has a register-controlled shift.

Use of PC for any operand, in instructions without register-controlled shift, is deprecated.

If you use PC (*R15*) as *Rn* or *Rm*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the *SUBS pc, lr* instruction.

Use of SP and PC in A32 instructions is deprecated.

## Condition flags

If s is specified, the *RSB* instruction updates the N, Z, C,, and V flags according to the result.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

### **RSBS Rd, Rn, #0**

*Rd* and *Rn* must both be Lo registers. This form can only be used outside an IT block.

### **RSB{cond} Rd, Rn, #0**

*Rd* and *Rn* must both be Lo registers. This form can only be used inside an IT block.

## Example

```
RSB      r4, r4, #1280      ; subtracts contents of R4 from 1280
```

## Related information

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

## 14.100 RSC (A32)

Reverse Subtract with Carry.

## Syntax

```
RSC{S}{cond} Rd, Rn, Operand2
```

where:

**s**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

**Usage**

The RSC instruction subtracts the value in *Rn* from the value of *operand2*. If the carry flag is clear, the result is reduced by one.

You can use RSC to synthesize multiword arithmetic.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

RSC is not available in T32 code.

**Use of PC and SP**

Use of PC and SP is deprecated.

You cannot use PC for *Rd* or any operand in an RSC instruction that has a register-controlled shift.

If you use PC (R15) as *Rn* or *Rm*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the SUBS pc,lr instruction.

**Condition flags**

If s is specified, the RSC instruction updates the N, Z, C, and V flags according to the result.

**Correct example**

```
RSCSLE r0,r5,r0,LSL r4      ; conditional, flags set
```

**Incorrect example**

```
RSCSLE r0,pc,r0,LSL r4      ; PC not permitted with register
                                ; controlled shift
```

## Related information

[Flexible second operand \(Operand2\) on page 244](#)  
[Condition code suffixes on page 138](#)

# 14.101 SADD8 (A32)

Signed parallel byte-wise addition.

## Syntax

`SADD8{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm, Rn***

are the general-purpose registers holding the operands.

## Operation

This instruction performs four signed integer additions on the corresponding bytes of the operands and writes the results into the corresponding bytes of the destination. The results are modulo  $2^8$ . It sets the APSR GE flags.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[0]**

for bits[7:0] of the result.

**GE[1]**

for bits[15:8] of the result.

**GE[2]**

for bits[23:16] of the result.

**GE[3]**

for bits[31:24] of the result.

It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to an `ADDS` instruction setting the N and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following `SEL` instruction.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.102 SADD16 (A32)

Signed parallel halfword-wise addition.

### Syntax

`SADD16 {cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs two signed integer additions on the corresponding halfwords of the operands and writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

### GE[1:0]

for bits[15:0] of the result.

### GE[3:2]

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to an `ADDS` instruction setting the N and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following `SEL` instruction.



Note

GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.103 SASX (A32)

Signed parallel add and subtract halfwords with exchange.

### Syntax

`SASX{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

**Operation**

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**GE flags**

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[1:0]**

for bits[15:0] of the result.

**GE[3:2]**

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to an **ADDS** or **SUBS** instruction setting the N and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following **SEL** instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

**Note**

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.104 SBC (A32)

Subtract with Carry.

### Syntax

`SBC{S}{cond} {Rd}, Rn, Operand2`

where:

**s**

is an optional suffix. If *s* is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

### Usage

The `SBC` (Subtract with Carry) instruction subtracts the value of *Operand2* from the value in *Rn*. If the carry flag is clear, the result is reduced by one.

You can use `SBC` to synthesize multiword arithmetic.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

### Use of PC and SP in T32 instructions

You cannot use PC (`r15`) for *Rd*, or any operand.

You cannot use SP (`r13`) for *Rd*, or any operand.

### Use of PC and SP in A32 instructions

You cannot use PC for *Rd* or any operand in an `SBC` instruction that has a register-controlled shift.

Use of PC for any operand in instructions without register-controlled shift, is deprecated.

If you use PC (*r15*) as *Rn* or *Rm*, the value used is the address of the instruction plus 8.

If you use PC as *Rd*:

- Execution branches to the address corresponding to the result.
- If you use the s suffix, see the `SUBS pc,lr` instruction.

Use of SP and PC in `SBC` A32 instructions is deprecated.

### Condition flags

If s is specified, the `SBC` instruction updates the N, Z, C, and V flags according to the result.

### 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

#### `SBCS Rd, Rd, Rm`

*Rd* and *Rm* must both be Lo registers. This form can only be used outside an IT block.

#### `SBC{cond} Rd, Rd, Rm`

*Rd* and *Rm* must both be Lo registers. This form can only be used inside an IT block.

### Multiword arithmetic examples

These instructions subtract one 96-bit integer contained in *r9*, *r10*, and *r11* from another 96-bit integer contained in *r6*, *r7*, and *r8*, and place the result in *r3*, *r4*, and *r5*:

```
SUBS    r3, r6, r9
SBCS    r4, r7, r10
SBC     r5, r8, r11
```

For clarity, these examples use consecutive registers for multiword values. There is no requirement to do this. The following, for example, is perfectly valid:

```
SUBS    r6, r6, r9
SBCS    r9, r2, r1
SBC     r2, r8, r11
```

### Related information

[Flexible second operand \(Operand2\) on page 244](#)

[Condition code suffixes on page 138](#)

## 14.105 SBFX (A32)

Signed Bit Field Extract.

### Syntax

`SBFX{cond} Rd, Rn, #lsb, #width`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the source register.

***lsb***

is the bit number of the least significant bit in the bitfield, in the range 0 to 31.

***width***

is the width of the bitfield, in the range 1 to (32-*lsb*).

## Operation

Copies adjacent bits from one register into the least significant bits of a second register, and sign extends to 32 bits.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not alter any flags.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.106 SDIV (A32)

Signed Divide.

### Syntax

`SDIV{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the value to be divided.

**Rm**

is a register holding the divisor.

## Register restrictions

PC or SP cannot be used for Rd, Rn, or Rm.

## Architectures

This 32-bit T32 instruction is available in Arm®v7-R, Armv7-M, and Armv8-M.mainline.

This 32-bit A32 instruction is optional in Armv7-R.

This 32-bit A32 and T32 instruction is available in Armv7-A if Virtualization Extensions are implemented, and optional if not.

There is no 16-bit T32 SDIV instruction.

## Related information

[Condition code suffixes](#) on page 138

## 14.107 SEL (A32)

Select bytes from each operand according to the state of the APSR GE flags.

### Syntax

SEL{cond} {Rd}, Rn, Rm

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Rm**

is the register holding the second operand.

## Operation

The SEL instruction selects bytes from Rn or Rm according to the APSR GE flags:

- If GE[0] is set, Rd[7:0] come from Rn[7:0], otherwise from Rm[7:0].
- If GE[1] is set, Rd[15:8] come from Rn[15:8], otherwise from Rm[15:8].
- If GE[2] is set, Rd[23:16] come from Rn[23:16], otherwise from Rm[23:16].
- If GE[3] is set, Rd[31:24] come from Rn[31:24], otherwise from Rm[31:24].

## Usage

Use the SEL instruction after one of the signed parallel instructions. You can use this to select maximum or minimum values in multiple byte or halfword data.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Examples

```
SEL      r0, r4, r5
SELLT   r4, r0, r4
```

The following instruction sequence sets each byte in R4 equal to the unsigned minimum of the corresponding bytes of R1 and R2:

```
USUB8   r4, r1, r2
SEL      r4, r2, r1
```

## Related information

[Application Program Status Register](#) on page 83

[Condition code suffixes](#) on page 138

## 14.108 SETEND (A32)

Set the endianness bit in the CPSR, without affecting any other bits in the CPSR.



This instruction is deprecated in Arm®v8.

**Note**

### Syntax

`SETEND specifier`

where:

***specifier***

is one of:

**BE**

Big-endian.

**LE**

Little-endian.

### Usage

Use `SETEND` to access data of different endianness, for example, to access several big-endian DMA-formatted data fields from an otherwise little-endian application.

`SETEND` cannot be conditional, and is not permitted in an IT block.

### Architectures

This instruction is available in A32 and 16-bit T32.

This 16-bit instruction is available in T32, except in the Armv6-M and Armv7-M architectures.

There is no 32-bit version of this instruction in T32.

### Example

```
SETEND BE      ; Set the CPSR E bit for big-endian accesses
LDR    r0, [r2, #header]
LDR    r1, [r2, #CRC32]
SETEND le      ; Set the CPSR E bit for little-endian accesses
                ; for the rest of the application
```

## 14.109 SETPAN (A32)

Set Privileged Access Never.

### Syntax

```
SETPAN{q} #imm ; A1 general registers (A32)
```

```
SETPAN{q} #imm ; T1 general registers (T32)
```

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**imm**

Is the unsigned immediate 0 or 1.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Set Privileged Access Never writes a new value to PSTATE.PAN.

This instruction is available only in privileged mode and it is a `NOP` when executed in User mode.

### Related information

[A32 and T32 instruction summary](#) on page 240

## 14.110 SEV (A32)

Set Event.

### Syntax

```
SEV{cond}
```

where:

**cond**

is an optional condition code.

### Operation

This is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it executes as a `NOP`. The assembler produces a diagnostic message if the instruction executes as a `NOP` on the target.

`SEV` causes an event to be signaled to all cores within a multiprocessor system. If `SEV` is implemented, `WFE` must also be implemented.

## Availability

This instruction is available in A32 and T32.

## Related information

[SEVL \(A32\)](#) on page 393

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

## 14.111 SEVL (A32)

Set Event Locally.



This instruction is supported only in Arm®v8.

Note

### Syntax

`SEVL{cond}`

where:

**cond**

is an optional condition code.

### Operation

This is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it executes as a `NOP`. `armasm` produces a diagnostic message if the instruction executes as a `NOP` on the target.

`SEVL` causes an event to be signaled to all cores the current processor. `SEVL` is not required to affect other processors although it is permitted to do so.

## Availability

This instruction is available in A32 and T32.

## Related information

[SEV \(A32\)](#) on page 392

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

## 14.112 SG (A32)

Secure Gateway.

### Syntax

SG

### Usage

Secure Gateway marks a valid branch target for branches from Non-secure code that wants to call Secure code.

## 14.113 SHADD8 (A32)

Signed halving parallel byte-wise addition.

### Syntax

SHADD8 {cond} {Rd}, Rn, Rm

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs four signed integer additions on the corresponding bytes of the operands, halves the results, and writes the results into the corresponding bytes of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.114 SHADD16 (A32)

Signed halving parallel halfword-wise addition.

### Syntax

`SHADD16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs two signed integer additions on the corresponding halfwords of the operands, halves the results, and writes the results into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

# 14.115 SHASX (A32)

Signed halving parallel add and subtract halfwords with exchange.

## Syntax

`SHASX{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

## Operation

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It halves the results and writes them into the corresponding halfwords of the destination. This cannot cause overflow.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.116 SHSAX (A32)

Signed halving parallel subtract and add halfwords with exchange.

### Syntax

`SHSAX{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It halves the results and writes them into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.117 SHSUB8 (A32)

Signed halving parallel byte-wise subtraction.

### Syntax

`SHSUB8{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand, halves the results, and writes the results into the corresponding bytes of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.118 SHSUB16 (A32)

Signed halving parallel halfword-wise subtraction.

### Syntax

`SHSUB16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand, halves the results, and writes the results into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.119 SMC (A32)

Secure Monitor Call.

### Syntax

```
SMC{cond} #imm4
```

where:

**cond**

is an optional condition code.

**imm4**

is a 4-bit immediate value. This is ignored by the Arm® processor, but can be used by the SMC exception handler to determine what service is being requested.



SMC was called `SMI` in earlier versions of the A32 assembly language. `SMI` instructions disassemble to `SMC`, with a comment to say that this was formerly `SMI`.

### Architectures

This 32-bit instruction is available in A32 and T32, if the Arm architecture has the Security Extensions.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

[A-Profile Architectures](#)

## 14.120 SMLAxy

Signed Multiply Accumulate, with 16-bit operands and a 32-bit result and accumulator.

### Syntax

```
SMLA<x><y>{cond} Rd, Rn, Rm, Ra
```

where:

**<x>**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rn`, `T` means use the top half (bits [31:16]) of `Rn`.

**<y >**

is either **b** or **t**. **b** means use the bottom half (bits [15:0]) of **Rm**, **t** means use the top half (bits [31:16]) of **Rm**.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the values to be multiplied.

**Ra**

is the register holding the value to be added.

**Operation**

**SMLAXy** multiplies the 16-bit signed integers from the selected halves of **Rn** and **Rm**, adds the 32-bit result to the 32-bit value in **Ra**, and places the result in **Rd**.

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**Condition flags**

This instruction does not affect the N, Z, C,, or V flags.

If overflow occurs in the accumulation, **SMLAXy** sets the Q flag. To read the state of the Q flag, use an **MRS** instruction.



**SMLAXy** never clears the Q flag. To clear the Q flag, use an **MSR** instruction.

Note

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

**Examples**

|                 |                        |
|-----------------|------------------------|
| <b>SMLABBNE</b> | <b>r0, r2, r1, r10</b> |
| <b>SMLABT</b>   | <b>r0, r0, r3, r5</b>  |

## Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340  
[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343  
[Condition code suffixes](#) on page 138

## 14.121 SMLAD (A32)

Dual 16-bit Signed Multiply with Addition of products and 32-bit accumulation.

### Syntax

`SMLAD{X} {cond} Rd, Rn, Rm, Ra`

where:

**cond**

is an optional condition code.

**X**

is an optional parameter. If X is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

**Ra**

is the register holding the accumulate operand.

### Operation

SMLAD multiplies the bottom halfword of *Rn* with the bottom halfword of *Rm*, and the top halfword of *Rn* with the top halfword of *Rm*. It then adds both products to the value in *Ra* and stores the sum to *Rd*.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Example

```
SMLADLT      r1, r2, r4, r1
```

## Related information

[Condition code suffixes](#) on page 138

## 14.122 SMLAL (A32)

Signed Long Multiply, with optional Accumulate, with 32-bit operands, and 64-bit result and accumulator.

### Syntax

```
SMLAL{S} {cond} RdLo, RdHi, Rn, Rm
```

where:

**s**

is an optional suffix available in A32 state only. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers. They also hold the accumulating value. *RdHi* and *RdHi* must be different registers

**Rn, Rm**

are general-purpose registers holding the operands.

### Operation

The **SMLAL** instruction interprets the values from *Rn* and *Rm* as two's complement signed integers. It multiplies these integers, and adds the 64-bit result to the 64-bit signed integer contained in *RdHi* and *RdHi*.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

If s is specified, this instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flags.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.123 SMLALD (A32)

Dual 16-bit Signed Multiply with Addition of products and 64-bit Accumulation.

### Syntax

`SMLALD{X} {cond} RdLo, RdHi, Rn, Rm`

where:

**x**

is an optional parameter. If x is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers for the 64-bit result. They also hold the 64-bit accumulate operand. `RdHi` and `RdLo` must be different registers.

**Rn, Rm**

are the general-purpose registers holding the operands.

### Operation

`SMLALD` multiplies the bottom halfword of `Rn` with the bottom halfword of `Rm`, and the top halfword of `Rn` with the top halfword of `Rm`. It then adds both products to the value in `RdLo`, `RdHi` and stores the sum to `RdLo`, `RdHi`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Example

```
SMLALD      r10, r11, r5, r1
```

## Related information

[Condition code suffixes](#) on page 138

# 14.124 SMLALxy

Signed Multiply-Accumulate with 16-bit operands and a 64-bit accumulator.

## Syntax

`SMLAL<x><y>{cond} RdLo, RdHi, Rn, Rm`

where:

**<x>**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rn`, `T` means use the top half (bits [31:16]) of `Rn`.

**<y>**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rm`, `T` means use the top half (bits [31:16]) of `Rm`.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers. They also hold the accumulate value. `RdHi` and `RdLo` must be different registers.

**Rn, Rm**

are the general-purpose registers holding the values to be multiplied.

## Operation

`SMLALxy` multiplies the signed integer from the selected half of `Rm` by the signed integer from the selected half of `Rn`, and adds the 32-bit result to the 64-bit value in `RdHi` and `RdLo`.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.



`SMLALxy` cannot raise an exception. If overflow occurs on this instruction, the result wraps round without any warning.

Note

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Examples

```
SMLALTB      r2, r3, r7, r1
SMLALBTVS   r0, r1, r9, r2
```

## Related information

[Condition code suffixes](#) on page 138

# 14.125 SMLAWy

Signed Multiply-Accumulate Wide, with one 32-bit and one 16-bit operand, and a 32-bit accumulate value, providing the top 32 bits of the result.

## Syntax

`SMLAW <y>{cond} Rd, Rn, Rm, Ra`

where:

**<y >**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rm`, `T` means use the top half (bits [31:16]) of `Rm`.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the values to be multiplied.

**Ra**

is the register holding the value to be added.

**Operation**

`SMLAWy` multiplies the signed 16-bit integer from the selected half of *Rm* by the signed 32-bit integer from *Rn*, adds the top 32 bits of the 48-bit result to the 32-bit value in *Ra*, and places the result in *Rd*.

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**Condition flags**

This instruction does not affect the N, Z, C,, or V flags.

If overflow occurs in the accumulation, `SMLAWy` sets the Q flag.

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

**Related information**

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.126 SMLSD (A32)

Dual 16-bit Signed Multiply with Subtraction of products and 32-bit accumulation.

**Syntax**

`SMLSD{X} {cond} Rd, Rn, Rm, Ra`

where:

**cond**

is an optional condition code.

**x**

is an optional parameter. If x is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

**Ra**

is the register holding the accumulate operand.

## Operation

`SMLSD` multiplies the bottom halfword of  $Rn$  with the bottom halfword of  $Rm$ , and the top halfword of  $Rn$  with the top halfword of  $Rm$ . It then subtracts the second product from the first, adds the difference to the value in  $Ra$ , and stores the result to  $Rd$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Examples

|                     |                               |
|---------------------|-------------------------------|
| <code>SMLSD</code>  | <code>r1, r2, r0, r7</code>   |
| <code>SMLSDX</code> | <code>r11, r10, r2, r3</code> |

## Related information

[Condition code suffixes](#) on page 138

## 14.127 SMLSLD (A32)

Dual 16-bit Signed Multiply with Subtraction of products and 64-bit accumulation.

### Syntax

`SMLSLD{X} {cond} RdLo, RdHi, Rn, Rm`

where:

**x**

is an optional parameter. If x is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers for the 64-bit result. They also hold the 64-bit accumulate operand. `RdHi` and `RdLo` must be different registers.

**Rn, Rm**

are the general-purpose registers holding the operands.

### Operation

`SMLSLD` multiplies the bottom halfword of `Rn` with the bottom halfword of `Rm`, and the top halfword of `Rn` with the top halfword of `Rn`. It then subtracts the second product from the first, adds the difference to the value in `RdLo`, `RdHi`, and stores the result to `RdLo`, `RdHi`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®V7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Example

```
SMLSLD      r3, r0, r5, r1
```

## Related information

[Condition code suffixes](#) on page 138

# 14.128 SMMLA (A32)

Signed Most significant word Multiply with Accumulation.

## Syntax

`SMMLA{R} {cond} Rd, Rn, Rm, Ra`

where:

**R**

is an optional parameter. If **R** is present, the result is rounded, otherwise it is truncated.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

**Ra**

is a register holding the value to be added or subtracted from.

## Operation

`SMMLA` multiplies the values from `Rn` and `Rm`, adds the value in `Ra` to the most significant 32 bits of the product, and stores the result in `Rd`.

If the optional `R` parameter is specified, `0x80000000` is added before extracting the most significant 32 bits. This has the effect of rounding the result.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.129 SMMLS (A32)

Signed Most significant word Multiply with Subtraction.

### Syntax

`SMMLS{R}{cond} Rd, Rn, Rm, Ra`

where:

**R**

is an optional parameter. If **R** is present, the result is rounded, otherwise it is truncated.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

**Ra**

is a register holding the value to be added or subtracted from.

### Operation

`SMMLS` multiplies the values from **Rn** and **Rm**, subtracts the product from the value in **Ra** shifted left by 32 bits, and stores the most significant 32 bits of the result in **Rd**.

If the optional **R** parameter is specified, `0x80000000` is added before extracting the most significant 32 bits. This has the effect of rounding the result.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.130 SMMUL (A32)

Signed Most significant word Multiply.

### Syntax

`SMMUL{R} {cond} {Rd}, Rn, Rm`

where:

**R**

is an optional parameter. If **R** is present, the result is rounded, otherwise it is truncated.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

**Ra**

is a register holding the value to be added or subtracted from.

### Operation

`SMMUL` multiplies the 32-bit values from **Rn** and **Rm**, and stores the most significant 32 bits of the 64-bit result to **Rd**.

If the optional **R** parameter is specified, 0x80000000 is added before extracting the most significant 32 bits. This has the effect of rounding the result.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Examples

```
SMMULGE      r6, r4, r3
SMMULR      r2, r2, r2
```

## Related information

[Condition code suffixes](#) on page 138

## 14.131 SMUAD (A32)

Dual 16-bit Signed Multiply with Addition of products, and optional exchange of operand halves.

### Syntax

```
SMUAD{X} {cond} {Rd}, Rn, Rm
```

where:

**x**

is an optional parameter. If x is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

### Operation

SMUAD multiplies the bottom halfword of *Rn* with the bottom halfword of *Rm*, and the top halfword of *Rn* with the top halfword of *Rm*. It then adds the products and stores the sum to *Rd*.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## **Q flag**

The `SMUAD` instruction sets the Q flag if the addition overflows.

## **Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## **Examples**

```
SMUAD      r2, r3, r2
```

## **Related information**

[Condition code suffixes](#) on page 138

## **14.132 SMULxy**

Signed Multiply, with 16-bit operands and a 32-bit result.

### **Syntax**

```
SMUL<x><y>{cond} {Rd}, Rn, Rm
```

where:

**<x>**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rn`, `T` means use the top half (bits [31:16]) of `Rn`.

**<y>**

is either `B` or `T`. `B` means use the bottom half (bits [15:0]) of `Rm`, `T` means use the top half (bits [31:16]) of `Rm`.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the values to be multiplied.

### **Operation**

`SMULxy` multiplies the 16-bit signed integers from the selected halves of `Rn` and `Rm`, and places the 32-bit result in `Rd`.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

These instructions do not affect the N, Z, C,, or V flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Examples

```
SMULTBEQ    r8, r7, r9
```

## Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[MSR \(general-purpose register to PSR\) \(A32\)](#) on page 343

[Condition code suffixes](#) on page 138

## 14.133 SMULL (A32)

Signed Long Multiply, with 32-bit operands and 64-bit result.

### Syntax

```
SMULL{S} {cond} RdLo, RdHi, Rn, Rm
```

where:

**s**

is an optional suffix available in A32 state only. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers. *RdLo* and *RdHi* must be different registers

**Rn, Rm**

are general-purpose registers holding the operands.

## Operation

The `SMULL` instruction interprets the values from  $Rn$  and  $Rm$  as two's complement signed integers. It multiplies these integers and places the least significant 32 bits of the result in  $RdLo$ , and the most significant 32 bits of the result in  $RdLo$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

If s is specified, this instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flags.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.134 SMULW<sub>y</sub>

Signed Multiply Wide, with one 32-bit and one 16-bit operand, providing the top 32 bits of the result.

### Syntax

`SMULW<y>{cond} {Rd}, Rn, Rm`

where:

**<y>**

is either **B** or **T**. **B** means use the bottom half (bits [15:0]) of  $Rm$ , **T** means use the top half (bits [31:16]) of  $Rm$ .

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the values to be multiplied.

## Operation

`SMULWY` multiplies the signed integer from the selected half of  $R_m$  by the signed integer from  $R_n$ , and places the upper 32-bits of the 48-bit result in  $R_d$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not affect the N, Z, C,, or V flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

# 14.135 SMUSD (A32)

Dual 16-bit Signed Multiply with Subtraction of products, and optional exchange of operand halves.

## Syntax

`SMUSD{X} {cond} {Rd}, Rn, Rm`

where:

**x**

is an optional parameter. If x is present, the most and least significant halfwords of the second operand are exchanged before the multiplications occur.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn, Rm**

are the registers holding the operands.

## Operation

`SMUSD` multiplies the bottom halfword of  $Rn$  with the bottom halfword of  $Rm$ , and the top halfword of  $Rn$  with the top halfword of  $Rm$ . It then subtracts the second product from the first, and stores the difference to  $Rd$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Example

```
SMUSDXNE    r0, r1, r2
```

## Related information

[Condition code suffixes](#) on page 138

## 14.136 SRS (A32)

Store Return State onto a stack.

### Syntax

```
SRS{addr_mode}{cond} sp{!}, #modenum
SRS{addr_mode}{cond} #modenum{!} ; This is pre-UAL syntax
```

where:

#### **addr\_mode**

is any one of the following:

##### **IA**

Increment address After each transfer

##### **IB**

Increment address Before each transfer (A32 only)

##### **DA**

Decrement address After each transfer (A32 only)

**DB**

Decrement address Before each transfer (Full Descending stack).

If *addr\_mode* is omitted, it defaults to Increment After. You can also use stack oriented addressing mode suffixes, for example, when implementing stacks.

**cond**

is an optional condition code.



*cond* is permitted only in T32 code, using a preceding `IT` instruction, but this is deprecated in the Arm®v8 architecture. This is an unconditional instruction in A32.

!

is an optional suffix. If `!` is present, the final address is written back into the SP of the mode specified by *modenum*.

**modenum**

specifies the number of the mode whose banked SP is used as the base register. You must use only the defined mode numbers.

## Operation

`SRS` stores the LR and the SPSR of the current mode, at the address contained in SP of the mode specified by *modenum*, and the following word respectively. Optionally updates SP of the mode specified by *modenum*. This is compatible with the normal use of the `STM` instruction for stack accesses.



For full descending stack, you must use `SRSFD` or `SRSDB`.

## Usage

You can use `SRS` to store return state for an exception handler on a different stack from the one automatically selected.

## Notes

Where addresses are not word-aligned, `SRS` ignores the least significant two bits of the specified address.

The time order of the accesses to individual words of memory generated by `SRS` is not architecturally defined. Do not use this instruction on memory-mapped I/O locations where access order matters.

Do not use `SRS` in User and System modes because these modes do not have a SPSR.

`SRS` is not permitted in a non-secure state if `modenum` specifies monitor mode.

## Availability

This 32-bit instruction is available in A32 and T32.

The 32-bit T32 instruction is not available in the Armv7-M architecture.

There is no 16-bit version of this instruction in T32.

## Example

```
R13_usr    EQU      16
           SRSFD    sp, #R13_usr
```

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Processor modes, and privileged and unprivileged software execution](#) on page 76

[LDM \(A32\)](#) on page 307

[Condition code suffixes](#) on page 138

## 14.137 SSAT (A32)

Signed Saturate to any bit position, with optional shift before saturating.

### Syntax

`SSAT{cond} Rd, #sat, Rm{, shift}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**sat**

specifies the bit position to saturate to, in the range 1 to 32.

**Rm**

is the register containing the operand.

**shift**

is an optional shift. It must be one of the following:

**ASR #n**

where `n` is in the range 1-32 (A32) or 1-31 (T32)

**LSL #n**

where `n` is in the range 0-31.

## Operation

The `SSAT` instruction applies the specified shift, then saturates a signed value to the signed range  $-2^{\text{sat}-1} \leq x \leq 2^{\text{sat}-1}-1$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Example

```
SSAT      r7, #16, r7, LSL #4
```

## Related information

[SSAT16 \(A32\)](#) on page 421

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.138 SSAT16 (A32)

Parallel halfword Saturate.

### Syntax

```
SSAT16{cond} Rd, #sat, Rn
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**sat**

specifies the bit position to saturate to, in the range 1 to 16.

**Rn**

is the register holding the operand.

## Operation

Halfword-wise signed saturation to any bit position.

The `SSAT16` instruction saturates each signed halfword to the signed range  $-2^{\text{sat}-1} \leq n \leq 2^{\text{sat}-1}-1$ .

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Q flag

If saturation occurs on either halfword, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Correct example

```
SSAT16 r7, #12, r7
```

## Incorrect example

```
SSAT16 r1, #16, r2, LSL #4 ; shifts not permitted with halfword
                                ; saturations
```

## Related information

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.139 SSAX (A32)

Signed parallel subtract and add halfwords with exchange.

### Syntax

`SSAX{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

**Operation**

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**GE flags**

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[1:0]**

for bits[15:0] of the result.

**GE[3:2]**

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to an `ADDS` or `SUBS` instruction setting the N and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following `SEL` instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

**Note**

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.140 SSUB8 (A32)

Signed parallel byte-wise subtraction.

### Syntax

`SSUB8{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand and writes the results into the corresponding bytes of the destination. The results are modulo  $2^8$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[0]**

for bits[7:0] of the result.

**GE[1]**

for bits[15:8] of the result.

**GE[2]**

for bits[23:16] of the result.

**GE[3]**

for bits[31:24] of the result.

It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to a `SUBS` instruction setting the and V and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following `SEL` instruction.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.141 SSUB16 (A32)

Signed parallel halfword-wise subtraction.

### Syntax

`SSUB16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand and writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

### GE[1:0]

for bits[15:0] of the result.

### GE[3:2]

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero. This is equivalent to a `SUBS` instruction setting the N and V condition flags to the same value, so that the GE condition passes.

You can use these flags to control a following `SEL` instruction.



Note

GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.142 STC and STC2 (A32)

Transfer Data between memory and Coprocessor.



STC2 is not supported in Arm®v8.

Note

## Syntax

`op{L}{cond} coproc, CRd, [Rn]`

```
op{L}{cond} coproc, CRd, [Rn, #\{-}{offset}] ; offset addressing
op{L}{cond} coproc, CRd, [Rn, #\{-}{offset}]! ; pre-index addressing
op{L}{cond} coproc, CRd, [Rn], #\{-}{offset} ; post-index addressing
op{L}{cond} coproc, CRd, [Rn], {{option}}
```

where:

***op***

is one of stc or stc2.

***cond***

is an optional condition code.

In A32 code, *cond* is not permitted for stc2.

***L***

is an optional suffix specifying a long transfer.

***coproc***

is the name of the coprocessor the instruction is for. The standard name is  $p_n$ , where  $n$  is an integer that must be:

- In the range 0-15 in Armv7 and earlier.
- 14 in Armv8.

***CRd***

is the coprocessor register to store.

***Rn***

is the register on which the memory address is based. If PC is specified, the value used is the address of the current instruction plus eight.

**-**

is an optional minus sign. If - is present, the offset is subtracted from *Rn*. Otherwise, the offset is added to *Rn*.

***offset***

is an expression evaluating to a multiple of 4, in the range 0 to 1020.

**!**

is an optional suffix. If ! is present, the address including the offset is written back into *Rn*.

***option***

is a coprocessor option in the range 0-255, enclosed in braces.

## Usage

The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

## Architectures

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions in T32.

## Register restrictions

You cannot use PC for  $R_n$  in the pre-index and post-index instructions. These are the forms that write back to  $R_n$ .

You cannot use PC for  $R_n$  in T32 stc and stc2 instructions.

A32 stc and stc2 instructions where  $R_n$  is PC, are deprecated.

## Related information

[Register-relative and PC-relative expressions](#) on page 222

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.143 STL (A32)

Store-Release Register.



This instruction is supported only in Arm®v8.

Note

### Syntax

`STL{cond} Rt, [Rn]`

`STLB{cond} Rt, [Rn]`

`STLH{cond} Rt, [Rn]`

where:

**cond**

is an optional condition code.

**Rt**

is the register to store.

**Rn**

is the register on which the memory address is based.

## Operation

`STL` stores data to memory. If any loads or stores appear before a store-release in program order, then all observers are guaranteed to observe the loads and stores before observing the store-release. Loads and stores appearing after a store-release are unaffected.

If a store-release follows a load-acquire, each observer is guaranteed to observe them in program order.

There is no requirement that a store-release be paired with a load-acquire.

All store-release operations are multi-copy atomic, meaning that in a multiprocessing system, if one observer observes a write to memory because of a store-release operation, then all observers observe it. Also, all observers observe all such writes to the same location in the same order.

## Restrictions

The address specified must be naturally aligned, or an alignment fault is generated.

The PC must not be used for `Rt` or `Rn`.

## Availability

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction.

## Related information

[LDAEX \(A32\)](#) on page 304

[LDA \(A32\)](#) on page 303

[STLEX \(A32\)](#) on page 429

[Condition code suffixes](#) on page 138

## 14.144 STLEX (A32)

Store-Release Register Exclusive.



This instruction is supported only in Arm®v8.

## Syntax

`STLEX{cond} Rd, Rt, [Rn]`

`STLEXB{cond} Rd, Rt, [Rn]`

`STLEXH{cond} Rd, Rt, [Rn]`

`STLEXD{cond} Rd, Rt, Rt2, [Rn]`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register for the returned status.

**Rt**

is the register to load or store.

**Rt2**

is the second register for doubleword loads or stores.

**Rn**

is the register on which the memory address is based.

## Operation

`STLEX` performs a conditional store to memory. The conditions are as follows:

- If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is returned in `Rd`.
- If the physical address does not have the Shared TLB attribute, and the executing processor does not have an outstanding tagged physical address, the store does not take place, and the value 1 is returned in `Rd`.
- If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned in `Rd`.
- If the physical address has the Shared TLB attribute, and the physical address is not tagged as exclusive access for the executing processor, the store does not take place, and the value 1 is returned in `Rd`.

If any loads or stores appear before `STLEX` in program order, then all observers are guaranteed to observe the loads and stores before observing the store-release. Loads and stores appearing after `STLEX` are unaffected.

All store-release operations are multi-copy atomic.

## Restrictions

The PC must not be used for any of `Rd`, `Rt`, `Rt2`, or `Rn`.

For `STLEX`, `Rd` must not be the same register as `Rt`, `Rt2`, or `Rn`.

For A32 instructions:

- SP can be used but use of SP for any of `Rd`, `Rt`, or `Rt2` is deprecated.
- For `STLEXD`, `Rt` must be an even numbered register, and not LR.

- $Rt2$  must be  $R(t+1)$ .

For T32 instructions, SP can be used for  $Rn$ , but must not be used for any of  $Rd$ ,  $Rt$ , or  $Rt2$ .

## Usage

Use `LDAEX` and `STLEX` to implement interprocess communication in multiple-processor and shared-memory systems.

For reasons of performance, keep the number of instructions between corresponding `LDAEX` and `STLEX` instructions to a minimum.



The address used in a `STLEX` instruction must be the same as the address in the most recently executed `LDAEX` instruction.

**Note**

## Availability

These 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions.

## Related information

[LDAEX \(A32\)](#) on page 304

[STL \(A32\)](#) on page 428

[LDA \(A32\)](#) on page 303

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.145 STM (A32)

Store Multiple registers.

### Syntax

`STM{addr_mode}{cond} Rn{!}, reglist{^}`

where:

#### **addr\_mode**

is any one of the following:

##### **IA**

Increment address After each transfer. This is the default, and can be omitted.

##### **IB**

Increment address Before each transfer (A32 only).

**DA**

Decrement address After each transfer (A32 only).

**DB**

Decrement address Before each transfer.

You can also use the stack-oriented addressing mode suffixes, for example when implementing stacks.

***cond***

is an optional condition code.

***Rn***

is the base register, the general-purpose register holding the initial address for the transfer. *Rn* must not be PC.

**!**

is an optional suffix. If ! is present, the final address is written back into *Rn*.

***reglist***

is a list of one or more registers to be stored, enclosed in braces. It can contain register ranges. It must be comma-separated if it contains more than one register or register range. Any combination of registers R0 to R15 (PC) can be transferred in A32 state, but there are some restrictions in T32 state.

**^**

is an optional suffix, available in A32 state only. You must not use it in User mode or System mode. Data is transferred into or out of the User mode registers instead of the current mode registers.

## Restrictions on reglist in 32-bit T32 instructions

In 32-bit T32 instructions:

- The SP cannot be in the list.
- The PC cannot be in the list.
- There must be two or more registers in the list.

If you write an STM instruction with only one register in *reglist*, the assembler automatically substitutes the equivalent STR instruction. Be aware of this when comparing disassembly listings with source code.

You can use the --diag\_warning 1645 assembler command-line option to check when an instruction substitution occurs.

## Restrictions on reglist in A32 instructions

A32 store instructions can have SP and PC in the *reglist* but these instructions that include SP or PC in the *reglist* are deprecated.

## 16-bit instruction

A 16-bit version of this instruction is available in T32 code.

The following restrictions apply to the 16-bit instruction:

- All registers in *reglist* must be Lo registers.
- *Rn* must be a Lo register.
- *addr\_mode* must be omitted (or `IA`), meaning increment address after each transfer.
- Writeback must be specified for `STM` instructions.



**Note**

16-bit T32 `STM` instructions with writeback that specify *Rn* as the lowest register in the *reglist* are deprecated.

In addition, the `PUSH` and `POP` instructions are subsets of the `STM` and `LDM` instructions and can therefore be expressed using the `STM` and `LDM` instructions. Some forms of `PUSH` and `POP` are also 16-bit instructions.

### Storing the base register, with writeback

In A32 or 16-bit T32 instructions, if *Rn* is in *reglist*, and writeback is specified with the `!` suffix:

- If the instruction is `STM{addr_mode}{cond}` and *Rn* is the lowest-numbered register in *reglist*, the initial value of *Rn* is stored. These instructions are deprecated.
- Otherwise, the stored value of *Rn* cannot be relied on, so these instructions are not permitted.

32-bit T32 instructions are not permitted if *Rn* is in *reglist*, and writeback is specified with the `!` suffix.

### Correct example

```
STMDB    r1!, {r3-r6,r11,r12}
```

### Incorrect example

```
STM      r5!, {r5,r4,r9} ; value stored for R5 unknown
```

### Related information

[Stack implementation using LDM and STM](#) on page 119

[POP \(A32\)](#) on page 357

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.146 STR (immediate offset) (A32)

Store with immediate offset, pre-indexed immediate offset, or post-indexed immediate offset.

### Syntax

```
STR{type}{cond} Rt, [Rn {, #offset}] ; immediate offset
STR{type}{cond} Rt, [Rn, #offset]! ; pre-indexed
STR{type}{cond} Rt, [Rn], #offset ; post-indexed
STRD{cond} Rt, Rt2, [Rn {, #offset}] ; immediate offset, doubleword
STRD{cond} Rt, Rt2, [Rn, #offset]! ; pre-indexed, doubleword
STRD{cond} Rt, Rt2, [Rn], #offset ; post-indexed, doubleword
```

where:

**type**

can be any one of:

**B**

Byte

**H**

Halfword

-

omitted, for Word.

**cond**

is an optional condition code.

**Rt**

is the general-purpose register to store.

**Rn**

is the general-purpose register on which the memory address is based.

**offset**

is an offset. If **offset** is omitted, the address is the contents of **Rn**.

**Rt2**

is the additional register to store for doubleword operations.

Not all options are available in every instruction set and architecture.

### Offset ranges and architectures

The following table shows the ranges of offsets and availability of this instruction:

**Table 14-15: Offsets and architectures, STR, word, halfword, and byte**

| Instruction                                       | Immediate offset            | Pre-indexed                 | Post-indexed                |
|---------------------------------------------------|-----------------------------|-----------------------------|-----------------------------|
| A32, word or byte                                 | -4095 to 4095               | -4095 to 4095               | -4095 to 4095               |
| A32, halfword                                     | -255 to 255                 | -255 to 255                 | -255 to 255                 |
| A32, doubleword                                   | -255 to 255                 | -255 to 255                 | -255 to 255                 |
| T32 32-bit encoding, word, halfword, or byte      | -255 to 4095                | -255 to 255                 | -255 to 255                 |
| T32 32-bit encoding, doubleword                   | -1020 to 1020 <sup>22</sup> | -1020 to 1020 <sup>22</sup> | -1020 to 1020 <sup>22</sup> |
| T32 16-bit encoding, word <sup>23</sup>           | 0 to 124 <sup>22</sup>      | Not available               | Not available               |
| T32 16-bit encoding, halfword <sup>23</sup>       | 0 to 62 <sup>24</sup>       | Not available               | Not available               |
| T32 16-bit encoding, byte <sup>23</sup>           | 0 to 31                     | Not available               | Not available               |
| T32 16-bit encoding, word, Rn is SP <sup>25</sup> | 0 to 1020 <sup>22</sup>     | Not available               | Not available               |

## Register restrictions

R<sub>n</sub> must be different from R<sub>t</sub> in the pre-index and post-index forms.

## Doubleword register restrictions

R<sub>n</sub> must be different from R<sub>t2</sub> in the pre-index and post-index forms.

For T32 instructions, you must not specify SP or PC for either R<sub>t</sub> or R<sub>t2</sub>.

For A32 instructions:

- R<sub>t</sub> must be an even-numbered register.
- R<sub>t</sub> must not be LR.
- Arm strongly recommends that you do not use R12 for R<sub>t</sub>.
- R<sub>t2</sub> must be R<sub>(t + 1)</sub>.

## Use of PC

In A32 instructions you can use PC for R<sub>t</sub> in STR word instructions and PC for R<sub>n</sub> in STR instructions with immediate offset syntax (that is the forms that do not writeback to the R<sub>n</sub>). However, this is deprecated.

Other uses of PC are not permitted in these A32 instructions.

In T32 code, using PC in STR instructions is not permitted.

<sup>22</sup> Must be divisible by 4.

<sup>23</sup> Rt and Rn must be in the range R0-R7.

<sup>24</sup> Must be divisible by 2.

<sup>25</sup> Rt must be in the range R0-R7.

## Use of SP

You can use SP for  $Rn$ .

In A32 code, you can use SP for  $Rt$  in word instructions. You can use SP for  $Rt$  in non-word instructions in A32 code but this is deprecated.

In T32 code, you can use SP for  $Rt$  in word instructions only. All other use of SP for  $Rt$  in this instruction is not permitted in T32 code.

### Example

```
STR      r2, [r9,#consta-struc] ; consta-struc is an expression
; evaluating to a constant in
; the range 0-4095.
```

### Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.147 STR (register offset) (A32)

Store with register offset, pre-indexed register offset, or post-indexed register offset.

### Syntax

```
STR{type}{cond} Rt, [Rn, ±Rm {, shift}] ; register offset
STR{type}{cond} Rt, [Rn, ±Rm {, shift}]! ; pre-indexed ; A32 only
STR{type}{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed ; A32 only
STRD{cond} Rt, Rt2, [Rn, ±Rm] ; register offset, doubleword ; A32 only
STRD{cond} Rt, Rt2, [Rn, ±Rm]! ; pre-indexed, doubleword ; A32 only
STRD{cond} Rt, Rt2, [Rn], ±Rm ; post-indexed, doubleword ; A32 only
```

where:

#### **type**

can be any one of:

#### **B**

Byte

#### **H**

Halfword

#### -

omitted, for Word.

***cond***

is an optional condition code.

***Rt***

is the general-purpose register to store.

***Rn***

is the general-purpose register on which the memory address is based.

***Rm***

is a general-purpose register containing a value to be used as the offset.  $<-Rm>$  is not permitted in T32 code.

***shift***

is an optional shift.

***Rt2***

is the additional register to store for doubleword operations.

Not all options are available in every instruction set and architecture.

## Offset register and shift options

The following table shows the ranges of offsets and availability of this instruction:

| Instruction                                                                                                                                                                          | $\pm Rm$ | shift           |                 |     |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-----------------|-----------------|-----|
| A32, word or byte Where $\pm Rm$ is shown, you can use $-Rm$ , $+Rm$ , or $Rm$ . Where $+Rm$ is shown, you cannot use $-Rm$ . $Rt$ , $Rn$ , and $Rm$ must all be in the range R0-R7. | $\pm Rm$ | LSL<br>$\#0-31$ | LSR<br>$\#1-32$ |     |
|                                                                                                                                                                                      |          | ASR<br>$\#1-32$ | ROR<br>$\#1-31$ | RRX |
| A32, halfword                                                                                                                                                                        | $\pm Rm$ |                 | Not available   |     |
| A32, doubleword                                                                                                                                                                      | $\pm Rm$ |                 | Not available   |     |
| T32 32-bit encoding, word, halfword, or byte                                                                                                                                         | $+Rm$    | LSL<br>$\#0-3$  |                 |     |
| T32 16-bit encoding, all except doubleword 2                                                                                                                                         | $+Rm$    |                 | Not available   |     |

## Register restrictions

In the pre-index and post-index forms,  $Rn$  must be different from  $Rt$ .

## Doubleword register restrictions

For A32 instructions:

- $Rt$  must be an even-numbered register.
- $Rt$  must not be LR.
- Arm strongly recommends that you do not use R12 for  $Rt$ .
- $Rt2$  must be  $R(t + 1)$ .
- $Rn$  must be different from  $Rt2$  in the pre-index and post-index forms.

## Use of PC

In A32 instructions you can use PC for  $Rt$  in `STR` word instructions, and you can use PC for  $Rn$  in `STR` instructions with register offset syntax (that is, the forms that do not writeback to the  $Rn$ ). However, this is deprecated.

Other uses of PC are not permitted in A32 instructions.

Use of PC in `STR` T32 instructions is not permitted.

## Use of SP

You can use SP for  $Rn$ .

In A32 code, you can use SP for  $Rt$  in word instructions. You can use SP for  $Rt$  in non-word A32 instructions but this is deprecated.

You can use SP for  $Rm$  in A32 instructions but this is deprecated.

In T32 code, you can use SP for  $Rt$  in word instructions only. All other use of SP for  $Rt$  in this instruction is not permitted in T32 code.

Use of SP for  $Rm$  is not permitted in T32 state.

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.148 STR, unprivileged (A32)

Unprivileged Store, byte, halfword, or word.

### Syntax

`STR{type}T{cond} Rt, [Rn {, #offset}] ; immediate offset (T32, 32-bit encoding only)`

`STR{type}T{cond} Rt, [Rn] {, #offset} ; post-indexed (A32 only)`

`STR{type}T{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed (register) (A32 only)`

where:

#### **type**

can be any one of:

#### **B**

Byte

#### **H**

Halfword

omitted, for Word.

***cond***

is an optional condition code.

***Rt***

is the register to load or store.

***Rn***

is the register on which the memory address is based.

***offset***

is an offset. If offset is omitted, the address is the value in *Rn*.

***Rm***

is a register containing a value to be used as the offset. *Rm* must not be PC.

***shift***

is an optional shift.

## Operation

When these instructions are executed by privileged software, they access memory with the same restrictions as they would have if they were executed by unprivileged software.

When executed by unprivileged software, these instructions behave in the same way as the corresponding store instruction, for example STRBT behaves in the same way as STRB.

## Offset ranges and architectures

The following table shows the ranges of offsets and availability of this instruction:

| Instruction                                    | Immediate offset | Post-indexed  | $\pm Rm1$ | shift         |
|------------------------------------------------|------------------|---------------|-----------|---------------|
| A32, word or byte You can use -Rm, +Rm, or Rm. | Not available    | -4095 to 4095 | $\pm Rm$  | LSL #0-31     |
|                                                |                  |               |           | LSR #1-32     |
|                                                |                  |               |           | ASR #1-32     |
|                                                |                  |               |           | ROR #1-31     |
|                                                |                  |               |           | RRX           |
| A32, halfword                                  | Not available    | -255 to 255   | $\pm Rm$  | Not available |
| T32 32-bit encoding, word, halfword, or byte   | 0 to 255         | Not available |           | Not available |

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.149 STREX (A32)

Store Register Exclusive.

### Syntax

`STREX{cond} Rd, Rt, [Rn {, #offset}]`

`STREXB{cond} Rd, Rt, [Rn]`

`STREXH{cond} Rd, Rt, [Rn]`

`STREXD{cond} Rd, Rt, Rt2, [Rn]`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register for the returned status.

***Rt***

is the register to store.

***Rt2***

is the second register for doubleword stores.

***Rn***

is the register on which the memory address is based.

***offset***

is an optional offset applied to the value in *Rn*. *offset* is permitted only in T32 instructions. If *offset* is omitted, an offset of 0 is assumed.

### Operation

STREX performs a conditional store to memory. The conditions are as follows:

- If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is returned in *Rd*.
- If the physical address does not have the Shared TLB attribute, and the executing processor does not have an outstanding tagged physical address, the store does not take place, and the value 1 is returned in *Rd*.
- If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned in *Rd*.
- If the physical address has the Shared TLB attribute, and the physical address is not tagged as exclusive access for the executing processor, the store does not take place, and the value 1 is returned in *Rd*.

## Restrictions

PC must not be used for any of *Rd*, *Rt*, *Rt2*, or *Rn*.

For **STREX**, *Rd* must not be the same register as *Rt*, *Rt2*, or *Rn*.

For A32 instructions:

- SP can be used but use of SP for any of *Rd*, *Rt*, or *Rt2* is deprecated.
- For **STREXD**, *Rt* must be an even numbered register, and not LR.
- *Rt2* must be *R(t+1)*.
- *offset* is not permitted.

For T32 instructions:

- SP can be used for *Rn*, but must not be used for any of *Rd*, *Rt*, or *Rt2*.
- The value of *offset* can be any multiple of four in the range 0-1020.

## Usage

Use **LDREX** and **STREX** to implement interprocess communication in multiple-processor and shared-memory systems.

For reasons of performance, keep the number of instructions between corresponding **LDREX** and **STREX** instructions to a minimum.



The address used in a **STREX** instruction must be the same as the address in the most recently executed **LDREX** instruction.

Note

## Availability

All these 32-bit instructions are available in A32 and T32.

There are no 16-bit versions of these instructions.

## Examples

```

    MOV r1, #0x1          ; load the 'lock taken' value
try   LDREX r0, [LockAddr] ; load the lock value
      CMP r0, #0           ; is the lock free?
      STREXEQ r0, r1, [LockAddr]; try and claim the lock
      CMPEQ r0, #0          ; did this succeed?
      BNE try               ; no - try again
      ....                  ; yes - we have the lock

```

## Related information

[Condition code suffixes](#) on page 138

[Address alignment in A32/T32 code](#) on page 157

## 14.150 SUB (A32)

Subtract without carry.

### Syntax

`SUB{S}{cond} {Rd, } Rn, Operand2`

`SUB{cond} {Rd, } Rn, #imm12 ; T32, 32-bit encoding only`

where:

**S**

is an optional suffix. If s is specified, the condition flags are updated on the result of the operation.

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Operand2**

is a flexible second operand.

**imm12**

is any value in the range 0-4095.

### Operation

The `SUB` instruction subtracts the value of `Operand2` or `imm12` from the value in `Rn`.

In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when reading disassembly listings.

### Use of PC and SP in T32 instructions

In general, you cannot use PC (`R15`) for `Rd`, or any operand. The exception is you can use PC for `Rn` in 32-bit T32 `SUB` instructions, with a constant `Operand2` value in the range 0-4095, and no s suffix. These instructions are useful for generating PC-relative addresses. Bit[1] of the PC value reads as 0 in this case, so that the base address for the calculation is always word-aligned.

Generally, you cannot use SP (`R13`) for `Rd`, or any operand, except that you can use SP for `Rn`.

### Use of PC and SP in A32 instructions

You cannot use PC for `Rd` or any operand in a `SUB` instruction that has a register-controlled shift.

In `SUB` instructions without register-controlled shift, use of PC is deprecated except for the following cases:

- Use of PC for `Rd`.
- Use of PC for `Rn` in the instruction `SUB{cond} Rd, Rn, #Constant`.

If you use PC (`R15`) as `Rn` or `Rm`, the value used is the address of the instruction plus 8.

If you use PC as `Rd`:

- Execution branches to the address corresponding to the result.
- If you use the `s` suffix, see the `SUBS pc,lr` instruction.

You can use SP for `Rn` in `SUB` instructions, however, `SUBS pc, sp, #Constant` is deprecated.

You can use SP in `SUB` (register) if `Rn` is SP and `shift` is omitted or `LSL #1`, `LSL #2`, or `LSL #3`.

Other uses of SP in A32 `SUB` instructions are deprecated.



Use of SP and PC is deprecated in A32 instructions.

**Note**

## Condition flags

If `s` is specified, the `SUB` instruction updates the N, Z, C and V flags according to the result.

## 16-bit instructions

The following forms of this instruction are available in T32 code, and are 16-bit instructions:

### `SUBS Rd, Rn, Rm`

`Rd`, `Rn` and `Rm` must all be Lo registers. This form can only be used outside an IT block.

### `SUB{cond} Rd, Rn, Rm`

`Rd`, `Rn` and `Rm` must all be Lo registers. This form can only be used inside an IT block.

### `SUBS Rd, Rn, #imm`

`imm` range 0-7. `Rd` and `Rn` must both be Lo registers. This form can only be used outside an IT block.

### `SUB{cond} Rd, Rn, #imm`

`imm` range 0-7. `Rd` and `Rn` must both be Lo registers. This form can only be used inside an IT block.

### `SUBS Rd, Rd, #imm`

`imm` range 0-255. `Rd` must be a Lo register. This form can only be used outside an IT block.

### `SUB{cond} Rd, Rd, #imm`

`imm` range 0-255. `Rd` must be a Lo register. This form can only be used inside an IT block.

**SUB{cond} SP, SP, #imm***imm* range 0-508, word aligned.

## Example

```
SUBS    r8, r6, #240      ; sets the flags based on the result
```

## Multiword arithmetic examples

These instructions subtract one 96-bit integer contained in R9, R10, and R11 from another 96-bit integer contained in R6, R7, and R8, and place the result in R3, R4, and R5:

```
SUBS    r3, r6, r9
SBCS    r4, r7, r10
SBC     r5, r8, r11
```

For clarity, these examples use consecutive registers for multiword values. There is no requirement to do this. The following, for example, is perfectly valid:

```
SUBS    r6, r6, r9
SBCS    r9, r2, r1
SBC     r2, r8, r11
```

## Related information

[Flexible second operand \(Operand2\)](#) on page 244

[SUBS pc, lr \(A32\)](#) on page 444

[Condition code suffixes](#) on page 138

## 14.151 SUBS pc, lr (A32)

Exception return, without popping anything from the stack.

### Syntax

```
SUBS{cond} pc, lr, #imm ; A32 and T32 code
```

```
MOVS{cond} pc, lr ; A32 and T32 code
```

```
op1 S{cond} pc, Rn, #imm ; A32 code only and is deprecated
```

```
op1 S{cond} pc, Rn, Rm {, shift} ; A32 code only and is deprecated
```

```
op2 S{cond} pc, #imm ; A32 code only and is deprecated
```

```
op2 S{cond} pc, Rm {, shift} ; A32 code only and is deprecated
```

where:

***op1***

is one of ADC, ADD, AND, BIC, EOR, ORN, ORR, RSB, RSC, SBC, and SUB.

***op2***

is one of MOV and MVN.

***cond***

is an optional condition code.

***imm***

is an immediate value. In T32 code, it is limited to the range 0-255. In A32 code, it is a flexible second operand.

**Rn**

is the first general-purpose source register. Arm deprecates the use of any register except LR.

**Rm**

is the optionally shifted second or only general-purpose register.

***shift***

is an optional condition code.

## Usage

SUBS pc, lr, #*imm* subtracts a value from the link register and loads the PC with the result, then copies the SPSR to the CPSR.

You can use SUBS pc, lr, #*imm* to return from an exception if there is no return state on the stack. The value of #*imm* depends on the exception to return from.

## Notes

SUBS pc, lr, #*imm* writes an address to the PC. The alignment of this address must be correct for the instruction set in use after the exception return:

- For a return to A32, the address written to the PC must be word-aligned.
- For a return to T32, the address written to the PC must be halfword-aligned.
- For a return to Jazelle, there are no alignment restrictions on the address written to the PC.

No special precautions are required in software to follow these rules, if you use the instruction to return after a valid exception entry mechanism.

In T32, only SUBS{cond} pc, lr, #*imm* is a valid instruction. MOVS pc, lr is a synonym of SUBS pc, lr, #0. Other instructions are undefined.

In A32, only SUBS{cond} pc, lr, #*imm* and MOVS{cond} pc, lr are valid instructions. Other instructions are deprecated.



Do not use these instructions in User mode or System mode. The assembler cannot warn you about this.

**Caution**

## Availability

This 32-bit instruction is available in A32 and T32.

The 32-bit T32 instruction is not available in the Arm®v7-M architecture.

There is no 16-bit version of this instruction in T32.

## Related information

[AND \(A32\)](#) on page 260

## 14.152 SVC (A32)

SuperVisor Call.

### Syntax

`SVC{cond} #imm`

where:

**cond**

is an optional condition code.

**imm**

is an expression evaluating to an integer in the range:

- 0 to  $2^{24}$ -1 (a 24-bit value) in an A32 instruction.
- 0-255 (an 8-bit value) in a T32 instruction.

### Operation

The svc instruction causes an exception. This means that the processor mode changes to Supervisor, the CPSR is saved to the Supervisor mode SPSR, and execution branches to the SVC vector.

*imm* is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.



svc was called swi in earlier versions of the A32 assembly language. swi instructions disassemble to svc, with a comment to say that this was formerly swi.

## Condition flags

This instruction does not change the flags.

## Availability

This instruction is available in A32 and 16-bit T32 and in the Arm®v7 architectures.

There is no 32-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.153 SWP and SWPB (A32)

Swap data between registers and memory.



These instructions are not supported in Arm®v8.

Note

### Syntax

`SWP{B}{cond} Rt, Rt2, [Rn]`

where:

**cond**

is an optional condition code.

**B**

is an optional suffix. If **B** is present, a byte is swapped. Otherwise, a 32-bit word is swapped.

**Rt**

is the destination register. **Rt** must not be PC.

**Rt2**

is the source register. **Rt2** can be the same register as **Rt**. **Rt2** must not be PC.

**Rn**

contains the address in memory. **Rn** must be a different register from both **Rt** and **Rt2**. **Rn** must not be PC.

### Usage

You can use `SWP` and `SWPB` to implement semaphores:

- Data from memory is loaded into **Rt**.
- The contents of **Rt2** are saved to memory.

- If  $Rt2$  is the same register as  $Rt$ , the contents of the register are swapped with the contents of the memory location.

## Note

The use of `SWP` and `SWPB` is deprecated. You can use `LDREX` and `STREX` instructions to implement more sophisticated semaphores.

## Availability

These instructions are available in A32.

There are no T32 `SWP` or `SWPB` instructions.

## Related information

[LDREX \(A32\)](#) on page 323

[Condition code suffixes](#) on page 138

## 14.154 SXTAB (A32)

Sign extend Byte with Add, to extend an 8-bit value to a 32-bit value.

### Syntax

`SXTAB{cond} {Rd}, Rn, Rm {,rotation}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the number to add.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from  $Rm$  is rotated right 8 bits.

**ROR #16**

Value from  $Rm$  is rotated right 16 bits.

**ROR #24**

Value from  $Rm$  is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

This instruction does the following:

1. Rotate the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extract bits[7:0] from the value obtained.
3. Sign extend to 32 bits.
4. Add the value from *Rn*.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.155 SXTAB16 (A32)

Sign extend two Bytes with Add, to extend two 8-bit values to two 16-bit values.

### Syntax

`SXTAB16{cond} {Rd}, Rn, Rm{,rotation}`

where:

#### **cond**

is an optional condition code.

#### **Rd**

is the destination register.

#### **Rn**

is the register holding the number to add.

***Rm***

is the register holding the value to extend.

***rotation***

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

This instruction does the following:

1. Rotate the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extract bits[23:16] and bits[7:0] from the value obtained.
3. Sign extend to 16 bits.
4. Add them to bits[31:16] and bits[15:0] respectively of *Rn* to form bits[31:16] and bits[15:0] of the result.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.156 SXTAH (A32)

Sign extend Halfword with Add, to extend a 16-bit value to a 32-bit value.

### Syntax

`SXTAH{cond} {Rd}, Rn, Rm {,rotation}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the number to add.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

### Operation

This instruction does the following:

1. Rotate the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extract bits[15:0] from the value obtained.
3. Sign extend to 32 bits.
4. Add the value from *Rn*.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7 E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.157 SXTB (A32)

Sign extend Byte, to extend an 8-bit value to a 32-bit value.

### Syntax

`SXTB{cond} {Rd}, Rm[, rotation]`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

This instruction does the following:

1. Rotates the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extracts bits[7:0] from the value obtained.
3. Sign extends to 32 bits.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**SXTB Rd, Rm**

Rd and Rm must both be Lo registers.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

The 16-bit instruction is available in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.158 SXTB16 (A32)

Sign extend two bytes.

### Syntax

`SXTB16{cond} {Rd}, Rm{, rotation}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from Rm is rotated right 8 bits.

**ROR #16**

Value from  $Rm$  is rotated right 16 bits.

**ROR #24**

Value from  $Rm$  is rotated right 24 bits.

If  $rotation$  is omitted, no rotation is performed.

**Operation**

`SXTB16` extends two 8-bit values to two 16-bit values. It does this by:

1. Rotating the value from  $Rm$  right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Sign extending to 16 bits each.

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**Condition flags**

This instruction does not change the flags.

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

**Related information**

[Condition code suffixes](#) on page 138

## 14.159 SXTH (A32)

Sign extend Halfword.

**Syntax**

`SXTH{cond} {Rd}, Rm{,rotation}`

where:

**cond**

is an optional condition code.

***Rd***

is the destination register.

***Rm***

is the register holding the value to extend.

***rotation***

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

**SXTB** extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Sign extending to 32 bits.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**SXTB Rd, Rm**

*Rd* and *Rm* must both be Lo registers.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

The 16-bit instruction is available in T32.

## Example

```
SXTH      r3, r9
```

## Incorrect example

```
SXTH      r3, r9, ROR #12 ; rotation must be 0, 8, 16, or 24.
```

## Related information

[Condition code suffixes](#) on page 138

## 14.160 SYS (A32)

Execute system coprocessor instruction.

### Syntax

```
SYS{cond} instruction{}, Rn}
```

where:

**cond**

is an optional condition code.

**instruction**

is the coprocessor instruction to execute.

**Rn**

is an operand to the instruction. For instructions that take an argument, *Rn* is compulsory. For instructions that do not take an argument, *Rn* is optional and if it is not specified, *R0* is used.

*Rn* must not be PC.

### Usage

You can use this pseudo-instruction to execute special coprocessor instructions such as cache, branch predictor, and TLB operations. The instructions operate by writing to special write-only coprocessor registers. The instruction names are the same as the write-only coprocessor register names and are listed in the *Arm Architecture Reference Manual*. For example:

```
SYS  ICIAALLUIS ; invalidates all instruction caches Inner Shareable
      ; to Point of Unification and also flushes branch
      ; target cache.
```

### Availability

This 32-bit instruction is available in A32 and T32.

The 32-bit T32 instruction is not available in the Arm®v7-M architecture.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138  
[A-Profile Architectures](#)

# 14.161 TBB and TBH (A32)

Table Branch Byte and Table Branch Halfword.

## Syntax

TBB [Rn, Rm]

TBH [Rn, Rm, LSL #1]

where:

**Rn**

is the base register. This contains the address of the table of branch lengths. *Rn* must not be SP.

If PC is specified for *Rn*, the value used is the address of the instruction plus 4.

**Rm**

is the index register. This contains an index into the table.

*Rm* must not be PC or SP.

## Operation

These instructions cause a PC-relative forward branch using a table of single byte offsets (TBB) or halfword offsets (TBH). *Rn* provides a pointer to the table, and *Rm* supplies an index into the table. The branch length is twice the value of the byte (TBB) or the halfword (TBH) returned from the table. The target of the branch table must be in the same execution state.

## Architectures

These 32-bit T32 instructions are available.

There are no versions of these instructions in A32 or in 16-bit T32 encodings.

## Related information

[Address alignment in A32/T32 code](#) on page 157

## 14.162 TEQ (A32)

Test Equivalence.

### Syntax

`TEQ{cond} Rn, Operand2`

where:

**cond**

is an optional condition code.

**Rn**

is the general-purpose register holding the first operand.

**Operand2**

is a flexible second operand.

### Usage

This instruction tests the value in a register against *operand2*. It updates the condition flags on the result, but does not place the result in any register.

The `TEQ` instruction performs a bitwise Exclusive OR operation on the value in *Rn* and the value of *operand2*. This is the same as an `EORS` instruction, except that the result is discarded.

Use the `TEQ` instruction to test if two values are equal, without affecting the V or C flags (as `CMP` does).

`TEQ` is also useful for testing the sign of a value. After the comparison, the N flag is the logical Exclusive OR of the sign bits of the two operands.

### Register restrictions

In this T32 instruction, you cannot use SP or PC for *Rn* or *operand2*.

In this A32 instruction, use of SP or PC is deprecated.

For A32 instructions:

- If you use PC (`R15`) as *Rn*, the value used is the address of the instruction plus 8.
- You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

### Condition flags

This instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of *operand2*.
- Does not affect the V flag.

## Architectures

This instruction is available in A32 and T32.

### Correct example

```
TEQEQ    r10, r9
```

### Incorrect example

```
TEQ      pc, r1, ROR r0      ; PC not permitted with register
                                ; controlled shift
```

### Related information

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

## 14.163 TST (A32)

Test bits.

### Syntax

TST{*cond*} *Rn*, *Operand2*

where:

***cond***

is an optional condition code.

***Rn***

is the general-purpose register holding the first operand.

***Operand2***

is a flexible second operand.

### Operation

This instruction tests the value in a register against *operand2*. It updates the condition flags on the result, but does not place the result in any register.

The `TST` instruction performs a bitwise AND operation on the value in *Rn* and the value of *operand2*. This is the same as an `ANDS` instruction, except that the result is discarded.

### Register restrictions

In this T32 instruction, you cannot use SP or PC for *Rn* or *operand2*.

In this A32 instruction, use of SP or PC is deprecated.

For A32 instructions:

- If you use PC (R15) as  $R_n$ , the value used is the address of the instruction plus 8.
- You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.

## Condition flags

This instruction:

- Updates the N and Z flags according to the result.
- Can update the C flag during the calculation of  $operand_2$ .
- Does not affect the V flag.

## 16-bit instructions

The following form of the `TST` instruction is available in T32 code, and is a 16-bit instruction:

**TST  $R_n$ ,  $R_m$**

$R_n$  and  $R_m$  must both be Lo registers.

## Architectures

This instruction is available A32 and T32.

## Examples

```
TST      r0, #0x3F8
TSTNE   r1, r5, ASR r1
```

## Related information

[Flexible second operand \(Operand2\)](#) on page 244

[Condition code suffixes](#) on page 138

## 14.164 TT, TTT, TTA, TTAT (A32)

Test Target (Alternate Domain, Unprivileged).

### Syntax

```
TT{cond}{q} Rd, Rn ; T1 TT general registers (T32)
```

```
TTA{cond}{q} Rd, Rn ; T1 TTA general registers (T32)
```

```
TTAT{cond}{q} Rd, Rn ; T1 TTAT general registers (T32)
```

```
TTT{cond}{q} Rd, Rn ; T1 TTT general registers (T32)
```

Where:

***cond***

Is an optional instruction condition code. See [Condition Codes](#). It specifies the condition under which the instruction is executed. If **cond** is omitted, it defaults to always (`A1`).

***q***

Is an optional instruction width specifier. See [Instruction width specifiers](#).

***Rd***

Is the destination general-purpose register into which the status result of the target test is written.

***Rn***

Is the general-purpose base register.

## Usage

Test Target (TT) queries the security state and access permissions of a memory location.

Test Target Unprivileged (TTT) queries the security state and access permissions of a memory location for an unprivileged access to that location.

Test Target Alternate Domain (TTA) and Test Target Alternate Domain Unprivileged (TTAT) query the security state and access permissions of a memory location for a Non-secure access to that location. These instructions are only valid when executing in Secure state, and are UNDEFINED if used from Non-secure state.

These instructions return the security state and access permissions in the destination register, the contents of which are as follows:

| Bits   | Name    | Description                                                                                                                                                                                                                                                                                                                       |
|--------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [7:0]  | MREGION | The MPU region that the address maps to. This field is 0 if MRVALID is 0.                                                                                                                                                                                                                                                         |
| [15:8] | SREGION | The SAU region that the address maps to. This field is only valid if the instruction is executed from Secure state. This field is 0 if SRVALID is 0.                                                                                                                                                                              |
| [16]   | MRVALID | Set to 1 if the MREGION content is valid. Set to 0 if the MREGION content is invalid.                                                                                                                                                                                                                                             |
| [17]   | SRVALID | Set to 1 if the SREGION content is valid. Set to 0 if the SREGION content is invalid.                                                                                                                                                                                                                                             |
| [18]   | R       | Read accessibility. Set to 1 if the memory location can be read according to the permissions of the selected MPU when operating in the current mode. For <code>TTT</code> and <code>TTAT</code> , this bit returns the permissions for unprivileged access, regardless of whether the current mode is privileged or unprivileged. |

| Bits    | Name    | Description                                                                                                                                                                                                                                                                                                                                                                             |
|---------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [19]    | RW      | Read/write accessibility. Set to 1 if the memory location can be read and written according to the permissions of the selected MPU when operating in the current mode. For <b>TTT</b> and <b>TTAT</b> , this bit returns the permissions for unprivileged access, regardless of whether the current mode is privileged or unprivileged.                                                 |
| [20]    | NSR     | Equal to R AND NOT S. Can be used in combination with the <b>LSLS</b> (immediate) instruction to check both the MPU and SAU/IDAU permissions. This bit is only valid if the instruction is executed from Secure state and the R field is valid.                                                                                                                                         |
| [21]    | NSRW    | Equal to RW AND NOT S. Can be used in combination with the <b>LSLS</b> (immediate) instruction to check both the MPU and SAU/IDAU permissions. This bit is only valid if the instruction is executed from Secure state and the RW field is valid.                                                                                                                                       |
| [22]    | S       | Security. A value of 1 indicates the memory location is Secure, and a value of 0 indicates the memory location is Non-secure. This bit is only valid if the instruction is executed from Secure state.                                                                                                                                                                                  |
| [23]    | IRVALID | IREGION valid flag. For a Secure request, indicates the validity of the IREGION field. Set to 1 if the IREGION content is valid. Set to 0 if the IREGION content is invalid.<br><br>This bit is always 0 if the IDAU cannot provide a region number, the address is exempt from security attribution, or if the requesting <b>TT</b> instruction is executed from the Non-secure state. |
| [31:24] | IREGION | IDAU region number. Indicates the IDAU region number containing the target address. This field is 0 if IRVALID is 0.                                                                                                                                                                                                                                                                    |

Invalid fields are 0.

The MREGION field is invalid and 0 if any of the following conditions are true:

- The MPU is not present or MPU\_CTRL.ENABLE is 0.
- The address did not match any enabled MPU regions.
- The address matched multiple MPU regions.
- **TT** or **TTT** was executed from an unprivileged mode.

The SREGION field is invalid and 0 if any of the following conditions are true:

- SAU\_CTRL.ENABLE is set to 0.
- The address did not match any enabled SAU regions.

- The address matched multiple SAU regions.
- The SAU attributes were overridden by the IDAU.
- The instruction is executed from Non-secure state, or is executed on a processor that does not implement the Arm®v8-M Security Extensions.

The R and RW bits are invalid and 0 if any of the following conditions are true:

- The address matched multiple MPU regions.
- TT or TTT is executed from an unprivileged mode.

## Related information

[Condition code suffixes](#) on page 138

[Instruction width specifiers](#) on page 244

## 14.165 UADD8 (A32)

Unsigned parallel byte-wise addition.

### Syntax

`UADD8{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs four unsigned integer additions on the corresponding bytes of the operands and writes the results into the corresponding bytes of the destination. The results are modulo  $2^8$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[0]**

for bits[7:0] of the result.

**GE[1]**

for bits[15:8] of the result.

**GE[2]**

for bits[23:16] of the result.

**GE[3]**

for bits[31:24] of the result.

It sets a GE flag to 1 to indicate that the corresponding result overflowed, generating a carry. This is equivalent to an `ADDS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEL` instruction.

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

**Related information**

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.166 UADD16 (A32)

Unsigned parallel halfword-wise addition.

**Syntax**

`UADD16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

## Operation

This instruction performs two unsigned integer additions on the corresponding halfwords of the operands and writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

### GE[1:0]

for bits[15:0] of the result.

### GE[3:2]

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result overflowed, generating a carry. This is equivalent to an `ADDS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEL` instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

Note

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.167 UASX (A32)

Unsigned parallel add and subtract halfwords with exchange.

### Syntax

`UASX{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[1:0]**

for bits[15:0] of the result.

**GE[3:2]**

for bits[31:16] of the result.

It sets GE[1:0] to 1 to indicate that the subtraction gave a result greater than or equal to zero, meaning a borrow did not occur. This is equivalent to a `SUBS` instruction setting the C condition flag to 1.

It sets GE[3:2] to 1 to indicate that the addition overflowed, generating a carry. This is equivalent to an `ADDS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEI` instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

**Note**

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.168 UBFX (A32)

Unsigned Bit Field Extract.

### Syntax

`UBFX{cond} Rd, Rn, #lsb, #width`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the source register.

**lsb**

is the bit number of the least significant bit in the bitfield, in the range 0 to 31.

**width**

is the width of the bitfield, in the range 1 to (32-*lsb*).

### Operation

Copies adjacent bits from one register into the least significant bits of a second register, and zero extends to 32 bits.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not alter any flags.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

# 14.169 UDF (A32)

Permanently Undefined.

## Syntax

```
UDF{c}{q} {#}imm ; A1 general registers (A32)
```

```
UDF{c}{q} {#}imm ; T1 general registers (T32)
```

```
UDF{c}.W {#}imm ; T2 general registers (T32)
```

Where:

### ***imm***

The value depends on the instruction variant:

### **A1 general registers**

For A32, is a 16-bit unsigned immediate, in the range 0 to 65535.

### **T1 general registers**

For T32, is an 8-bit unsigned immediate, in the range 0 to 255.

### **T2 general registers**

For T32, is a 16-bit unsigned immediate, in the range 0 to 65535.



The PE ignores the value of this constant.

Note

### **c**

Is an optional instruction condition code. See [Condition Codes](#). Arm deprecates using any *c* value other than `AL`.

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

## Usage

Permanently Undefined generates an Undefined Instruction exception.

The encodings for `UDF` used in this section are defined as permanently undefined in the Arm®v8-A architecture. However:

- With the T32 instruction set, Arm deprecates using the `UDF` instruction in an IT block.
- In the A32 instruction set, `UDF` is not conditional.

## Related information

[A32 and T32 instruction summary](#) on page 240

# 14.170 UDIV (A32)

Unsigned Divide.

## Syntax

`UDIV{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the value to be divided.

**Rm**

is a register holding the divisor.

## Register restrictions

PC or SP cannot be used for `Rd`, `Rn`, or `Rm`.

## Architectures

This 32-bit T32 instruction is available in Arm®v7-R, Armv7-M and Armv8-M.mainline.

This 32-bit A32 instruction is optional in Armv7-R.

This 32-bit A32 and T32 instruction is available in Armv7-A if Virtualization Extensions are implemented, and optional if not.

There is no 16-bit T32 `UDIV` instruction.

## Related information

[Condition code suffixes](#) on page 138

# 14.171 UHADD8 (A32)

Unsigned halving parallel byte-wise addition.

## Syntax

`UHADD8 {cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

## Operation

This instruction performs four unsigned integer additions on the corresponding bytes of the operands, halves the results, and writes the results into the corresponding bytes of the destination. This cannot cause overflow.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.172 UHADD16 (A32)

Unsigned halving parallel halfword-wise addition.

### Syntax

`UHADD16{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction performs two unsigned integer additions on the corresponding halfwords of the operands, halves the results, and writes the results into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.173 UHASX (A32)

Unsigned halving parallel add and subtract halfwords with exchange.

### Syntax

`UHASX{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It halves the results and writes them into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.174 UHSAX (A32)

Unsigned halving parallel subtract and add halfwords with exchange.

### Syntax

`UHSAX{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It halves the results and writes them into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.175 UHSUB8 (A32)

Unsigned halving parallel byte-wise subtraction.

### Syntax

`UHSUB8 {cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand, halves the results, and writes the results into the corresponding bytes of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.176 UHSUB16 (A32)

Unsigned halving parallel halfword-wise subtraction.

### Syntax

`UHSUB16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand, halves the results, and writes the results into the corresponding halfwords of the destination. This cannot cause overflow.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.177 UMAAL (A32)

Unsigned Multiply Accumulate Accumulate Long.

### Syntax

`UMAAL{cond} RdLo, RdHi, Rn, Rm`

where:

**`cond`**

is an optional condition code.

**`RdLo, RdHi`**

are the destination registers for the 64-bit result. They also hold the two 32-bit accumulate operands. `RdLo` and `RdHi` must be different registers.

**`Rn, Rm`**

are the general-purpose registers holding the multiply operands.

### Operation

The `UMAAL` instruction multiplies the 32-bit values in `Rn` and `Rm`, adds the two 32-bit values in `RdHi` and `RdLo`, and stores the 64-bit result to `RdLo, RdHi`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Examples

|                      |                             |
|----------------------|-----------------------------|
| <code>UMAAL</code>   | <code>r8, r9, r2, r3</code> |
| <code>UMAALGE</code> | <code>r2, r0, r5, r3</code> |

### Related information

[Condition code suffixes](#) on page 138

## 14.178 UMLAL (A32)

Unsigned Long Multiply, with optional Accumulate, with 32-bit operands and 64-bit result and accumulator.

### Syntax

`UMLAL{S} {cond} RdLo, RdHi, Rn, Rm`

where:

**S**

is an optional suffix available in A32 state only. If s is specified, the condition flags are updated based on the result of the operation.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination registers. They also hold the accumulating value. `RdLo` and `RdHi` must be different registers.

**Rn, Rm**

are general-purpose registers holding the operands.

### Operation

The `UMLAL` instruction interprets the values from `Rn` and `Rm` as unsigned integers. It multiplies these integers, and adds the 64-bit result to the 64-bit unsigned integer contained in `RdHi` and `RdLo`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

If s is specified, this instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flags.

### Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

### Example

```
UMLALS      r4, r5, r3, r8
```

## Related information

[Condition code suffixes](#) on page 138

# 14.179 UMULL (A32)

Unsigned Long Multiply, with 32-bit operands, and 64-bit result.

## Syntax

`UMULL{S} {cond} RdLo, RdHi, Rn, Rm`

where:

**S**

is an optional suffix available in A32 state only. If s is specified, the condition flags are updated based on the result of the operation.

**cond**

is an optional condition code.

**RdLo, RdHi**

are the destination general-purpose registers. `RdLo` and `RdHi` must be different registers.

**Rn, Rm**

are general-purpose registers holding the operands.

## Operation

The `UMULL` instruction interprets the values from `Rn` and `Rm` as unsigned integers. It multiplies these integers and places the least significant 32 bits of the result in `RdLo`, and the most significant 32 bits of the result in `RdHi`.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

If s is specified, this instruction:

- Updates the N and Z flags according to the result.
- Does not affect the C or V flags.

## Architectures

This 32-bit instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Example

```
UMULL      r0, r4, r5, r6
```

## Related information

[Condition code suffixes](#) on page 138

## 14.180 UND pseudo-instruction (A32)

Generate an architecturally undefined instruction.

### Syntax

```
UND{cond} { .W } { #expr }
```

where:

**cond**

is an optional condition code.

**.W**

is an optional instruction width specifier.

**expr**

evaluates to a numeric value. The following table shows the range and encoding of *expr* in the instruction, where Y shows the locations of the bits that encode for *expr* and V is the 4 bits that encode for the condition code.

If *expr* is omitted, the value 0 is used.

**Table 14-19: Range and encoding of expr**

| Instruction         | Encoding   | Number of bits for <i>expr</i> | Range   |
|---------------------|------------|--------------------------------|---------|
| A32                 | 0xV7FYFY   | 16                             | 0-65535 |
| T32 32-bit encoding | 0xF7FYAYFY | 12                             | 0-4095  |
| T32 16-bit encoding | 0xDEYY     | 8                              | 0-255   |

### Usage

An attempt to execute an undefined instruction causes the Undefined instruction exception. Architecturally undefined instructions are expected to remain undefined.

### UND in T32 code

You can use the *.w* width specifier to force *UND* to generate a 32-bit instruction in T32 code. *UND.w* always generates a 32-bit instruction, even if *expr* is in the range 0-255.

### Disassembly

The encodings that this pseudo-instruction produces disassemble to DCI.

## Related information

[Condition code suffixes](#) on page 138

# 14.181 UQADD8 (A32)

Unsigned saturating parallel byte-wise addition.

## Syntax

`UQADD8 {cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

## Operation

This instruction performs four unsigned integer additions on the corresponding bytes of the operands and writes the results into the corresponding bytes of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^8 - 1$ . The Q flag is not affected even if this operation saturates.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.182 UQADD16 (A32)

Unsigned saturating parallel halfword-wise addition.

### Syntax

`UQADD16{cond} {Rd}, Rn, Rm`

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction performs two unsigned integer additions on the corresponding halfwords of the operands and writes the results into the corresponding halfwords of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^{16}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.183 UQASX (A32)

Unsigned saturating parallel add and subtract halfwords with exchange.

### Syntax

`UQASX{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs an addition on the two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^{16}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.184 UQSAX (A32)

Unsigned saturating parallel subtract and add halfwords with exchange.

### Syntax

`UQSAX{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^{16}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.185 UQSUB8 (A32)

Unsigned saturating parallel byte-wise subtraction.

### Syntax

`UQSUB8{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand and writes the results into the corresponding bytes of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^8 - 1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.186 UQSUB16 (A32)

Unsigned saturating parallel halfword-wise subtraction.

### Syntax

```
UQSUB16{cond} {Rd}, Rn, Rm
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand and writes the results into the corresponding halfwords of the destination. It saturates the results to the unsigned range  $0 \leq x \leq 2^{16}-1$ . The Q flag is not affected even if this operation saturates.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, Q, or GE flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Related information

[Condition code suffixes](#) on page 138

## 14.187 USAD8 (A32)

Unsigned Sum of Absolute Differences.

### Syntax

`USAD8 {cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the first operand.

***Rm***

is the register holding the second operand.

### Operation

The `USAD8` instruction finds the four differences between the unsigned values in corresponding bytes of `Rn` and `Rm`. It adds the absolute values of the four differences, and saves the result to `Rd`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not alter any flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Example

```
USAD8      r2, r4, r6
```

### Related information

[Condition code suffixes](#) on page 138

## 14.188 USADA8 (A32)

Unsigned Sum of Absolute Differences and Accumulate.

### Syntax

```
USADA8{ {cond} } {Rd}, {Rn}, {Rm}, {Ra}
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the first operand.

**Rm**

is the register holding the second operand.

**Ra**

is the register holding the accumulate operand.

### Operation

The `usada8` instruction adds the absolute values of the four differences to the value in `Ra`, and saves the result to `Rd`.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not alter any flags.

### Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### Correct examples

|          |                |
|----------|----------------|
| USADA8   | r0, r3, r5, r2 |
| USADA8VS | r0, r4, r0, r1 |

## Incorrect examples

```
USADA8      r2, r4, r6      ; USADA8 requires four registers
USADA16      r0, r4, r0, r1 ; no such instruction
```

## Related information

[Condition code suffixes](#) on page 138

## 14.189 USAT (A32)

Unsigned Saturate to any bit position, with optional shift before saturating.

### Syntax

```
USAT{cond} Rd, #sat, Rm{, shift}
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**sat**

specifies the bit position to saturate to, in the range 0 to 31.

**Rm**

is the register containing the operand.

**shift**

is an optional shift. It must be one of the following:

**ASR #n**

where  $n$  is in the range 1-32 (A32) or 1-31 (T32).

**LSL #n**

where  $n$  is in the range 0-31.

### Operation

The `usat` instruction applies the specified shift to a signed value, then saturates to the unsigned range  $0 \leq x \leq 2^{\text{sat}} - 1$ .

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Q flag

If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

## Architectures

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Example

```
USATNE r0, #7, r5
```

## Related information

[SSAT16 \(A32\)](#) on page 421

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340

[Condition code suffixes](#) on page 138

## 14.190 USAT16 (A32)

Parallel halfword Saturate.

### Syntax

`USAT16{cond} Rd, #sat, Rn`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**sat**

specifies the bit position to saturate to, in the range 0 to 15.

**Rn**

is the register holding the operand.

### Operation

Halfword-wise unsigned saturation to any bit position.

The `usat16` instruction saturates each signed halfword to the unsigned range  $0 \leq x \leq 2^{\text{sat}} - 1$ .

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### **Q flag**

If saturation occurs on either halfword, this instruction sets the Q flag. To read the state of the Q flag, use an `MRS` instruction.

### **Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

### **Example**

```
USAT16 r0, #7, r5
```

### **Related information**

[MRS \(PSR to general-purpose register\) \(A32\)](#) on page 340  
[Condition code suffixes](#) on page 138

## **14.191 USAX (A32)**

Unsigned parallel subtract and add halfwords with exchange.

### **Syntax**

```
USAX{cond} {Rd}, Rn, Rm
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination general-purpose register.

**Rm, Rn**

are the general-purpose registers holding the operands.

### **Operation**

This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

### GE[1:0]

for bits[15:0] of the result.

### GE[3:2]

for bits[31:16] of the result.

It sets GE[1:0] to 1 to indicate that the addition overflowed, generating a carry. This is equivalent to an `ADDS` instruction setting the C condition flag to 1.

It sets GE[3:2] to 1 to indicate that the subtraction gave a result greater than or equal to zero, meaning a borrow did not occur. This is equivalent to a `SUBS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEL` instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

Note

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.192 USUB8 (A32)

Unsigned parallel byte-wise subtraction.

### Syntax

`USUB8 {cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each byte of the second operand from the corresponding byte of the first operand and writes the results into the corresponding bytes of the destination. The results are modulo  $2^8$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### GE flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[0]**

for bits[7:0] of the result.

**GE[1]**

for bits[15:8] of the result.

**GE[2]**

for bits[23:16] of the result.

**GE[3]**

for bits[31:24] of the result.

It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero, meaning a borrow did not occur. This is equivalent to a `SUBS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEI` instruction.

## Availability

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.193 USUB16 (A32)

Unsigned parallel halfword-wise subtraction.

### Syntax

`USUB16{cond} {Rd}, Rn, Rm`

where:

***cond***

is an optional condition code.

***Rd***

is the destination general-purpose register.

***Rm, Rn***

are the general-purpose registers holding the operands.

### Operation

This instruction subtracts each halfword of the second operand from the corresponding halfword of the first operand and writes the results into the corresponding halfwords of the destination. The results are modulo  $2^{16}$ . It sets the APSR GE flags.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not affect the N, Z, C, V, or Q flags.

It sets the GE flags in the APSR as follows:

**GE[1:0]**

for bits[15:0] of the result.

**GE[3:2]**

for bits[31:16] of the result.

It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero, meaning a borrow did not occur. This is equivalent to a `SUBS` instruction setting the C condition flag to 1.

You can use these flags to control a following `SEL` instruction.



GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

**Note**

## Availability

This instruction is available in A32 and T32.

There is no 16-bit version of this instruction in T32.

## Related information

[SEL \(A32\)](#) on page 389

[Condition code suffixes](#) on page 138

## 14.194 UX TAB (A32)

Zero extend Byte and Add.

### Syntax

`UX TAB{cond} {Rd}, Rn, Rm {,rotation}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the number to add.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from `Rm` is rotated right 8 bits.

**ROR #16**

Value from  $Rm$  is rotated right 16 bits.

**ROR #24**

Value from  $Rm$  is rotated right 24 bits.

If  $rotation$  is omitted, no rotation is performed.

**Operation**

`UXTAB` extends an 8-bit value to a 32-bit value. It does this by:

1. Rotating the value from  $Rm$  right by 0, 8, 16 or 24 bits.
2. Extracting bits[7:0] from the value obtained.
3. Zero extending to 32 bits.
4. Adding the value from  $Rn$ .

**Register restrictions**

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

**Condition flags**

This instruction does not change the flags.

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

**Related information**

[Condition code suffixes](#) on page 138

## 14.195 UXTAB16 (A32)

Zero extend two Bytes and Add.

**Syntax**

`UXTAB16{cond} {Rd}, Rn, Rm {,rotation}`

where:

**cond**

is an optional condition code.

***Rd***

is the destination register.

***Rn***

is the register holding the number to add.

***Rm***

is the register holding the value to extend.

***rotation***

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

UXTAB16 extends two 8-bit values to two 16-bit values. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Zero extending them to 16 bits.
4. Adding them to bits[31:16] and bits[15:0] respectively of *Rn* to form bits[31:16] and bits[15:0] of the result.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Example

```
UXTAB16EQ      r0, r0, r4, ROR #16
```

## Related information

[Condition code suffixes](#) on page 138

## 14.196 UXTAH (A32)

Zero extend Halfword and Add.

### Syntax

```
UXTAH{cond} {Rd}, Rn, Rm {,rotation}
```

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rn**

is the register holding the number to add.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

### Operation

UXTAH extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Zero extending to 32 bits.
4. Adding the value from *Rn*.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

# 14.197 UXTB (A32)

Zero extend Byte.

## Syntax

`UXTB{cond} {Rd}, Rm[, rotation]`

where:

### **cond**

is an optional condition code.

### **Rd**

is the destination register.

### **Rm**

is the register holding the value to extend.

### **rotation**

is one of:

#### **ROR #8**

Value from *Rm* is rotated right 8 bits.

#### **ROR #16**

Value from *Rm* is rotated right 16 bits.

#### **ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

`UXTB` extends an 8-bit value to a 32-bit value. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16, or 24 bits.
2. Extracting bits[7:0] from the value obtained.
3. Zero extending to 32 bits.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## 16-bit instruction

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**UXTB Rd, Rm**

*Rd* and *Rm* must both be Lo registers.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7 E-M implementation.

The 16-bit instruction is available in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.198 UXTB16 (A32)

Zero extend two Bytes.

### Syntax

`UXTB16{cond} {Rd}, Rm{, rotation}`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

## Operation

UXTB16 extends two 8-bit values to two 16-bit values. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Zero extending each to 16 bits.

## Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

## Condition flags

This instruction does not change the flags.

## Availability

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

There is no 16-bit version of this instruction in T32.

## Related information

[Condition code suffixes](#) on page 138

## 14.199 UXTH (A32)

Zero extend Halfword.

### Syntax

`UXTH{cond} {Rd}, Rm[, rotation]`

where:

**cond**

is an optional condition code.

**Rd**

is the destination register.

**Rm**

is the register holding the value to extend.

**rotation**

is one of:

**ROR #8**

Value from *Rm* is rotated right 8 bits.

**ROR #16**

Value from *Rm* is rotated right 16 bits.

**ROR #24**

Value from *Rm* is rotated right 24 bits.

If *rotation* is omitted, no rotation is performed.

### Operation

`uxth` extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from *Rm* right by 0, 8, 16, or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Zero extending to 32 bits.

### Register restrictions

You cannot use PC for any operand.

You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.

### Condition flags

This instruction does not change the flags.

### 16-bit instructions

The following form of this instruction is available in T32 code, and is a 16-bit instruction:

**UXTH Rd, Rm**

*Rd* and *Rm* must both be Lo registers.

**Availability**

The 32-bit instruction is available in A32 and T32.

For the Arm®v7-M architecture, the 32-bit T32 instruction is only available in an Armv7E-M implementation.

The 16-bit instruction is available in T32.

**Related information**

[Condition code suffixes](#) on page 138

## 14.200 WFE (A32)

Wait For Event.

**Syntax**

`WFE { cond }`

where:

**cond**

is an optional condition code.

**Operation**

This is a hint instruction. It is optional whether this instruction is implemented or not. If this instruction is not implemented, it executes as a `NOP`. The assembler produces a diagnostic message if the instruction executes as a `NOP` on the target.

If the Event Register is not set, `WFE` suspends execution until one of the following events occurs:

- An IRQ interrupt, unless masked by the CPSR I-bit.
- An FIQ interrupt, unless masked by the CPSR F-bit.
- An Imprecise Data abort, unless masked by the CPSR A-bit.
- A Debug Entry request, if Debug is enabled.
- An Event signaled by another processor using the `SEV` instruction, or by the current processor using the `SEVL` instruction.

If the Event Register is set, `WFE` clears it and returns immediately.

If `WFE` is implemented, `SEV` must also be implemented.

## Availability

This instruction is available in A32 and T32.

## Related information

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

[SEV \(A32\)](#) on page 392

[SEVL \(A32\)](#) on page 393

[WFI \(A32\)](#) on page 503

## 14.201 WFI (A32)

Wait for Interrupt.

### Syntax

`WFI {cond}`

where:

**`cond`**

is an optional condition code.

### Operation

This is a hint instruction. It is optional whether this instruction is implemented or not. If this instruction is not implemented, it executes as a `NOP`. The assembler produces a diagnostic message if the instruction executes as a `NOP` on the target.

`WFI` suspends execution until one of the following events occurs:

- An IRQ interrupt, regardless of the CPSR I-bit.
- An FIQ interrupt, regardless of the CPSR F-bit.
- An Imprecise Data abort, unless masked by the CPSR A-bit.
- A Debug Entry request, regardless of whether Debug is enabled.

## Availability

This instruction is available in A32 and T32.

## Related information

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

[WFE \(A32\)](#) on page 502

## 14.202 YIELD (A32)

Yield.

### Syntax

```
YIELD{cond}
```

where:

**cond**

is an optional condition code.

### Operation

This is a hint instruction. It is optional whether this instruction is implemented or not. If this instruction is not implemented, it executes as a `nop`. The assembler produces a diagnostic message if the instruction executes as a `nop` on the target.

`YIELD` indicates to the hardware that the current thread is performing a task, for example a spinlock, that can be swapped out. Hardware can use this hint to suspend and resume threads in a multithreading system.

### Availability

This instruction is available in A32 and T32.

### Related information

[NOP \(A32\)](#) on page 350

[Condition code suffixes](#) on page 138

# 15. Advanced SIMD Instructions (32-bit)

Describes Advanced SIMD assembly language instructions.

## 15.1 Summary of Advanced SIMD instructions

Most Advanced SIMD instructions are not available in floating-point.

The following table shows a summary of Advanced SIMD instructions that are not available as floating-point instructions:

**Table 15-1: Summary of Advanced SIMD instructions**

| Mnemonic           | Brief description                                                                          |
|--------------------|--------------------------------------------------------------------------------------------|
| FLDMDBX, FLDMIAX   | FLDMX                                                                                      |
| FSTMDBX, FSTMIAX   | FSTMX                                                                                      |
| VABA, VABD         | Absolute difference and Accumulate, Absolute Difference                                    |
| VABS               | Absolute value                                                                             |
| VACGE, VACGT       | Absolute Compare Greater than or Equal, Greater Than                                       |
| VACLE, VACLT       | Absolute Compare Less than or Equal, Less Than (pseudo-instructions)                       |
| VADD               | Add                                                                                        |
| VADDHN             | Add, select High half                                                                      |
| VAND               | Bitwise AND                                                                                |
| VAND               | Bitwise AND (pseudo-instruction)                                                           |
| VBIC               | Bitwise Bit Clear (register)                                                               |
| VBIC               | Bitwise Bit Clear (immediate)                                                              |
| VBIF, VBIT, VBSL   | Bitwise Insert if False, Insert if True, Select                                            |
| VCADD              | Vector Complex Add                                                                         |
| VCEQ, VCLE, VCLT   | Compare Equal, Less than or Equal, Compare Less Than                                       |
| VCGE, VCGT         | Compare Greater than or Equal, Greater Than                                                |
| VCLE, VCLT         | Compare Less than or Equal, Compare Less Than (pseudo-instruction)                         |
| VCLS, VCLZ, VCNT   | Count Leading Sign bits, Count Leading Zeros, and Count set bits                           |
| VCMLA              | Vector Complex Multiply Accumulate                                                         |
| VCMLA (by element) | Vector Complex Multiply Accumulate (by element)                                            |
| VCVT               | Convert fixed-point or integer to floating-point, floating-point to integer or fixed-point |
| VCVT               | Convert floating-point to integer with directed rounding modes                             |
| VCVT               | Convert between half-precision and single-precision floating-point numbers                 |
| VDUP               | Duplicate scalar to all lanes of vector                                                    |
| VEOR               | Bitwise Exclusive OR                                                                       |

| Mnemonic         | Brief description                                              |
|------------------|----------------------------------------------------------------|
| VEXT             | Extract                                                        |
| VFMA, VFMS       | Fused Multiply Accumulate, Fused Multiply Subtract             |
| VHADD, VHSUB     | Halving Add, Halving Subtract                                  |
| VLD              | Vector Load                                                    |
| VMAX, VMIN       | Maximum, Minimum                                               |
| VMAXNM, VMINNM   | Maximum, Minimum, consistent with IEEE 754-2008                |
| VMLA, VMLS       | Multiply Accumulate, Multiply Subtract (vector)                |
| VMLA, VMLS       | Multiply Accumulate, Multiply Subtract (by scalar)             |
| VMOV             | Move (immediate)                                               |
| VMOV             | Move (register)                                                |
| VMOVL, VMOV{U}N  | Move Long, Move Narrow (register)                              |
| VMUL             | Multiply (vector)                                              |
| VMUL             | Multiply (by scalar)                                           |
| VMVN             | Move Negative (immediate)                                      |
| VNEG             | Negate                                                         |
| VORN             | Bitwise OR NOT                                                 |
| VORN             | Bitwise OR NOT (pseudo-instruction)                            |
| VORR             | Bitwise OR (register)                                          |
| VORR             | Bitwise OR (immediate)                                         |
| VPADD, VPADAL    | Pairwise Add, Pairwise Add and Accumulate                      |
| VPMAX, VPMIN     | Pairwise Maximum, Pairwise Minimum                             |
| VQABS            | Absolute value, saturate                                       |
| VQADD            | Add, saturate                                                  |
| VQDMLAL, VQDMLSL | Saturating Doubling Multiply Accumulate, and Multiply Subtract |
| VQDMULL          | Saturating Doubling Multiply                                   |
| VQDMULH          | Saturating Doubling Multiply returning High half               |
| VQMOV{U}N        | Saturating Move (register)                                     |
| VQNEG            | Negate, saturate                                               |
| VQRDMULH         | Saturating Doubling Multiply returning High half               |
| VQRSHL           | Shift Left, Round, saturate (by signed variable)               |
| VQRSHR{U}N       | Shift Right, Round, saturate (by immediate)                    |
| VQSHL            | Shift Left, saturate (by immediate)                            |
| VQSHL            | Shift Left, saturate (by signed variable)                      |
| VQSHR{U}N        | Shift Right, saturate (by immediate)                           |
| VQSUB            | Subtract, saturate                                             |
| VRADDHN          | Add, select High half, Round                                   |
| VRECPE           | Reciprocal Estimate                                            |
| VRECP            | Reciprocal Step                                                |
| VREV             | Reverse elements                                               |
| VRHADD           | Halving Add, Round                                             |

| Mnemonic   | Brief description                                 |
|------------|---------------------------------------------------|
| VRINT      | Round to integer                                  |
| VRSHR      | Shift Right and Round (by immediate)              |
| VRSHRN     | Shift Right, Round, Narrow (by immediate)         |
| VRSQRT E   | Reciprocal Square Root Estimate                   |
| VRSQRT S   | Reciprocal Square Root Step                       |
| VRSRA      | Shift Right, Round, and Accumulate (by immediate) |
| VRSUBHN    | Subtract, select High half, Round                 |
| VSHL       | Shift Left (by immediate)                         |
| VSHR       | Shift Right (by immediate)                        |
| VSHRN      | Shift Right, Narrow (by immediate)                |
| VSLI       | Shift Left and Insert                             |
| VSRA       | Shift Right, Accumulate (by immediate)            |
| VSRI       | Shift Right and Insert                            |
| VST        | Vector Store                                      |
| VSUB       | Subtract                                          |
| VSUBHN     | Subtract, select High half                        |
| VSWP       | Swap vectors                                      |
| VTBL, VTBX | Vector table look-up                              |
| VTRN       | Vector transpose                                  |
| VTST       | Test bits                                         |
| VUZP, VZIP | Vector interleave and de-interleave               |

## 15.2 Summary of shared Advanced SIMD and floating-point instructions

Some instructions are common to Advanced SIMD and floating-point.

The following table shows a summary of instructions that are common to the Advanced SIMD and floating-point instruction sets.

**Table 15-2: Summary of shared Advanced SIMD and floating-point instructions**

| Mnemonic | Brief description                                                                                            |
|----------|--------------------------------------------------------------------------------------------------------------|
| VLDM     | Load multiple                                                                                                |
| VLDR     | Load                                                                                                         |
| -        | Load (post-increment and pre-decrement)                                                                      |
| VMOV     | Transfer from one general-purpose register to a scalar                                                       |
| -        | Transfer from two general-purpose registers to either one double-precision or two single-precision registers |
| -        | Transfer from a scalar to a general-purpose register                                                         |

| Mnemonic | Brief description                                                                                            |
|----------|--------------------------------------------------------------------------------------------------------------|
| -        | Transfer from either one double-precision or two single-precision registers to two general-purpose registers |
| VMRS     | Transfer from a SIMD and floating-point system register to a general-purpose register                        |
| VMSR     | Transfer from a general-purpose register to a SIMD and floating-point system register                        |
| VPOP     | Pop floating-point or SIMD registers from full-descending stack                                              |
| VPUSH    | Push floating-point or SIMD registers to full-descending stack                                               |
| VSTM     | Store multiple                                                                                               |
| VSTR     | Store                                                                                                        |
| -        | Store (post-increment and pre-decrement)                                                                     |

## Related information

[VLDM \(A32\)](#) on page 551

[VLDR \(A32\)](#) on page 552

[VLDR \(post-increment and pre-decrement\) \(A32\)](#) on page 553

[VLDR pseudo-instruction \(A32\)](#) on page 554

[VMOV \(between two general-purpose registers and a 64-bit extension register\) \(A32\)](#) on page 563

[VMOV \(between a general-purpose register and an Advanced SIMD scalar\) \(A32\)](#) on page 564

[VMRS \(A32\)](#) on page 567

[VMSR \(A32\)](#) on page 568

[VPOP \(A32\)](#) on page 580

[VPUSH \(A32\)](#) on page 581

[VSTM \(A32\)](#) on page 613

[VSTR \(A32\)](#) on page 617

[VSTR \(post-increment and pre-decrement\) \(A32\)](#) on page 618

## 15.3 Cryptographic instructions

A set of cryptographic instructions is available in some implementations of the Armv8 architecture.

These instructions use the 128-bit Advanced SIMD registers and support the acceleration of the following cryptographic and hash algorithms:

- AES.
- SHA1.
- SHA256.

## 15.4 Interleaving provided by load and store element and structure instructions

Many instructions in this group provide interleaving when structures are stored to memory, and de-interleaving when structures are loaded from memory.

The following figure shows an example of de-interleaving. Interleaving is the inverse process.

**Figure 15-1: De-interleaving an array of 3-element structures**



### Related information

[Alignment restrictions in load and store element and structure instructions](#) on page 509

[VLDn \(single n-element structure to one lane\) \(A32\)](#) on page 546

[VLDn \(single n-element structure to all lanes\) \(A32\)](#) on page 548

[VLDn \(multiple n-element structures\) \(A32\)](#) on page 549

[VSTn \(multiple n-element structures\) \(A32\)](#) on page 614

[VSTn \(single n-element structure to one lane\) \(A32\)](#) on page 616

## 15.5 Alignment restrictions in load and store element and structure instructions

Many of these instructions allow you to specify memory alignment restrictions.

When the alignment is not specified in the instruction, the alignment restriction is controlled by the A bit (SCTLR bit[1]):

- If the A bit is 0, there are no alignment restrictions (except for strongly-ordered or device memory, where accesses must be element-aligned).
- If the A bit is 1, accesses must be element-aligned.

If an address is not correctly aligned, an alignment fault occurs.

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

## 15.6 FLDMDBX, FLDMIAX (A32)

FLDMX.

### Syntax

`FLDMDBX{c}{q} Rn!, dreglist ; A1 Decrement Before FP/SIMD registers (A32)`

`FLDMIAX{c}{q} Rn{!}, dreglist ; A1 Increment After FP/SIMD registers (A32)`

`FLDMDBX{c}{q} Rn!, dreglist ; T1 Decrement Before FP/SIMD registers (T32)`

`FLDMIAX{c}{q} Rn{!}, dreglist ; T1 Increment After FP/SIMD registers (T32)`

Where:

**c**

Is an optional instruction condition code. See [Condition Codes](#).

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Rn**

Is the general-purpose base register. If writeback is not specified, the PC can be used.

**!**

Specifies base register writeback.

**dreglist**

Is the list of consecutively numbered 64-bit SIMD and FP registers to be transferred. The list must contain at least one register, all registers must be in the range D0-D15, and must not contain more than 16 registers.

### Usage

`FLDMX` loads multiple SIMD and FP registers from consecutive locations in the Advanced SIMD and floating-point register file using an address from a general-purpose register.

Arm deprecates use of `FLDMDBX` and `FLDMIAX`, except for disassembly purposes, and reassembly of disassembled code.

Depending on settings in the CPACR, NSACR, and HCPTR registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see [Enabling Advanced SIMD and floating-point support in the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For more information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[Summary of Advanced SIMD instructions](#) on page 505

## 15.7 FSTMDBX, FSTMIAX (A32)

FSTMX.

### Syntax

`FSTMDBX{c}{q} Rn!, dreglist ; A1 Decrement Before FP/SIMD registers (A32)`

`FSTMIAX{c}{q} Rn{!}, dreglist ; A1 Increment After FP/SIMD registers (A32)`

`FSTMDBX{c}{q} Rn!, dreglist ; T1 Decrement Before FP/SIMD registers (T32)`

`FSTMIAX{c}{q} Rn{!}, dreglist ; T1 Increment After FP/SIMD registers (T32)`

Where:

**c**

Is an optional instruction condition code. See [Condition Codes](#).

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Rn**

Is the general-purpose base register. If writeback is not specified, the PC can be used. However, Arm deprecates use of the PC.

**!**

Specifies base register writeback.

**dreglist**

Is the list of consecutively numbered 64-bit SIMD and FP registers to be transferred. The list must contain at least one register, all registers must be in the range D0-D15, and must not contain more than 16 registers.

### Usage

FSTMX stores multiple SIMD and FP registers from the Advanced SIMD and floating-point register file to consecutive locations in using an address from a general-purpose register.

Arm deprecates use of FLDMDBX and FLDMIAX, except for disassembly purposes, and reassembly of disassembled code.

Depending on settings in the CPACR, NSACR, HCPTR, and FPEXC registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see *Enabling Advanced SIMD and floating-point support in the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile*.



**Note** For more information about the **CONSTRAINED UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the *Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile*.

## Related information

[Summary of Advanced SIMD instructions](#) on page 505

## 15.8 VABA and VABAL (A32)

Vector Absolute Difference and Accumulate.

### Syntax

VABA{cond}.datatype {Qd}, Qn, Qm

VABA{cond}.datatype {Dd}, Dn, Dm

VABAL{cond}.datatype Qd, Dn, Dm

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, or u32.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Qd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

### Operation

VABA subtracts the elements of one vector from the corresponding elements of another vector, and accumulates the absolute values of the results into the elements of the destination vector.

`VABAL` is the long version of the `VABA` instruction.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.9 VABD and VABDL (A32)

Vector Absolute Difference.

### Syntax

`VABD{cond}.datatype {Qd}, Qn, Qm`

`VABD{cond}.datatype {Dd}, Dn, Dm`

`VABDL{cond}.datatypeQd, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of:

- S8, S16, S32, U8, U16, or U32 for `VABDL`.
- S8, S16, S32, U8, U16, U32 or F32 for `VABD`.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Qd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

### Operation

`VABD` subtracts the elements of one vector from the corresponding elements of another vector, and places the absolute values of the results into the elements of the destination vector.

`VABDL` is the long version of the `VABD` instruction.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.10 VABS (A32)

Vector Absolute

### Syntax

`VABS {cond} .datatypeQd, Qm`

`VABS {cond} .datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, or f32.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

### Operation

`VABS` takes the absolute value of each element in a vector, and places the results in a second vector. (The floating-point version only clears the sign bit.)

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VQABS \(A32\)](#) on page 582

[Condition code suffixes](#) on page 138

## 15.11 VACLE, VACLT, VACGE and VACGT (A32)

Vector Absolute Compare.

### Syntax

`VACOp {cond} .F32 {Qd}, Qn, Qm`

`VACOp {cond} .F32 {Dd}, Dn, Dm`

where:

***op***

must be one of:

**GE**

Absolute Greater than or Equal.

**GT**

Absolute Greater Than.

**LE**

Absolute Less than or Equal.

**LT**

Absolute Less Than.

***cond***

is an optional condition code.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

The result datatype is `I32`.

## Operation

These instructions take the absolute value of each element in a vector, and compare it with the absolute value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.



On disassembly, the `VACLE` and `VACLT` pseudo-instructions are disassembled to the corresponding `VACGE` and `VACGT` instructions, with the operands reversed.

## Related information

[Condition code suffixes](#) on page 138

## 15.12 VADD (A32)

Vector Add.

### Syntax

`VADD{cond}.datatype {Qd}, Qn, Qm`

`VADD{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I8`, `I16`, `I32`, `I64`, or `F32`

***Dd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

## Operation

`VADD` adds corresponding elements in two vectors, and places the results in the destination vector.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VADDL and VADDW \(A32\)](#) on page 517

[VQADD \(A32\)](#) on page 583

[Condition code suffixes](#) on page 138

## 15.13 VADDHN (A32)

Vector Add and Narrow, selecting High half.

### Syntax

`VADDHN{cond}.datatype Dd, Qn, Qm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `I64`.

***Dd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector.

## Operation

`VADDHN` adds corresponding elements in two vectors, selects the most significant halves of the results, and places the final results in the destination vector. Results are truncated.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[VRADDHN \(A32\)](#) on page 594  
[Condition code suffixes](#) on page 138

## 15.14 VADDL and VADDW (A32)

Vector Add Long, Vector Add Wide.

### Syntax

```
VADDL{cond}.datatype Qd, Dn, Dm ; Long operation
VADDW{cond}.datatype {Qd,} Qn, Dm ; Wide operation
```

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, , u16, or u32.

**Qd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

**Qd, Qn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a wide operation.

### Operation

VADDL adds corresponding elements in two doubleword vectors, and places the results in the destination quadword vector.

VADDW adds corresponding elements in one quadword and one doubleword vector, and places the results in the destination quadword vector.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[VADD \(A32\)](#) on page 515  
[Condition code suffixes](#) on page 138

## 15.15 VAND (immediate) (A32)

Vector bitwise AND immediate pseudo-instruction.

### Syntax

```
VAND{cond}.datatypeQd, #imm
```

```
VAND{cond}.datatypeDd, #imm
```

where:

**cond**

is an optional condition code.

**datatype**

must be either `I8`, `I16`, `I32`, or `I64`.

**Qd or Dd**

is the Advanced SIMD register for the result.

**imm**

is the immediate value.

### Operation

`VAND` takes each element of the destination vector, performs a bitwise AND with an immediate value, and returns the result into the destination vector.



On disassembly, this pseudo-instruction is disassembled to a corresponding `VBIC` instruction, with the complementary immediate value.

**Note**

### Immediate values

If `datatype` is `I16`, the immediate value must have one of the following forms:

- `0xFFFFXY`.
- `0xXYFF`.

If `datatype` is `I32`, the immediate value must have one of the following forms:

- `0xFFFFFFFFXY`.
- `0xFFFFFFFFXYFF`.
- `0xFFXYFFFFFF`.
- `0xXYFFFFFF`.

### Related information

[VBIC \(immediate\) \(A32\)](#) on page 519

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.16 VAND (register) (A32)

Vector bitwise AND.

### Syntax

VAND{cond} {datatype} {Qd}, Qn, Qm

VAND{cond} {datatype} {Dd}, Dn, Dm

where:

**cond**

is an optional condition code.

**datatype**

is an optional data type. The assembler ignores *datatype*.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

### Operation

VAND performs a bitwise logical AND between two registers, and places the result in the destination register.

### Related information

[Condition code suffixes](#) on page 138

## 15.17 VBIC (immediate) (A32)

Vector Bit Clear immediate.

### Syntax

VBIC{cond} .datatype Qd, #imm

VBIC{cond} .datatype Dd, #imm

where:

***cond***

is an optional condition code.

***datatype***

must be either `I8`, `I16`, `I32`, or `I64`.

***Qd* or *Dd***

is the Advanced SIMD register for the source and result.

***imm***

is the immediate value.

## Operation

`VBIC` takes each element of the destination vector, performs a bitwise AND complement with an immediate value, and returns the result in the destination vector.

### Immediate values

You can either specify *imm* as a pattern which the assembler repeats to fill the destination register, or you can directly specify the immediate value (that conforms to the pattern) in full. The pattern for *imm* depends on *datatype* as shown in the following table:

**Table 15-3: Patterns for immediate value in VBIC (immediate)**

| I16                 | I32                     |
|---------------------|-------------------------|
| <code>0x00XY</code> | <code>0x000000XY</code> |
| <code>0XXY00</code> | <code>0x0000XY00</code> |
| <code>-</code>      | <code>0x00XY0000</code> |
| <code>-</code>      | <code>0xXY000000</code> |

If you use the `I8` or `I64` datatypes, the assembler converts it to either the `I16` or `I32` instruction to match the pattern of *imm*. If the immediate value does not match any of the patterns in the preceding table, the assembler generates an error.

### Related information

[VAND \(immediate\) \(A32\)](#) on page 517

## 15.18 VBIC (register) (A32)

Vector Bit Clear.

### Syntax

`VBIC{cond}{.datatype} {Qd}, Qn, Qm`

`VBIC{cond}{.datatype} {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

is an optional data type. The assembler ignores *datatype*.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

**Operation**

`VBIC` performs a bitwise logical AND complement between two registers, and places the result in the destination register.

**Related information**

[Condition code suffixes](#) on page 138

## 15.19 VBIF (A32)

Vector Bitwise Insert if False.

**Syntax**

```
VBIF{cond} {datatype} {Qd}, Qn, Qm
```

```
VBIF{cond} {datatype} {Dd}, Dn, Dm
```

where:

***cond***

is an optional condition code.

***datatype***

is an optional datatype. The assembler ignores *datatype*.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

**Operation**

`VBIF` inserts each bit from the first operand into the destination if the corresponding bit of the second operand is 0, otherwise it leaves the destination bit unchanged.

## Related information

[Condition code suffixes](#) on page 138

## 15.20 VBIT (A32)

Vector Bitwise Insert if True.

### Syntax

`VBIT{cond}{.datatype} {Qd}, Qn, Qm`

`VBIT{cond}{.datatype} {Dd}, Dn, Dm`

where:

**`cond`**

is an optional condition code.

**`datatype`**

is an optional datatype. The assembler ignores `datatype`.

**`Qd, Qn, Qm`**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**`Dd, Dn, Dm`**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

### Operation

`VBIT` inserts each bit from the first operand into the destination if the corresponding bit of the second operand is 1, otherwise it leaves the destination bit unchanged.

### Related information

[Condition code suffixes](#) on page 138

## 15.21 VBSL (A32)

Vector Bitwise Select.

### Syntax

`VBSL{cond}{.datatype} {Qd}, Qn, Qm`

`VBSL{cond}{.datatype} {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

is an optional datatype. The assembler ignores *datatype*.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

## Operation

`VBSL` selects each bit for the destination from the first operand if the corresponding bit of the destination is 1, or from the second operand if the corresponding bit of the destination is 0.

### Related information

[Condition code suffixes](#) on page 138

## 15.22 VCADD (A32)

Vector Complex Add.

### Syntax

```
VCADD{q}.dt {Dd,} Dn, Dm, #rotate ; 64-bit SIMD vector FP/SIMD registers
```

```
VCADD{q}.dt {Qd,} Qn, Qm, #rotate ; 128-bit SIMD vector FP/SIMD registers
```

Where:

***q***

Is an optional instruction width specifier. See [Instruction width specifiers](#).

***dt***

Is the data type for the elements of the vectors, and can be either `F16` or `F32`.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the first SIMD and FP source register.

***Dm***

Is the 64-bit name of the second SIMD and FP source register.

***Qd***

Is the 128-bit name of the SIMD and FP destination register.

***Qn***

Is the 128-bit name of the first SIMD and FP source register.

***Qm***

Is the 128-bit name of the second SIMD and FP source register.

***rotate***

Is the rotation to be applied to elements in the second SIMD and FP source register, and can be either 90 or 270.

**Architectures supported**

Supported in the Arm®v8.3-A architecture and later.

**Usage**

Depending on settings in the CPACR, NSACR, and HCPTR registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see *Enabling Advanced SIMD and floating-point support* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[Summary of Advanced SIMD instructions](#) on page 505

## 15.23 VCEQ (immediate #0) (A32)

Vector Compare Equal to zero.

**Syntax**

VCEQ{cond}.datatype {Qd}, Qn, #0

VCEQ{cond}.datatype {Dd}, Dn, #0

where:

***cond***

is an optional condition code.

***datatype***

must be one of I8, I16, I32, or F32.

The result datatype is:

- I32 for operand datatypes I32 or F32.
- I16 for operand datatype I16.
- I8 for operand datatype I8.

***Qd, Qn, Qm***

specifies the destination register and the operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register and the operand register, for a doubleword operation.

**#0**

specifies a comparison with zero.

**Operation**

vceq takes the value of each element in a vector, and compares it with zero. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.24 VCEQ (register) (A32)

Vector Compare Equal.

**Syntax**

VCEQ{cond}.datatype {Qd}, Qn, Qm

VCEQ{cond}.datatype {Dd}, Dn, Dm

where:

**cond**

is an optional condition code.

**datatype**

must be one of I8, I16, I32, or F32.

The result datatype is:

- I32 for operand datatypes I32 or F32.
- I16 for operand datatype I16.
- I8 for operand datatype I8.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

## Operation

`VCNEQ` takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.25 VCGE (immediate #0) (A32)

Vector Compare Greater than or Equal to zero.

### Syntax

`VCGE{cond}.datatype {Qd}, Qn, #0`

`VCGE{cond}.datatype {Dd}, Dn, #0`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, or f32.

The result datatype is:

- i32 for operand datatypes s32 or f32.
- i16 for operand datatype s16.
- i8 for operand datatype s8.

**Qd, Qn, Qm**

specifies the destination register and the operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register and the operand register, for a doubleword operation.

**#0**

specifies a comparison with zero.

## Operation

`VCGE` takes the value of each element in a vector, and compares it with zero. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.26 VCGE (register) (A32)

Vector Compare Greater than or Equal.

### Syntax

`VCGE{cond}.datatype {Qd}, Qn, Qm`

`VCGE{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, u32, or f32.

The result datatype is:

- i32 for operand datatypes s32, u32, or f32.
- i16 for operand datatypes s16 or u16.
- i8 for operand datatypes s8 or u8.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

### Operation

`vcge` takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.27 VCGT (immediate #0) (A32)

Vector Compare Greater Than zero.

### Syntax

`VCGT{cond}.datatype {Qd}, Qn, #0`

`VCGT{cond}.datatype {Dd}, Dn, #0`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, or f32.

The result datatype is:

- i32 for operand datatypes s32 or f32.
- i16 for operand datatype s16.
- i8 for operand datatype s8.

***Qd, Qn, Qm***

specifies the destination register and the operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register and the operand register, for a doubleword operation.

### Operation

`vCGT` takes the value of each element in a vector, and compares it with zero. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.28 VCGT (register) (A32)

Vector Compare Greater Than.

### Syntax

`VCGT{cond}.datatype {Qd}, Qn, Qm`

`VCGT{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, u8, u16, u32, or f32.

The result datatype is:

- i32 for operand datatypes s32, u32, or f32.
- i16 for operand datatypes s16 or u16.
- i8 for operand datatypes s8 or u8.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

## Operation

`VCGR` takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.29 VCLE (immediate #0) (A32)

Vector Compare Less than or Equal to zero.

### Syntax

`VCLE{cond}.datatype {Qd}, Qn, #0`

`VCLE{cond}.datatype {Dd}, Dn, #0`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, or f32.

The result datatype is:

- `I32` for operand datatypes `S32` or `F32`.
- `I16` for operand datatype `S16`.
- `I8` for operand datatype `S8`.

***Qd, Qn, Qm***

specifies the destination register and the operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register and the operand register, for a doubleword operation.

**#0**

specifies a comparison with zero.

## Operation

`VCLE` takes the value of each element in a vector, and compares it with zero. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.30 VCLS (A32)

Vector Count Leading Sign bits.

### Syntax

`VCLS {cond} .datatype Qd, Qm`

`VCLS {cond} .datatype Dd, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `S8`, `S16`, or `S32`.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

## Operation

`vcls` counts the number of consecutive bits following the topmost bit, that are the same as the topmost bit, in each element in a vector, and places the results in a second vector.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.31 VCLE (register) (A32)

Vector Compare Less than or Equal pseudo-instruction.

### Syntax

`VCLE{cond}.datatype {Qd}, Qn, Qm`

`VCLE{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, u32, or f32.

The result datatype is:

- i32 for operand datatypes s32, u32, or f32.
- i16 for operand datatypes s16 or u16.
- i8 for operand datatypes s8 or u8.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

## Operation

`vcle` takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

On disassembly, this pseudo-instruction is disassembled to the corresponding `vcge` instruction, with the operands reversed.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

# 15.32 VCLT (immediate #0) (A32)

Vector Compare Less Than zero.

## Syntax

`VCLT{cond}.datatype {Qd}, Qn, #0`

`VCLT{cond}.datatype {Dd}, Dn, #0`

where:

### **cond**

is an optional condition code.

### **datatype**

must be one of s8, s16, s32, or f32.

The result datatype is:

- i32 for operand datatypes s32 or f32.
- i16 for operand datatype s16.
- i8 for operand datatype s8.

### **Qd, Qn, Qm**

specifies the destination register and the operand register, for a quadword operation.

### **Dd, Dn, Dm**

specifies the destination register and the operand register, for a doubleword operation.

### **#0**

specifies a comparison with zero.

## Operation

`vclt` takes the value of each element in a vector, and compares it with zero. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.33 VCLT (register) (A32)

Vector Compare Less Than.

### Syntax

`VCLT{cond}.datatype {Qd}, Qn, Qm`

`VCLT{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, u32, or f32.

The result datatype is:

- i32 for operand datatypes s32, u32, or f32.
- i16 for operand datatypes s16 or u16.
- i8 for operand datatypes s8 or u8.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

### Operation

`vclt` takes the value of each element in a vector, and compares it with the value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.



On disassembly, this pseudo-instruction is disassembled to the corresponding `vcgt` instruction, with the operands reversed.

Note

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.34 VCLZ (A32)

Vector Count Leading Zeros.

### Syntax

`VCLZ{cond}.datatypeQd, Qm`

`VCLZ{cond}.datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of `I8`, `I16`, or `I32`.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

### Operation

`VCLZ` counts the number of consecutive zeros, starting from the top bit, in each element in a vector, and places the results in a second vector.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.35 VCMLA (A32)

Vector Complex Multiply Accumulate.

### Syntax

`VCMLA{q}.dt {Dd,} Dn, Dm, #rotate ; 64-bit SIMD vector FP/SIMD registers`

`VCMLA{q}.dt {Qd,} Qn, Qm, #rotate ; 128-bit SIMD vector FP/SIMD registers`

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**dt**

Is the data type for the elements of the vectors, and can be either `F16` or `F32`.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

**Qd**

Is the 128-bit name of the SIMD and FP destination register.

**Qn**

Is the 128-bit name of the first SIMD and FP source register.

**Qm**

Is the 128-bit name of the second SIMD and FP source register.

**rotate**

Is the rotation to be applied to elements in the second SIMD and FP source register, and can be one of 0, 90, 180 or 270.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Depending on settings in the CPACR, NSACR, and HCPTR registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see *Enabling Advanced SIMD and floating-point support* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[Summary of Advanced SIMD instructions](#) on page 505

## 15.36 VCMLA (by element) (A32)

Vector Complex Multiply Accumulate (by element).

### Syntax

```
VCMLA{q}.F16 Dd, Dn, Dm[index], #rotate ; Double,halfprec FP/SIMD registers
VCMLA{q}.F32 Dd, Dn, Dm[0], #rotate ; Double,singleprec FP/SIMD registers
VCMLA{q}.F32 Qd, Qn, Dm[0], #rotate ; Quad,singleprec FP/SIMD registers
VCMLA{q}.F16 Qd, Qn, Dm[index], #rotate ; Halfprec,quad FP/SIMD registers
```

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

**index**

Is the element index in the range 0 to 1.

**Qd**

Is the 128-bit name of the SIMD and FP destination register.

**Qn**

Is the 128-bit name of the first SIMD and FP source register.

**rotate**

Is the rotation to be applied to elements in the second SIMD and FP source register, and can be one of 0, 90, 180 or 270.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Depending on settings in the CPACR, NSACR, and HCPTR registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see *Enabling Advanced SIMD and floating-point support* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[Summary of Advanced SIMD instructions](#) on page 505

## 15.37 VCNT (A32)

Vector Count set bits.

### Syntax

`VCNT{cond}.datatypeQd, Qm`

`VCNT{cond}.datatypeDd, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be `I8`.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

## Operation

`VCNT` counts the number of bits that are one in each element in a vector, and places the results in a second vector.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.38 VCVT (between fixed-point or integer, and floating-point) (A32)

Vector Convert.

### Syntax

`VCVT{cond}.typeQd, Qm {, #fbits}`

`VCVT{cond}.typeDd, Dm {, #fbits}`

where:

***cond***

is an optional condition code.

***type***

specifies the data types for the elements of the vectors. It must be one of:

**`S32.F32`**

Floating-point to signed integer or fixed-point.

**`U32.F32`**

Floating-point to unsigned integer or fixed-point.

**`F32.S32`**

Signed integer or fixed-point to floating-point.

**F32.U32**

Unsigned integer or fixed-point to floating-point.

***Qd, Qm***

specifies the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

specifies the destination vector and the operand vector, for a doubleword operation.

***fbits***

if present, specifies the number of fraction bits in the fixed point number. Otherwise, the conversion is between floating-point and integer. *fbits* must lie in the range 0-32. If *fbits* is omitted, the number of fraction bits is 0.

## Operation

vcvt converts each element in a vector in one of the following ways, and places the results in the destination vector:

- From floating-point to integer.
- From integer to floating-point.
- From floating-point to fixed-point.
- From fixed-point to floating-point.

## Rounding

Integer or fixed-point to floating-point conversions use round to nearest.

Floating-point to integer or fixed-point conversions use round towards zero.

## Related information

[Condition code suffixes](#) on page 138

## 15.39 VCVT (between half-precision and single-precision floating-point) (A32)

Vector Convert.

### Syntax

VCVT{*cond*} .F32.F16 *Qd, Dm*

VCVT{*cond*} .F16.F32 *Dd, Qm*

where:

***cond***

is an optional condition code.

***Qd, Dm***

specifies the destination vector for the single-precision results and the half-precision operand vector.

***Dd, Qm***

specifies the destination vector for half-precision results and the single-precision operand vector.

## Operation

`vcvt` with half-precision extension, converts each element in a vector in one of the following ways, and places the results in the destination vector:

- From half-precision floating-point to single-precision floating-point (`F32.F16`).
- From single-precision floating-point to half-precision floating-point (`F16.F32`).

## Architectures

This instruction is available in Arm®v8. In earlier architectures, it is only available in Neon® systems with the half-precision extension.

### Related information

[Condition code suffixes](#) on page 138

## 15.40 VCVT (from floating-point to integer with directed rounding modes) (A32)

`vcvt` (Vector Convert) converts each element in a vector from floating-point to signed or unsigned integer, and places the results in the destination vector.



- This instruction is supported only in Arm®v8.
- You cannot use `vcvt` with a directed rounding mode inside an IT block.

## Syntax

`VCVT mode .type Qd, Qm`

`VCVT mode .type Dd, Dm`

where:

***mode***

must be one of:

**A**

meaning round to nearest, ties away from zero

**N**

meaning round to nearest, ties to even

**P**

meaning round towards plus infinity

**M**

meaning round towards minus infinity.

***type***

specifies the data types for the elements of the vectors. It must be one of:

**S32.F32**

floating-point to signed integer

**U32.F32**

floating-point to unsigned integer.

***Qd, Qm***

specifies the destination and operand vectors, for a quadword operation.

***Dd, Dm***

specifies the destination and operand vectors, for a doubleword operation.

## 15.41 VCVTB, VCVTT (between half-precision and double-precision) (A32)

These instructions convert between half-precision and double-precision floating-point numbers.

The conversion can be done in either of the following ways:

- From half-precision floating-point to double-precision floating-point (**F64.F16**).
- From double-precision floating-point to half-precision floating-point (**F16.F64**).

**vcvtb** uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision value.

**vcvtt** uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision value.



These instructions are supported only in Arm®v8.

Note

### Syntax

**VCVTB{cond}.F64.F16 Dd, Sm**

`VCVTB{cond}.F16.F64 Sd, Dm`

`VCVTT{cond}.F64.F16 Dd, Sm`

`VCVTT{cond}.F16.F64 Sd, Dm`

where:

***cond***

is an optional condition code.

***Dd***

is a double-precision register for the result.

***Sm***

is a single word register holding the operand.

***Sd***

is a single word register for the result.

***Dm***

is a double-precision register holding the operand.

## Usage

These instructions convert the half-precision value in *Sm* to double-precision and place the result in *Dd*, or the double-precision value in *Dm* to half-precision and place the result in *Sd*.

## Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

## 15.42 VDUP (A32)

Vector Duplicate.

### Syntax

`VDUP{cond}.sizeQd, Dm[x]`

`VDUP{cond}.sizeDd, Dm[x]`

`VDUP{cond}.sizeQd, Rm`

`VDUP{cond}.sizeDd, Rm`

where:

***cond***

is an optional condition code.

***size***

must be 8, 16, or 32.

***Qd***

specifies the destination register for a quadword operation.

***Dd***

specifies the destination register for a doubleword operation.

***Dm[x]***

specifies the Advanced SIMD scalar.

***Rm***

specifies the general-purpose register. *Rm* must not be PC.

## Operation

`VDUP` duplicates a scalar into every element of the destination vector. The source can be an Advanced SIMD scalar or a general-purpose register.

## Related information

[Condition code suffixes](#) on page 138

## 15.43 VEOR (A32)

Vector Bitwise Exclusive OR.

### Syntax

```
VEOR{cond} {datatype} {Qd}, Qn, Qm
```

```
VEOR{cond} {datatype} {Dd}, Dn, Dm
```

where:

***cond***

is an optional condition code.

***datatype***

is an optional data type. The assembler ignores *datatype*.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

## Operation

`VOR` performs a logical exclusive OR between two registers, and places the result in the destination register.

## Related information

[Condition code suffixes](#) on page 138

## 15.44 VEXT (A32)

Vector Extract.

### Syntax

`VEXT{cond}.8 {Qd}, Qn, Qm, #imm`

`VEXT{cond}.8 {Dd}, Dn, Dm, #imm`

where:

**cond**

is an optional condition code.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

**imm**

is the number of 8-bit elements to extract from the bottom of the second operand vector, in the range 0-7 for doubleword operations, or 0-15 for quadword operations.

## Operation

`VEXT` extracts 8-bit elements from the bottom end of the second operand vector and the top end of the first, concatenates them, and places the result in the destination vector. See the following figure for an example:

**Figure 15-2: Operation of doubleword VEXT for imm = 3**



## VEXT pseudo-instruction

You can specify a datatype of 16, 32, or 64 instead of 8. In this case, #*imm* refers to halfwords, words, or doublewords instead of referring to bytes, and the permitted ranges are correspondingly reduced.

### Related information

[Condition code suffixes](#) on page 138

## 15.45 VFMA, VFMS (A32)

Vector Fused Multiply Accumulate, Vector Fused Multiply Subtract.

### Syntax

`Vop {cond}.F32 {Qd}, Qn, Qm`

`Vop {cond}.F32 {Dd}, Dn, Dm`

where:

***op***

is one of `FMA` or `FMS`.

***cond***

is an optional condition code.

***Dd, Dn, Dm***

are the destination and operand vectors for doubleword operation.

***Qd, Qn, Qm***

are the destination and operand vectors for quadword operation.

### Operation

`VFMA` multiplies corresponding elements in the two operand vectors, and accumulates the results into the elements of the destination vector. The result of the multiply is not rounded before the accumulation.

`VFMS` multiplies corresponding elements in the two operand vectors, then subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector. The result of the multiply is not rounded before the subtraction.

### Related information

[VMUL \(A32\)](#) on page 568

[Condition code suffixes](#) on page 138

## 15.46 VHADD (A32)

Vector Halving Add.

### Syntax

`VHADD{cond}.datatype {Qd}, Qn, Qm`

`VHADD{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, u8, u16, or u32.

***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VHADD` adds corresponding elements in two vectors, shifts each result right one bit, and places the results in the destination vector. Results are truncated.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.47 VHSUB (A32)

Vector Halving Subtract.

### Syntax

`VHSUB{cond}.datatype {Qd}, Qn, Qm`

`VHSUB{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, u8, u16, or u32.

***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Operation**

`VHSUB` subtracts the elements of one vector from the corresponding elements of another vector, shifts each result right one bit, and places the results in the destination vector. Results are always truncated.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.48 VLDr (single n-element structure to one lane) (A32)

Vector Load single n-element structure to one lane.

**Syntax**

```
VLDr {cond}.datatype{list}, [Rn{@align}]{!}
```

```
VLDr {cond}.datatype{list}, [Rn{@align}], Rm
```

where:

***n***

must be one of 1, 2, 3, or 4.

***cond***

is an optional condition code.

***datatype***

see the following table.

***list***

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for options.

***Rn***

is the general-purpose register containing the base address. *Rn* cannot be PC.

***align***

specifies an optional alignment. See the following table for options.

!

if ! is present,  $Rn$  is updated to ( $Rn$  + the number of bytes transferred by the instruction). The update occurs after all the loads have taken place.

 **$Rm$** 

is a general-purpose register containing an offset from the base address. If  $Rm$  is present, the instruction updates  $Rn$  to ( $Rn$  +  $Rm$ ) after using the address to access memory.  $Rm$  cannot be SP or PC.

**Operation**

`VLD $n$`  loads one  $n$ -element structure from memory into one or more Advanced SIMD registers. Elements of the register that are not loaded are unaltered.

**Table 15-4: Permitted combinations of parameters for VLD $n$  (single  $n$ -element structure to one lane)**

| <b><math>n</math></b> | <b><code>datatype</code></b> | <b>list<sup>26</sup></b>                 | <b><code>align</code><sup>27</sup></b> | <b>alignment</b>  |
|-----------------------|------------------------------|------------------------------------------|----------------------------------------|-------------------|
| 1                     | 8                            | {Dd[x]}                                  | -                                      | Standard only     |
| -                     | 16                           | {Dd[x]}                                  | @16                                    | 2-byte            |
| -                     | 32                           | {Dd[x]}                                  | @32                                    | 4-byte            |
| 2                     | 8                            | {Dd[x], D(d+1)[x]}                       | @16                                    | 2-byte            |
| -                     | 16                           | {Dd[x], D(d+1)[x]}                       | @32                                    | 4-byte            |
| -                     | -                            | {Dd[x], D(d+2)[x]}                       | @32                                    | 4-byte            |
| -                     | 32                           | {Dd[x], D(d+1)[x]}                       | @64                                    | 8-byte            |
| -                     | -                            | {Dd[x], D(d+2)[x]}                       | @64                                    | 8-byte            |
| 3                     | 8                            | {Dd[x], D(d+1)[x], D(d+2)[x]}            | -                                      | Standard only     |
| -                     | 16 or 32                     | {Dd[x], D(d+1)[x], D(d+2)[x]}            | -                                      | Standard only     |
| -                     | -                            | {Dd[x], D(d+2)[x], D(d+4)[x]}            | -                                      | Standard only     |
| 4                     | 8                            | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @32                                    | 4-byte            |
| -                     | 16                           | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @64                                    | 8-byte            |
| -                     | -                            | {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]} | @64                                    | 8-byte            |
| -                     | 32                           | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @64 or @128                            | 8-byte or 16-byte |
| -                     | -                            | {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]} | @64 or @128                            | 8-byte or 16-byte |

<sup>26</sup> Every register in the list must be in the range D0-D31.

<sup>27</sup> align can be omitted. In this case, standard alignment rules apply.

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

## 15.49 VLDn (single n-element structure to all lanes) (A32)

Vector Load single n-element structure to all lanes.

### Syntax

`VLDn {cond}.datatype{list}, [Rn{@align}]{!}`

`VLDn {cond}.datatype{list}, [Rn{@align}], Rm`

where:

**n**

must be one of 1, 2, 3, or 4.

**cond**

is an optional condition code.

**datatype**

see the following table.

**list**

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for options.

**Rn**

is the general-purpose register containing the base address. *Rn* cannot be PC.

**align**

specifies an optional alignment. See the following table for options.

**!**

if ! is present, *Rn* is updated to (*Rn* + the number of bytes transferred by the instruction). The update occurs after all the loads have taken place.

**Rm**

is a general-purpose register containing an offset from the base address. If *Rm* is present, the instruction updates *Rn* to (*Rn* + *Rm*) after using the address to access memory. *Rm* cannot be SP or PC.

### Operation

`VLDn` loads multiple copies of one *n*-element structure from memory into one or more Advanced SIMD registers.

**Table 15-5: Permitted combinations of parameters for VLDrv (single n-element structure to all lanes)**

| <i>n</i> | <i>datatype</i> | <i>list</i> <sup>28</sup>               | <i>align</i> <sup>29</sup> | <i>alignment</i>  |
|----------|-----------------|-----------------------------------------|----------------------------|-------------------|
| 1        | 8               | {Dd[]}                                  | -                          | Standard only     |
| -        | -               | {Dd[], D(d+1) []}                       | -                          | Standard only     |
| -        | 16              | {Dd[]}                                  | @16                        | 2-byte            |
| -        | -               | {Dd[], D(d+1) []}                       | @16                        | 2-byte            |
| -        | 32              | {Dd[]}                                  | @32                        | 4-byte            |
| -        | -               | {Dd[], D(d+1) []}                       | @32                        | 4-byte            |
| 2        | 8               | {Dd[], D(d+1) []}                       | @8                         | byte              |
| -        | -               | {Dd[], D(d+2) []}                       | @8                         | byte              |
| -        | 16              | {Dd[], D(d+1) []}                       | @16                        | 2-byte            |
| -        | -               | {Dd[], D(d+2) []}                       | @16                        | 2-byte            |
| -        | 32              | {Dd[], D(d+1) []}                       | @32                        | 4-byte            |
| -        | -               | {Dd[], D(d+2) []}                       | @32                        | 4-byte            |
| 3        | 8, 16, or 32    | {Dd[], D(d+1) [], D(d+2) []}            | -                          | Standard only     |
| -        | -               | {Dd[], D(d+2) [], D(d+4) []}            | -                          | Standard only     |
| 4        | 8               | {Dd[], D(d+1) [], D(d+2) [], D(d+3) []} | @32                        | 4-byte            |
| -        | -               | {Dd[], D(d+2) [], D(d+4) [], D(d+6) []} | @32                        | 4-byte            |
| -        | 16              | {Dd[], D(d+1) [], D(d+2) [], D(d+3) []} | @64                        | 8-byte            |
| -        | -               | {Dd[], D(d+2) [], D(d+4) [], D(d+6) []} | @64                        | 8-byte            |
| -        | 32              | {Dd[], D(d+1) [], D(d+2) [], D(d+3) []} | @64 or @128                | 8-byte or 16-byte |
| -        | -               | {Dd[], D(d+2) [], D(d+4) [], D(d+6) []} | @64 or @128                | 8-byte or 16-byte |

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

## 15.50 VLDrv (multiple n-element structures) (A32)

Vector Load multiple n-element structures.

### Syntax

`VLDrv {cond}.datatype{list, [Rn{@align}]}{!}`

<sup>28</sup> Every register in the list must be in the range D0-D31.

<sup>29</sup> align can be omitted. In this case, standard alignment rules apply.

`VLDn {cond}.datatype{list}, [Rn{@align}], Rm`

where:

**n**

must be one of 1, 2, 3, or 4.

**cond**

is an optional condition code.

**datatype**

see the following table for options.

**list**

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for options.

**Rn**

is the general-purpose register containing the base address. *Rn* cannot be PC.

**align**

specifies an optional alignment. See the following table for options.

!

if ! is present, *Rn* is updated to (*Rn* + the number of bytes transferred by the instruction). The update occurs after all the loads have taken place.

**Rm**

is a general-purpose register containing an offset from the base address. If *Rm* is present, the instruction updates *Rn* to (*Rn* + *Rm*) after using the address to access memory. *Rm* cannot be SP or PC.

## Operation

`VLDn` loads multiple *n*-element structures from memory into one or more Advanced SIMD registers, with de-interleaving (unless == 1). Every element of each register is loaded.

**Table 15-6: Permitted combinations of parameters for VLDn (multiple n-element structures)**

| <b>n</b> | <b>datatype</b>  | <b>list</b> <sup>30</sup>    | <b>align</b> <sup>31</sup> | <b>alignment</b>            |
|----------|------------------|------------------------------|----------------------------|-----------------------------|
| 1        | 8, 16, 32, or 64 | {Dd}                         | @64                        | 8-byte                      |
| -        | -                | {Dd, D(d+1)}                 | @64 or @128                | 8-byte or 16-byte           |
| -        | -                | {Dd, D(d+1), D(d+2)}         | @64                        | 8-byte                      |
| -        | -                | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| 2        | 8, 16, or 32     | {Dd, D(d+1)}                 | @64, @128                  | 8-byte or 16-byte           |
| -        | -                | {Dd, D(d+2)}                 | @64, @128                  | 8-byte or 16-byte           |

<sup>30</sup> Every register in the list must be in the range D0-D31.

<sup>31</sup> align can be omitted. In this case, standard alignment rules apply.

| <b>n</b> | <b>datatype</b> | <b>list</b> <sup>30</sup>    | <b>align</b> <sup>31</sup> | <b>alignment</b>            |
|----------|-----------------|------------------------------|----------------------------|-----------------------------|
| -        | -               | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| 3        | 8, 16, or 32    | {Dd, D(d+1), D(d+2)}         | @64                        | 8-byte                      |
| -        | -               | {Dd, D(d+2), D(d+4)}         | @64                        | 8-byte                      |
| 4        | 8, 16, or 32    | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| -        | -               | {Dd, D(d+2), D(d+4), D(d+6)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

## 15.51 VLDM (A32)

Extension register load multiple.

### Syntax

`VLDMmode {cond} Rn{!}, Registers`

where:

**mode**

must be one of:

**IA**

meaning Increment address After each transfer. **IA** is the default, and can be omitted.

**DB**

meaning Decrement address Before each transfer.

**EA**

meaning Empty Ascending stack operation. This is the same as **DB** for loads.

**FD**

meaning Full Descending stack operation. This is the same as **IA** for loads.

**cond**

is an optional condition code.

**Rn**

is the general-purpose register holding the base address for the transfer.

<sup>30</sup> Every register in the list must be in the range D0-D31.

<sup>31</sup> align can be omitted. In this case, standard alignment rules apply.

!

is optional. `!` specifies that the updated base address must be written back to `Rn`. If `!` is not specified, `mode` must be `IA`.

#### **Registers**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify D or Q registers, but they must not be mixed. The number of registers must not exceed 16 D registers, or 8 Q registers. If Q registers are specified, on disassembly they are shown as D registers.



VPOP *Registers* is equivalent to VLDM `sp!`, *Registers*.

You can use either form of this instruction. They both disassemble to VPOP.

#### Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VLDM \(floating-point\) \(A32\)](#) on page 639

## 15.52 VLDR (A32)

Extension register load.

#### Syntax

`VLDR{cond}{.64} Dd, [Rn{, #offset}]`

`VLDR{cond}{.64} Dd, label`

where:

**cond**

is an optional condition code.

**Dd**

is the extension register to be loaded.

**Rn**

is the general-purpose register holding the base address for the transfer.

**offset**

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The value must be a multiple of 4, and lie in the range -1020 to +1020. The value is added to the base address to form the address used for the transfer.

***label***

is a PC-relative expression.

*label* must be aligned on a word boundary within  $\pm 1\text{KB}$  of the current instruction.

**Operation**

The **VLDR** instruction loads an extension register from memory.

Two words are transferred.

There is also a **VLDR** pseudo-instruction.

**Related information**

[VLDR pseudo-instruction \(A32\)](#) on page 554

[Condition code suffixes](#) on page 138

[Register-relative and PC-relative expressions](#) on page 222

[VLDR \(floating-point\) \(A32\)](#) on page 640

## 15.53 VLDR (post-increment and pre-decrement) (A32)

Pseudo-instruction that loads extension registers, with post-increment and pre-decrement forms.



There are also **VLDR** and **VSTR** instructions without post-increment and pre-decrement.

Note

**Syntax**

**VLDR{cond}{.64} Dd, [Rn], #offset ; post-increment**

**VLDR{cond}{.64} Dd, [Rn, #-offset]! ; pre-decrement**

where:

**cond**

is an optional condition code.

**Dd**

is the extension register to load.

**Rn**

is the general-purpose register holding the base address for the transfer.

**offset**

is a numeric expression that must evaluate to 8 at assembly time.

## Operation

The post-increment instruction increments the base address in the register by the offset value, after the transfer. The pre-decrement instruction decrements the base address in the register by the offset value, and then performs the transfer using the new address in the register. This pseudo-instruction assembles to a `VLDM` instruction.

## Related information

[VLDM \(A32\)](#) on page 551

[VLDR \(A32\)](#) on page 552

[Condition code suffixes](#) on page 138

[VLDR \(post-increment and pre-decrement, floating-point\) \(A32\)](#) on page 641

## 15.54 VLDR pseudo-instruction (A32)

The `VLDR` pseudo-instruction loads a constant value into every element of a 64-bit Advanced SIMD vector.



This description is for the `VLDR` pseudo-instruction only.

Note

## Syntax

`VLDR{cond}.datatype Dd,=constant`

where:

**cond**

is an optional condition code.

**datatype**

must be one of `In`, `Sn`, `Un`, or `F32`.

**n**

must be one of 8, 16, 32, or 64.

**Dd**

is the extension register to be loaded.

**constant**

is an immediate value of the appropriate type for `datatype`.

## Usage

If an instruction (for example, `vmov`) is available that can generate the constant directly into the register, the assembler uses it. Otherwise, it generates a doubleword literal pool entry containing the constant and loads the constant using a `VLDR` instruction.

## Related information

[VLDR \(A32\)](#) on page 552

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.55 VMAX and VMIN (A32)

Vector Maximum, Vector Minimum.

### Syntax

`Vop {cond}.datatypeQd, Qn, Qm`

`Vop {cond}.datatypeDd, Dn, Dm`

where:

***op***

must be either `MAX` or `MIN`.

***cond***

is an optional condition code.

***datatype***

must be one of `s8`, `s16`, `s32`, `u8`, `u16`, `u32`, or `f32`.

***Qd*, *Qn*, *Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd*, *Dn*, *Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VMAX` compares corresponding elements in two vectors, and copies the larger of each pair into the corresponding element in the destination vector.

`VMIN` compares corresponding elements in two vectors, and copies the smaller of each pair into the corresponding element in the destination vector.

### Floating-point maximum and minimum

$\max(+0.0, -0.0) = +0.0$ .

$\min(+0.0, -0.0) = -0.0$

If any input is a NaN, the corresponding result element is the default NaN.

## Related information

[VPADD \(A32\)](#) on page 578

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.56 VMAXNM, VMINNM (A32)

Vector Minimum, Vector Maximum.



- These instructions are supported only in Arm®v8.
- You cannot use `VMAXNM` or `VMINNM` inside an IT block.

### Syntax

`V op .F32 Qd, Qn, Qm`

`V op .F32 Dd, Dn, Dm`

where:

**op**

must be either MAXNM or MINNM.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VMAXNM` compares corresponding elements in two vectors, and copies the larger of each pair into the corresponding element in the destination vector.

`VMINNM` compares corresponding elements in two vectors, and copies the smaller of each pair into the corresponding element in the destination vector.

If one of the elements in a pair is a number and the other element is NaN, the corresponding result element is the number. This is consistent with the IEEE 754-2008 standard.

## 15.57 VMLA (A32)

Vector Multiply Accumulate.

### Syntax

`VMLA{cond}.datatype {Qd}, Qn, Qm`

`VMLA{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of `I8`, `I16`, `I32`, or `F32`.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VMLA` multiplies corresponding elements in two vectors, and accumulates the results into the elements of the destination vector.

### Related information

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.58 VMLA (by scalar) (A32)

Vector Multiply by scalar and Accumulate.

### Syntax

`VMLA{cond}.datatype {Qd}, Qn, Dm[x]`

`VMLA{cond}.datatype {Dd}, Dn, Dm[x]`

where:

**cond**

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `F32`.

***Qd, Qn***

are the destination vector and the first operand vector, for a quadword operation.

***Dd, Dn***

are the destination vector and the first operand vector, for a doubleword operation.

***Dm [x]***

is the scalar holding the second operand.

**Operation**

`VMLA` multiplies each element in a vector by a scalar, and accumulates the results into the corresponding elements of the destination vector.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.59 VMLAL (by scalar) (A32)

Vector Multiply by scalar and Accumulate Long.

**Syntax**

`VMLAL{cond}.datatypeQd, Dn, Dm[x]`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `S16`, `S32`, `U16`, or `U32`

***Qd, Dn***

are the destination vector and the first operand vector, for a long operation.

***Dm [x]***

is the scalar holding the second operand.

**Operation**

`VMLAL` multiplies each element in a vector by a scalar, and accumulates the results into the corresponding elements of the destination vector.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.60 VMLAL (A32)

Vector Multiply Accumulate Long.

### Syntax

`VMLAL{cond}.datatype Qd, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `s8`, `s16`, `u16`, or `u32`.

***Qd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

### Operation

`VMLAL` multiplies corresponding elements in two vectors, and accumulates the results into the elements of the destination vector.

### Related information

[Condition code suffixes](#) on page 138

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.61 VMLS (by scalar) (A32)

Vector Multiply by scalar and Subtract.

### Syntax

`VMLS{cond}.datatype {Qd}, Qn, Dm[x]`

`VMLS{cond}.datatype {Dd}, Dn, Dm[x]`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `F32`.

***Qd, Qn***

are the destination vector and the first operand vector, for a quadword operation.

**Dd, Dn**

are the destination vector and the first operand vector, for a doubleword operation.

**Dm [x]**

is the scalar holding the second operand.

**Operation**

`VMLS` multiplies each element in a vector by a scalar, subtracts the results from the corresponding elements of the destination vector, and places the final results in the destination vector.

**Related information**

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.62 VMLS (A32)

Vector Multiply Subtract.

**Syntax**

`VMLS{cond}.datatype {Qd}, Qn, Qm`

`VMLS{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of I8, I16, I32, F32.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Operation**

`VMLS` multiplies corresponding elements in two vectors, subtracts the results from corresponding elements of the destination vector, and places the final results in the destination vector.

**Related information**

[Condition code suffixes](#) on page 138

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.63 VMLSL (A32)

Vector Multiply Subtract Long.

### Syntax

`VMLSL{cond}.datatypeQd, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, or u32.

**Qd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

### Operation

`VMLSL` multiplies corresponding elements in two vectors, subtracts the results from corresponding elements of the destination vector, and places the final results in the destination vector.

### Related information

[Condition code suffixes](#) on page 138

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.64 VMLSL (by scalar) (A32)

Vector Multiply by scalar and Subtract Long.

### Syntax

`VMLSL{cond}.datatypeQd, Dn, Dm[x]`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s16, s32, u16, or u32.

**Qd, Dn**

are the destination vector and the first operand vector, for a long operation.

**Dm [x]**

is the scalar holding the second operand.

## Operation

`VMLSL` multiplies each element in a vector by a scalar, subtracts the results from the corresponding elements of the destination vector, and places the final results in the destination vector.

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.65 VMOV (immediate) (A32)

Vector Move.

### Syntax

`VMOV{cond}.datatypeQd, #imm`

`VMOV{cond}.datatypeDd, #imm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of I8, I16, I32, I64, or F32.

**Qd or Dd**

is the Advanced SIMD register for the result.

**imm**

is an immediate value of the type specified by `datatype`. This is replicated to fill the destination register.

## Operation

`VMOV` replicates an immediate value in every element of the destination register.

| datatype                                                                                                                                                                                                                    | imm                                                          |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| I8 Each of 0xGG, 0xHH, 0xJJ, 0xKK, 0xLL, 0xMM, 0xNN, and 0xPP must be either 0x00 or 0xFF. Any number that can be expressed as $\pm n * 2^{-r}$ , where $n$ and $r$ are integers, $16 \leq n \leq 31$ , $0 \leq r \leq 7$ . | 0XXY                                                         |
| I16                                                                                                                                                                                                                         | 0x00XY, 0XXY00                                               |
| I32                                                                                                                                                                                                                         | 0x000000XY, 0x0000XY00, 0x00XY0000, 0XXY000000               |
| I64                                                                                                                                                                                                                         | 0x0000XYFF, 0x00XYFFFF<br>byte masks,<br>0xGGHHJJKKLLMMNNPP1 |

| datatype | imm                      |
|----------|--------------------------|
| F32      | floating-point numbers 2 |

<sup>1</sup>0xGG0xHH0xJJ0xKK0xLL0xMM0xNN0xPP0x000xFF<sup>2</sup> n<sub>r</sub>n rn

### Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.66 VMOV (register) (A32)

Vector Move.

### Syntax

VMOV{cond} {datatype} Qd, Qm

VMOV{cond} {datatype} Dd, Dm

where:

**cond**

is an optional condition code.

**datatype**

is an optional datatype. The assembler ignores *datatype*.

**Qd, Qm**

specifies the destination vector and the source vector, for a quadword operation.

**Dd, Dm**

specifies the destination vector and the source vector, for a doubleword operation.

### Operation

vmov copies the contents of the source register into the destination register.

### Related information

[Condition code suffixes](#) on page 138

## 15.67 VMOV (between two general-purpose registers and a 64-bit extension register) (A32)

Transfer contents between two general-purpose registers and a 64-bit extension register.

### Syntax

VMOV{cond} Dm, Rd, Rn

`VMOV{cond} Rd, Rn, Dm`

where:

**cond**

is an optional condition code.

**Dm**

is a 64-bit extension register.

**Rd, Rn**

are the general-purpose registers. `Rd` and `Rn` must not be PC.

## Operation

`VMOV Dm, Rd, Rn` transfers the contents of `Rd` into the low half of `Dm`, and the contents of `Rn` into the high half of `Dm`.

`VMOV Rd, Rn, Dm` transfers the contents of the low half of `Dm` into `Rd`, and the contents of the high half of `Dm` into `Rn`.

## Related information

[Condition code suffixes](#) on page 138

## 15.68 VMOV (between a general-purpose register and an Advanced SIMD scalar) (A32)

Transfer contents between a general-purpose register and an Advanced SIMD scalar.

### Syntax

`VMOV{cond}{.size} Dn[x], Rd`

`VMOV{cond}{.datatype} Rd, Dn[x]`

where:

**cond**

is an optional condition code.

**size**

the data size. Can be 8, 16, or 32. If omitted, `size` is 32.

**datatype**

the data type. Can be `U8`, `S8`, `U16`, `S16`, or 32. If omitted, `datatype` is 32.

**Dn[x]**

is the Advanced SIMD scalar.

**Rd**

is the general-purpose register. `Rd` must not be PC.

## Operation

`VMOV Dn[x], Rd` transfers the contents of the least significant byte, halfword, or word of `Rd` into `Dn[x]`.

`VMOV Rd, Dn[x]` transfers the contents of `Dn[x]` into the least significant byte, halfword, or word of `Rd`. The remaining bits of `Rd` are either zero or sign extended.

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD scalars](#) on page 171

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.69 VMOVL (A32)

Vector Move Long.

### Syntax

`VMOVL{cond} .datatypeQd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, or u32.

**Qd, Dm**

specifies the destination vector and the operand vector.

## Operation

`VMOVL` takes each element in a doubleword vector, sign or zero extends them to twice their original length, and places the results in a quadword vector.

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.70 VMOVN (A32)

Vector Move and Narrow.

### Syntax

`VMOVN{cond} .datatypeDd, Qm`

where:

***cond***

is an optional condition code.

***datatype***

I32, or I64.

***Dd, Qm***

specifies the destination vector and the operand vector.

## Operation

VMOVN copies the least significant half of each element of a quadword vector into the corresponding elements of a doubleword vector.

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.71 VMOV2 (A32)

Pseudo-instruction that generates an immediate value and places it in every element of an Advanced SIMD vector, without loading a value from a literal pool.

### Syntax

```
VMOV2 {cond}.datatype Qd, #constant
```

```
VMOV2 {cond}.datatype Dd, #constant
```

where:

***datatype***

must be one of:

- I8, I16, I32, or I64.
- S8, S16, S32, or S64.
- U8, U16, U32, or U64.
- F32.

***cond***

is an optional condition code.

***Qd or Dd***

is the extension register to be loaded.

***constant***

is an immediate value of the appropriate type for *datatype*.

## Operation

`vMOV2` can generate any 16-bit immediate value, and a restricted range of 32-bit and 64-bit immediate values.

`vMOV2` is a pseudo-instruction that always assembles to exactly two instructions. It typically assembles to a `vMOV` or `vMVN` instruction, followed by a `VBIC` or `VORR` instruction.

## Related information

[VMOV \(immediate\) \(A32\)](#) on page 562

[VBIC \(immediate\) \(A32\)](#) on page 519

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.72 VMRS (A32)

Transfer contents from an Advanced SIMD system register to a general-purpose register.

### Syntax

`VMRS {cond} Rd, extsysreg`

where:

***cond***

is an optional condition code.

***extsysreg***

is the Advanced SIMD and floating-point system register, usually `FPSCR`, `FPSID`, or `FPEXC`.

***Rd***

is the general-purpose register. `Rd` must not be PC.

It can be `APSR_nzcv`, if `extsysreg` is `FPSCR`. In this case, the floating-point status flags are transferred into the corresponding flags in the special-purpose APSR.

### Usage

The `VMRS` instruction transfers the contents of `extsysreg` into `Rd`.



The instruction stalls the processor until all current Advanced SIMD or floating-point operations complete.

### Example

```
VMRS    r2, FPCID
VMRS    APSR_nzcv, FPSCR      ; transfer FP status register to the
                                ; special-purpose APSR
```

## Related information

[Advanced SIMD system registers in AArch32 state](#) on page 173  
[Condition code suffixes](#) on page 138  
[VMSR \(floating-point\) \(A32\)](#) on page 650

## 15.73 VMSR (A32)

Transfer contents of a general-purpose register to an Advanced SIMD system register.

### Syntax

`VMSR{cond} extsysreg, Rd`

where:

**cond**

is an optional condition code.

**extsysreg**

is the Advanced SIMD and floating-point system register, usually `FPSCR`, `FPSID`, or `FPEXC`.

**Rd**

is the general-purpose register. `Rd` must not be PC.

It can be `APSR_nzcv`, if `extsysreg` is `FPSCR`. In this case, the floating-point status flags are transferred into the corresponding flags in the special-purpose APSR.

### Usage

The `vmsr` instruction transfers the contents of `Rd` into `extsysreg`.



The instruction stalls the processor until all current Advanced SIMD operations complete.

### Example

```
VMSR    FPSCR, r4
```

### Related information

[Advanced SIMD system registers in AArch32 state](#) on page 173  
[Condition code suffixes](#) on page 138  
[VMSR \(floating-point\) \(A32\)](#) on page 651

## 15.74 VMUL (A32)

Vector Multiply.

### Syntax

`VMUL{cond}.datatype {Qd}, Qn, Qm`

`VMUL{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of I8, I16, I32, F32, or P8.

***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VMUL` multiplies corresponding elements in two vectors, and places the results in the destination vector.

### Related information

[Condition code suffixes](#) on page 138

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.75 VMUL (by scalar) (A32)

Vector Multiply by scalar.

### Syntax

`VMUL{cond}.datatype {Qd}, Qn, Dm[x]`

`VMUL{cond}.datatype {Dd}, Dn, Dm[x]`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `F32`.

***Qd, Qn***

are the destination vector and the first operand vector, for a quadword operation.

***Dd, Dn***

are the destination vector and the first operand vector, for a doubleword operation.

***Dm[x]***

is the scalar holding the second operand.

**Operation**

`VMUL` multiplies each element in a vector by a scalar, and places the results in the destination vector.

**Related information**

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.76 VMULL (A32)

Vector Multiply Long

**Syntax**

`VMULL{cond}.datatype Qd, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `U8`, `U16`, `U32`, `S8`, `S16`, `S32`, or `P8`.

***Qd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

**Operation**

`VMULL` multiplies corresponding elements in two vectors, and places the results in the destination vector.

**Related information**

[Condition code suffixes](#) on page 138

[Polynomial arithmetic over {0,1}](#) on page 168

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.77 VMULL (by scalar) (A32)

Vector Multiply Long by scalar

### Syntax

```
VMULL{cond}.datatypeQd, Dn, Dm[x]
```

where:

**cond**

is an optional condition code.

**datatype**

must be one of s16, s32, u16, or u32.

**Qd or Dn**

are the destination vector and the first operand vector, for a long operation.

**Dm[x]**

is the scalar holding the second operand.

### Operation

VMULL multiplies each element in a vector by a scalar, and places the results in the destination vector.

### Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.78 VMVN (register) (A32)

Vector Move NOT (register).

### Syntax

```
VMVN{cond}{.datatype} Qd, Qm
```

```
VMVN{cond}{.datatype} Dd, Dm
```

where:

**cond**

is an optional condition code.

**datatype**

is an optional datatype. The assembler ignores *datatype*.

**Qd, Qm**

specifies the destination vector and the source vector, for a quadword operation.

**Dd, Dm**

specifies the destination vector and the source vector, for a doubleword operation.

**Operation**

`VMVN` inverts the value of each bit from the source register and places the results into the destination register.

**Related information**

[Condition code suffixes](#) on page 138

## 15.79 VMVN (immediate) (A32)

Vector Move NOT (immediate).

**Syntax**

`VMVN {cond} .datatypeQd, #imm`

`VMVN {cond} .datatypeDd, #imm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of I8, I16, I32, I64, or F32.

**Qd or Dd**

is the Advanced SIMD register for the result.

**imm**

is an immediate value of the type specified by `datatype`. This is replicated to fill the destination register.

**Operation**

`VMVN` inverts the value of each bit from an immediate value and places the results into each element in the destination register.

| datatype | imm                                             |
|----------|-------------------------------------------------|
| I8       | -                                               |
| I16      | 0xFFXY, 0xXYFF                                  |
| I32      | 0xFFFFFFFFXY, 0xFFFFXYFF, 0xFFXYFFFF, 0XYFFFFFF |
|          | 0xFFFFXY00, 0xFFXY0000                          |
| I64      | -                                               |
| F32      | -                                               |

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.80 VNEG (A32)

Vector Negate.

### Syntax

`VNEG {cond} .datatypeQd, Qm`

`VNEG {cond} .datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, or f32.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

### Operation

`VNEG` negates each element in a vector, and places the results in a second vector. (The floating-point version only inverts the sign bit.)

### Related information

[VNEG \(floating-point\) \(A32\)](#) on page 653

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.81 VORN (register) (A32)

Vector bitwise OR NOT (register).

### Syntax

`VORN {cond} {datatype} {Qd}, Qn, Qm`

`VORN {cond} {datatype} {Dd}, Dn, Dm`

where:

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

***cond***

is an optional condition code.

***datatype***

is an optional data type. The assembler ignores *datatype*.

***Qd, Qn, Qm***

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

***Dd, Dn, Dm***

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

**Operation**

`VORN` performs a bitwise logical OR complement between two registers, and places the results in the destination register.

**Related information**

[Condition code suffixes](#) on page 138

## 15.82 VORN (immediate) (A32)

Vector bitwise OR NOT (immediate) pseudo-instruction.

**Syntax**

`VORN{cond}.datatypeQd, #imm`

`VORN{cond}.datatypeDd, #imm`

where:

***cond***

is an optional condition code.

***datatype***

must be either `I8`, `I16`, `I32`, or `I64`.

***Qd or Dd***

is the Advanced SIMD register for the result.

***imm***

is the immediate value.

**Operation**

`VORN` takes each element of the destination vector, performs a bitwise OR complement with an immediate value, and returns the results in the destination vector.



On disassembly, this pseudo-instruction is disassembled to a corresponding `VORR` instruction, with a complementary immediate value.

**Note**

## Immediate values

If `datatype` is `I16`, the immediate value must have one of the following forms:

- `0xFFFFXY`.
- `0xXYFF`.

If `datatype` is `I32`, the immediate value must have one of the following forms:

- `0xFFFFFFFFXY`.
- `0xFFFFXYFF`.
- `0xFFXYFFFF`.
- `0xXYFFFFFF`.

## Related information

[VORR \(immediate\) \(A32\)](#) on page 576

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.83 VORR (register) (A32)

Vector bitwise OR (register).

### Syntax

`VORR{cond} {datatype} {Qd}, Qn, Qm`

`VORR{cond} {datatype} {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

is an optional data type. The assembler ignores `datatype`.

**Qd, Qn, Qm**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**Dd, Dn, Dm**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.



VORR with the same register for both operands is a vMOV instruction. You can use VORR in this way, but disassembly of the resulting code produces the vMOV syntax.

**Operation**

VORR performs a bitwise logical OR between two registers, and places the result in the destination register.

**Related information**

[VMOV \(register\) \(A32\)](#) on page 563  
[Condition code suffixes](#) on page 138

## 15.84 VORR (immediate) (A32)

Vector bitwise OR immediate.

**Syntax**

VORR{cond}.datatypeQd, #imm

VORR{cond}.datatypeDd, #imm

where:

**cond**

is an optional condition code.

**datatype**

must be either I8, I16, I32, or I64.

**Qd or Dd**

is the Advanced SIMD register for the source and result.

**imm**

is the immediate value.

**Operation**

VORR takes each element of the destination vector, performs a bitwise logical OR with an immediate value, and places the results in the destination vector.

## Immediate values

You can either specify *imm* as a pattern which the assembler repeats to fill the destination register, or you can directly specify the immediate value (that conforms to the pattern) in full. The pattern for *imm* depends on the datatype, as shown in the following table:

**Table 15-9: Patterns for immediate value in VORR (immediate)**

| I16    | I32        |
|--------|------------|
| 0x00XY | 0x000000XY |
| 0xXY00 | 0x0000XY00 |
| -      | 0x00XY0000 |
| -      | 0xXY000000 |

If you use the *i8* or *i64* datatypes, the assembler converts it to either the *i16* or *i32* instruction to match the pattern of *imm*. If the immediate value does not match any of the patterns in the preceding table, the assembler generates an error.

## Related information

[Condition code suffixes](#) on page 138

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

## 15.85 VPADAL (A32)

Vector Pairwise Add and Accumulate Long.

### Syntax

VPADAL{cond}.datatypeQd, Qm

VPADAL{cond}.datatypeDd, Dm

where:

#### cond

is an optional condition code.

#### datatype

must be one of s8, s16, s32, u8, u16, or u32.

#### Qd, Qm

are the destination vector and the operand vector, for a quadword instruction.

#### Dd, Dm

are the destination vector and the operand vector, for a doubleword instruction.

### Operation

VPADAL adds adjacent pairs of elements of a vector, and accumulates the absolute values of the results into the elements of the destination vector.

**Figure 15-3: Example of operation of VPADAL (in this case for data type I16)****Related information**[Advanced SIMD data types in A32/T32 instructions](#) on page 168[Condition code suffixes](#) on page 138

## 15.86 VPADD (A32)

Vector Pairwise Add.

**Syntax**

`VPADD{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of I8, I16, I32, or F32.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector.

**Operation**

VPADD adds adjacent pairs of elements of two vectors, and places the results in the destination vector.

**Figure 15-4: Example of operation of VPADD (in this case, for data type I16)****Related information**[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.87 VPADDL (A32)

Vector Pairwise Add Long.

### Syntax

`VPADDL{cond}.datatypeQd, Qm`

`VPADDL{cond}.datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, or u32.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword instruction.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword instruction.

### Operation

VPADDL adds adjacent pairs of elements of a vector, sign or zero extends the results to twice their original width, and places the final results in the destination vector.

**Figure 15-5: Example of operation of doubleword VPADDL (in this case, for data type S16)**



### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.88 VPMAX and VPMIN (A32)

Vector Pairwise Maximum, Vector Pairwise Minimum.

### Syntax

`VPop {cond}.datatypeDd, Dn, Dm`

where:

**op**

must be either MAX or MIN.

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, u8, u16, u32, or f32.

**Dd, Dn, Dm**

are the destination doubleword vector, the first operand doubleword vector, and the second operand doubleword vector.

### Operation

`VPMAX` compares adjacent pairs of elements in two vectors, and copies the larger of each pair into the corresponding element in the destination vector. Operands and results must be doubleword vectors.

`VPMIN` compares adjacent pairs of elements in two vectors, and copies the smaller of each pair into the corresponding element in the destination vector. Operands and results must be doubleword vectors.

### Floating-point maximum and minimum

$\max(+0.0, -0.0) = +0.0$ .

$\min(+0.0, -0.0) = -0.0$

If any input is a NaN, the corresponding result element is the default NaN.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.89 VPOP (A32)

Pop extension registers from the stack.

### Syntax

`VPOP{cond} Registers`

where:

**`cond`**

is an optional condition code.

**`Registers`**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify D or Q registers, but they must not be mixed. The number of registers must not exceed 16 D registers, or 8 Q registers. If Q registers are specified, on disassembly they are shown as D registers.



VPOP *Registers* is equivalent to VLDM sp!, *Registers*.

Note

You can use either form of this instruction. They both disassemble to vpop.

### Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VPUSH \(A32\)](#) on page 581

[VPOP \(floating-point\) \(A32\)](#) on page 655

## 15.90 VPUSH (A32)

Push extension registers onto the stack.

### Syntax

`VPUSH{cond} Registers`

where:

**`cond`**

is an optional condition code.

**`Registers`**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify D or Q registers, but they must not be mixed. The number of registers must not exceed 16 D registers, or 8 Q registers. If Q registers are specified, on disassembly they are shown as D registers.



`VPUSH Registers` is equivalent to `vSTMDB sp!, Registers`.

You can use either form of this instruction. They both disassemble to `VPUSH`.

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VPOP \(A32\)](#) on page 580

[VPUSH \(floating-point\) \(A32\)](#) on page 656

## 15.91 VQABS (A32)

Vector Saturating Absolute.

### Syntax

`VQABS {cond} .datatypeQd, Qm`

`VQABS {cond} .datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, or s32.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

### Operation

`VQABS` takes the absolute value of each element in a vector, and places the results in a second vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.92 VQADD (A32)

Vector Saturating Add.

### Syntax

`VQADD{cond}.datatype {Qd}, Qn, Qm`

`VQADD{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VQADD` adds corresponding elements in two vectors, and places the results in the destination vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.93 VQDMLAL and VQDMLSL (by vector or by scalar) (A32)

Vector Saturating Doubling Multiply Accumulate Long, Vector Saturating Doubling Multiply Subtract Long.

### Syntax

`VQDopL{cond}.datatypeQd, Dn, Dm`

`VQDopL{cond}.datatypeQd, Dn, Dm[x]`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

where:

***op***

must be one of:

***MIL*A**

Multiply Accumulate.

***MIL*S**

Multiply Subtract.

***cond***

is an optional condition code.

***datatype***

must be either s16 or s32.

***{Qd, Dn}***

are the destination vector and the first operand vector.

***Dm***

is the vector holding the second operand, for a by vector operation.

***Dm[x]***

is the scalar holding the second operand, for a by scalar operation.

## Operation

These instructions multiply their operands and double the results. `vQDMIL` adds the results to the values in the destination register. `vQDMLSL` subtracts the results from the values in the destination register.

If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.94 VQDMULH (by vector or by scalar) (A32)

Vector Saturating Doubling Multiply Returning High Half.

### Syntax

`VQDMULH{cond}.datatype {Qd}, Qn, Qm`

`VQDMULH{cond}.datatype {Dd}, Dn, Dm`

`VQDMULH{cond}.datatype {Qd}, Qn, Dm[x]`

`VQDMULH{cond}.datatype {Dd}, Dn, Dm[x]`

where:

***cond***

is an optional condition code.

***datatype***

must be either s16 or s32.

***Dd, Qn***

are the destination vector and the first operand vector, for a quadword operation.

***Dd, Dn***

are the destination vector and the first operand vector, for a doubleword operation.

***Qm or Dm***

is the vector holding the second operand, for a by vector operation.

***Dm[x]***

is the scalar holding the second operand, for a by scalar operation.

## Operation

`VQDMULH` multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector.

The second operand can be a scalar instead of a vector.

If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation occurs. Each result is truncated.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.95 VQDMULL (by vector or by scalar) (A32)

Vector Saturating Doubling Multiply Long.

### Syntax

`VQDMULL{cond}.datatypeQd, Dn, Dm`

`VQDMULL{cond}.datatypeQd, Dn, Dm[x]`

where:

***cond***

is an optional condition code.

***datatype***

must be either s16 or s32.

**{Qd, Dn}**

are the destination vector and the first operand vector.

**Dm**

is the vector holding the second operand, for a by vector operation.

**Dm[x]**

is the scalar holding the second operand, for a by scalar operation.

## Operation

VQDMULL multiplies corresponding elements in two vectors, doubles the results and places the results in the destination register.

The second operand can be a scalar instead of a vector.

If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.96 VQMOVN and VQMOVUN (A32)

Vector Saturating Move and Narrow.

### Syntax

VQMOVN {cond} .datatypeDd, Qm

VQMOVUN {cond} .datatypeDd, Qm

where:

**cond**

is an optional condition code.

**datatype**

must be one of:

**S16, S32, S64**

for VQMOVN or VQMOVUN.

**U16, U32, U64**

for VQMOVN.

**Dd, Qm**

specifies the destination vector and the operand vector.

**Operation**

vqmovn copies each element of the operand vector to the corresponding element of the destination vector. The result element is half the width of the operand element, and values are saturated to the result width. The results are the same type as the operands.

vqmovun copies each element of the operand vector to the corresponding element of the destination vector. The result element is half the width of the operand element, and values are saturated to the result width. The elements in the operand are signed and the elements in the result are unsigned.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.97 VQNEG (A32)

Vector Saturating Negate.

**Syntax**

vqneg{cond}.datatypeQd, Qm

vqneg{cond}.datatypeDd, Dm

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, or s32.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

**Operation**

vqneg negates each element in a vector, and places the results in a second vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.98 VQRDMULH (by vector or by scalar) (A32)

Vector Saturating Rounding Doubling Multiply Returning High Half.

### Syntax

`VQRDMULH{cond}.datatype {Qd}, Qn, Qm`

`VQRDMULH{cond}.datatype {Dd}, Dn, Dm`

`VQRDMULH{cond}.datatype {Qd}, Qn, Dm[x]`

`VQRDMULH{cond}.datatype {Dd}, Dn, Dm[x]`

where:

**cond**

is an optional condition code.

**datatype**

must be either s16 or s32.

**Qd, Qn**

are the destination vector and the first operand vector, for a quadword operation.

**Dd, Dn**

are the destination vector and the first operand vector, for a doubleword operation.

**Qm or Dm**

is the vector holding the second operand, for a by vector operation.

**Dm [x]**

is the scalar holding the second operand, for a by scalar operation.

### Operation

`VQRDMULH` multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector.

The second operand can be a scalar instead of a vector.

If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation occurs. Each result is rounded.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.99 VQRSHL (by signed variable) (A32)

Vector Saturating Rounding Shift Left by signed variable.

### Syntax

`VQRSHL{cond}.datatype {Qd}, Qm, Qn`

`VQRSHL{cond}.datatype {Dd}, Dm, Dn`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

***Qd, Qm, Qn***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dm, Dn***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VQRSHL` takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a rounding right shift.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.100 VQRSHRN and VQRSHRUN (by immediate) (A32)

Vector Saturating Shift Right, Narrow, by immediate value, with Rounding.

### Syntax

`VQRSHR {U}N{cond}.datatype Dd, Qm, #imm`

where:

***U***

if present, indicates that the results are unsigned, although the operands are signed. Otherwise, the results are the same type as the operands.

***cond***

is an optional condition code.

***datatype***

must be one of:

**I16, I32, I64**

for VQRSHRN or VQRSHRUN. Only a #0 immediate is permitted with these datatypes.

**S16, S32, S64**

for VQRSHRN or VQRSHRUN.

**U16, U32, U64**

for VQRSHRN only.

***Dd, Qm***

are the destination vector and the operand vector.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-10: Available immediate ranges in VQRSHRN and VQRSHRUN (by immediate)**

| datatype   | imm range |
|------------|-----------|
| S16 or U16 | 0 to 8    |
| S32 or U32 | 0 to 16   |
| S64 or U64 | 0 to 32   |

**Operation**

VQRSHR{U}N takes each element in a quadword vector of integers, right shifts them by an immediate value, and places the results in a doubleword vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

Results are rounded.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.101 VQSHL (by signed variable) (A32)

Vector Saturating Shift Left by signed variable.

**Syntax**

VQSHL{cond}.datatype {Qd}, Qm, Qn

`VQSHL{cond}.datatype {Dd}, Dm, Dn`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

**Qd, Qm, Qn**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dm, Dn**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

## Operation

`vqshl` takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a truncating right shift.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.102 VQSHL and VQSHLU (by immediate) (A32)

Vector Saturating Shift Left.

### Syntax

`VQSHL {U}{cond}.datatype {Qd}, Qm, #imm`

`VQSHL {U}{cond}.datatype {Dd}, Dm, #imm`

where:

**U**

only permitted if `Q` is also present. Indicates that the results are unsigned even though the operands are signed.

**cond**

is an optional condition code.

**datatype**

must be one of :

**S8, S16, S32, S64**  
for vQSHL or vQSHLU.

**U8, U16, U32, U64**  
for vQSHL only.

**Qd, Qm**

are the destination and operand vectors, for a quadword operation.

**Dd, Dm**

are the destination and operand vectors, for a doubleword operation.

**imm**

is the immediate value specifying the size of the shift, in the range 0 to (size(*datatype*) - 1).  
The ranges are shown in the following table:

**Table 15-11: Available immediate ranges in VQSHL and VQSHLU (by immediate)**

| <b>datatype</b> | <b>imm range</b> |
|-----------------|------------------|
| S8 or U8        | 0 to 7           |
| S16 or U16      | 0 to 15          |
| S32 or U32      | 0 to 31          |
| S64 or U64      | 0 to 63          |

## Operation

vQSHL and vQSHLU instructions take each element in a vector of integers, left shift them by an immediate value, and place the results in the destination vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.103 VQSHRN and VQSHRUN (by immediate) (A32)

Vector Saturating Shift Right, Narrow, by immediate value.

### Syntax

VQSHR {U}N{cond}.datatypeDd, Qm, #imm

where:

**U**

if present, indicates that the results are unsigned, although the operands are signed.  
Otherwise, the results are the same type as the operands.

***cond***

is an optional condition code.

***datatype***

must be one of:

**I16, I32, I64**

for vQSHRN or vQSHRUN. Only a #0 immediate is permitted with these datatypes.

**S16, S32, S64**

for vQSHRN or vQSHRUN.

**U16, U32, U64**

for vQSHRN only.

***Dd, Qm***

are the destination vector and the operand vector.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-12: Available immediate ranges in VQSHRN and VQSHRUN (by immediate)**

| datatype   | imm range |
|------------|-----------|
| S16 or U16 | 0 to 8    |
| S32 or U32 | 0 to 16   |
| S64 or U64 | 0 to 32   |

**Operation**

vQSHR{U}N takes each element in a quadword vector of integers, right shifts them by an immediate value, and places the results in a doubleword vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

Results are truncated.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.104 VQSUB (A32)

Vector Saturating Subtract.

**Syntax**

VQSUB{cond}.datatype {Qd}, Qn, Qm

`VQSUB{cond}.datatype {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

**Dd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

## Operation

`vqsub` subtracts the elements of one vector from the corresponding elements of another vector, and places the results in the destination vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.105 VRADDHN (A32)

Vector Rounding Add and Narrow, selecting High half.

### Syntax

`VRADDHN{cond}.datatype Dd, Qn, Qm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of i16, i32, or i64.

**Dd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector.

## Operation

`VRADDHN` adds corresponding elements in two quadword vectors, selects the most significant halves of the results, and places the final results in the destination doubleword vector. Results are rounded.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.106 VRECPE (A32)

Vector Reciprocal Estimate.

### Syntax

`VRECPE {cond} .datatypeQd, Qm`

`VRECPE {cond} .datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be either `U32` or `F32`.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

## Operation

`VRECPE` finds an approximate reciprocal of each element in a vector, and places the results in a second vector.

### Results for out-of-range inputs

The following table shows the results where input values are out of range:

|                                                                           | Operand element               | Result element        |
|---------------------------------------------------------------------------|-------------------------------|-----------------------|
| Integer The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set | <code>&lt;= 7FFFFFFF</code>   | <code>FFFFFFFF</code> |
| Floating-point                                                            | NaN                           | Default NaN           |
|                                                                           | Negative 0, Negative Denormal | Negative Infinity 1   |
|                                                                           | Positive 0, Positive Denormal | Positive Infinity 1   |
|                                                                           | Positive infinity             | Positive 0            |
|                                                                           | Negative infinity             | Negative 0            |

11

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.107 VRECPS (A32)

Vector Reciprocal Step.

### Syntax

VRECPS{cond}.F32{Qd}, Qn, Qm

VRECPS{cond}.F32 {Dd}, Dn, Dm

where:

**cond**

is an optional condition code.

**Qd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

VRECPS multiplies the elements of one vector by the corresponding elements of another vector, subtracts each of the results from 2, and places the final results into the elements of the destination vector.

The Newton-Raphson iteration:

$$x_{n+1} = x_n (2 - dx_n)$$

converges to  $(1/d)$  if  $x_0$  is the result of VRECPE applied to  $d$ .

### Results for out-of-range inputs

The following table shows the results where input values are out of range:

**Table 15-14: Results for out-of-range inputs in VRECPS**

| 1st operand element | 2nd operand element | Result element |
|---------------------|---------------------|----------------|
| NaN                 | -                   | Default NaN    |
| -                   | NaN                 | Default NaN    |

| 1st operand element | 2nd operand element | Result element |
|---------------------|---------------------|----------------|
| ± 0.0 or denormal   | ± infinity          | 2.0            |
| ± infinity          | ± 0.0 or denormal   | 2.0            |

## Related information

[Condition code suffixes](#) on page 138

# 15.108 VREV16, VREV32, and VREV64 (A32)

Vector Reverse within halfwords, words, or doublewords.

## Syntax

`VREVn {cond}.sizeQd, Qm`

`VREVn {cond}.sizeDd, Dm`

where:

**n**

must be one of 16, 32, or 64.

**cond**

is an optional condition code.

**size**

must be one of 8, 16, or 32, and must be less than **n**.

**Qd, Qm**

specifies the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

specifies the destination vector and the operand vector, for a doubleword operation.

## Operation

`VREV16` reverses the order of 8-bit elements within each halfword of the vector, and places the result in the corresponding destination vector.

`VREV32` reverses the order of 8-bit or 16-bit elements within each word of the vector, and places the result in the corresponding destination vector.

`VREV64` reverses the order of 8-bit, 16-bit, or 32-bit elements within each doubleword of the vector, and places the result in the corresponding destination vector.

## Related information

[Condition code suffixes](#) on page 138

## 15.109 VRHADD (A32)

Vector Rounding Halving Add.

### Syntax

`VRHADD{cond}.datatype {Qd}, Qn, Qm`

`VRHADD{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, u8, u16, or u32.

***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VRHADD` adds corresponding elements in two vectors, shifts each result right one bit, and places the results in the destination vector. Results are rounded.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.110 VRSHL (by signed variable) (A32)

Vector Rounding Shift Left by signed variable.

### Syntax

`VRSHL{cond}.datatype {Qd}, Qm, Qn`

`VRSHL{cond}.datatype {Dd}, Dm, Dn`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

***Qd, Qm, Qn***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

***Dd, Dm, Dn***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Operation**

**VRSHL** takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a rounding right shift.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.111 VRSHR (by immediate) (A32)

Vector Rounding Shift Right by immediate value.

**Syntax**

**VRSHR{cond} .datatype {Qd}, Qm, #imm**

**VRSHR{cond} .datatype {Dd}, Dm, #imm**

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

***imm***

is the immediate value specifying the size of the shift, in the range 0 to (size(*datatype*)). The ranges are shown in the following table:

**Table 15-15: Available immediate ranges in VRSHR (by immediate)**

| datatype   | imm range |
|------------|-----------|
| S8 or U8   | 0 to 8    |
| S16 or U16 | 0 to 16   |
| S32 or U32 | 0 to 32   |
| S64 or U64 | 0 to 64   |

VRSHR with an immediate value of zero is a pseudo-instruction for VORR.

### Operation

VRSHR takes each element in a vector, right shifts them by an immediate value, and places the results in the destination vector. The results are rounded.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VORR \(register\) \(A32\)](#) on page 575

[Condition code suffixes](#) on page 138

## 15.112 VRSHRN (by immediate) (A32)

Vector Rounding Shift Right, Narrow, by immediate value.

### Syntax

VRSHRN{cond}.datatypeDd, Qm, #imm

where:

**cond**

is an optional condition code.

**datatype**

must be one of I16, I32, or I64.

**Dd, Qm**

are the destination vector and the operand vector.

**imm**

is the immediate value specifying the size of the shift, in the range 0 to (size(datatype)/2).

The ranges are shown in the following table:

**Table 15-16: Available immediate ranges in VRSHRN (by immediate)**

| datatype | imm range |
|----------|-----------|
| I16      | 0 to 8    |
| I32      | 0 to 16   |
| I64      | 0 to 32   |

`VRSHRN` with an immediate value of zero is a pseudo-instruction for `VMOVN`.

## Operation

`VRSHRN` takes each element in a quadword vector, right shifts them by an immediate value, and places the results in a doubleword vector. The results are rounded.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VMOVN \(A32\)](#) on page 565

[Condition code suffixes](#) on page 138

## 15.113 VRINT (A32)

`VRINT` (Vector Round to Integer) rounds each floating-point element in a vector to integer, and places the results in the destination vector.

The resulting integers are represented in floating-point format.



This instruction is supported only in Arm®v8.

Note

## Syntax

`VRINT mode .F32.F32 Qd, Qm`

`VRINT mode .F32.F32 Dd, Dm`

where:

**mode**

must be one of:

**A**

meaning round to nearest, ties away from zero. This cannot generate an Inexact exception, even if the result is not exact.

**N**

meaning round to nearest, ties to even. This cannot generate an Inexact exception, even if the result is not exact.

**X**

meaning round to nearest, ties to even, generating an Inexact exception if the result is not exact.

**P**

meaning round towards plus infinity. This cannot generate an Inexact exception, even if the result is not exact.

**M**

meaning round towards minus infinity. This cannot generate an Inexact exception, even if the result is not exact.

**Z**

meaning round towards zero. This cannot generate an Inexact exception, even if the result is not exact.

**Qd, Qm**

specifies the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

specifies the destination and operand vectors, for a doubleword operation.

## Notes

You cannot use `VRINT` inside an IT block.

## 15.114 VRSQRTE (A32)

Vector Reciprocal Square Root Estimate.

### Syntax

`VRSQRTE {cond} .datatypeQd, Qm`

`VRSQRTE {cond} .datatypeDd, Dm`

where:

**cond**

is an optional condition code.

**datatype**

must be either `U32` or `F32`.

**Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

### Operation

`VRSQRTE` finds an approximate reciprocal square root of each element in a vector, and places the results in a second vector.

## Results for out-of-range inputs

The following table shows the results where input values are out of range:

|                                                                           | Operand element                         | Result element      |
|---------------------------------------------------------------------------|-----------------------------------------|---------------------|
| Integer The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set | <= 3FFFFFFF                             | FFFFFFFF            |
| Floating-point                                                            | NaN, Negative Normal, Negative Infinity | Default NaN         |
|                                                                           | Negative 0, Negative Denormal           | Negative Infinity 1 |
|                                                                           | Positive 0, Positive Denormal           | Positive Infinity 1 |
|                                                                           | Positive infinity                       | Positive 0          |
|                                                                           |                                         | Negative 0          |

11

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.115 VRSQRTS (A32)

Vector Reciprocal Square Root Step.

### Syntax

`VRSQRTS {cond} .F32 {Qd}, Qn, Qm`

`VRSQRTS {cond} .F32 {Dd}, Dn, Dm`

where:

#### ***cond***

is an optional condition code.

#### ***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

#### ***Dd, Dn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

### Operation

`VRSQRTS` multiplies the elements of one vector by the corresponding elements of another vector, subtracts each of the results from three, divides these results by two, and places the final results into the elements of the destination vector.

The Newton-Raphson iteration:

$$x_{\{n\}+1} = x_n (3 - dx_{\{n\}}^2) / 2$$

converges to  $(1 / \sqrt{\text{samp};\{d\}})$  if  $x_0$  is the result of `VRSQRTE` applied to  $d$ .

## Results for out-of-range inputs

The following table shows the results where input values are out of range:

**Table 15-18: Results for out-of-range inputs in VRSQRTS**

| 1st operand element   | 2nd operand element   | Result element |
|-----------------------|-----------------------|----------------|
| NaN                   | -                     | Default NaN    |
| -                     | NaN                   | Default NaN    |
| $\pm 0.0$ or denormal | $\pm$ infinity        | 1.5            |
| $\pm$ infinity        | $\pm 0.0$ or denormal | 1.5            |

## Related information

[Condition code suffixes](#) on page 138

## 15.116 VRSRA (by immediate) (A32)

Vector Rounding Shift Right by immediate value and Accumulate.

### Syntax

`VRSRA{cond}.datatype {Qd}, Qm, #imm`

`VRSRA{cond}.datatype {Dd}, Dm, #imm`

where:

#### **cond**

is an optional condition code.

#### **datatype**

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.

#### **Qd, Qm**

are the destination vector and the operand vector, for a quadword operation.

#### **Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

#### **imm**

is the immediate value specifying the size of the shift, in the range 1 to (size(`datatype`)). The ranges are shown in the following table:

**Table 15-19: Available immediate ranges in VRSRA (by immediate)**

| datatype | imm range |
|----------|-----------|
| S8 or U8 | 1 to 8    |

| datatype   | imm range |
|------------|-----------|
| S16 or U16 | 1 to 16   |
| S32 or U32 | 1 to 32   |
| S64 or U64 | 1 to 64   |

## Operation

`VRSRA` takes each element in a vector, right shifts them by an immediate value, and accumulates the results into the destination vector. The results are rounded.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.117 VRSUBHN (A32)

Vector Rounding Subtract and Narrow, selecting High half.

### Syntax

`VRSUBHN{cond}.datatypeDd, Qn, Qm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `I64`.

***Dd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector.

## Operation

`VRSUBHN` subtracts the elements of one quadword vector from the corresponding elements of another quadword vector, selects the most significant halves of the results, and places the final results in the destination doubleword vector. Results are rounded.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.118 VSHL (by immediate) (A32)

Vector Shift Left by immediate.

### Syntax

`VSHL{cond}.datatype {Qd}, Qm, #imm`

`VSHL{cond}.datatype {Dd}, Dm, #imm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I8`, `I16`, `I32`, or `I64`.

***Qd*, *Qm***

are the destination and operand vectors, for a quadword operation.

***Dd*, *Dm***

are the destination and operand vectors, for a doubleword operation.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-20: Available immediate ranges in VSHL (by immediate)**

| <b>datatype</b>  | <b>imm range</b> |
|------------------|------------------|
| <code>I8</code>  | 0 to 7           |
| <code>I16</code> | 0 to 15          |
| <code>I32</code> | 0 to 31          |
| <code>I64</code> | 0 to 63          |

### Operation

`VSHL` takes each element in a vector of integers, left shifts them by an immediate value, and places the results in the destination vector.

Bits shifted out of the left of each element are lost.

The following figure shows the operation of `VSHL` with two elements and a shift value of one. The least significant bit in each element in the destination vector is set to zero.

**Figure 15-6: Operation of quadword VSHL.64 Qd, Qm, #1****Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.119 VSHL (by signed variable) (A32)

Vector Shift Left by signed variable.

**Syntax**

`VSHL{cond}.datatype {Qd}, Qm, Qn`

`VSHL{cond}.datatype {Dd}, Dm, Dn`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

**Qd, Qm, Qn**

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Dd, Dm, Dn**

are the destination vector, the first operand vector, and the second operand vector, for a doubleword operation.

**Operation**

`vshl` takes each element in a vector, shifts them by the value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a truncating right shift.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168  
[Condition code suffixes](#) on page 138

## 15.120 VSHLL (by immediate) (A32)

Vector Shift Left Long.

### Syntax

`VSHLL{cond}.datatype Qd, Dm, #imm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, u8, u16, or u32.

***Qd, Dm***

are the destination and operand vectors, for a long operation.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-21: Available immediate ranges in VSHLL (by immediate)**

| datatype   | imm range |
|------------|-----------|
| s8 or u8   | 1 to 8    |
| s16 or u16 | 1 to 16   |
| s32 or u32 | 1 to 32   |

0 is permitted, but the resulting code disassembles to `vmovl`.

### Operation

`VSHLL` takes each element in a vector of integers, left shifts them by an immediate value, and places the results in the destination vector. Values are sign or zero extended.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.121 VSHR (by immediate) (A32)

Vector Shift Right by immediate value.

### Syntax

`VSHR{cond}.datatype {Qd}, Qm, #imm`

`VSHR{cond}.datatype {Dd}, Dm, #imm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

**Dd, Qm**

are the destination vector and the operand vector, for a quadword operation.

**Dd, Dm**

are the destination vector and the operand vector, for a doubleword operation.

**imm**

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-22: Available immediate ranges in VSHR (by immediate)**

| datatype   | imm range |
|------------|-----------|
| s8 or u8   | 0 to 8    |
| s16 or u16 | 0 to 16   |
| s32 or u32 | 0 to 32   |
| s64 or u64 | 0 to 64   |

vshr with an immediate value of zero is a pseudo-instruction for vorr.

## Operation

vshr takes each element in a vector, right shifts them by an immediate value, and places the results in the destination vector. The results are truncated.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VORR \(register\) \(A32\)](#) on page 575

[Condition code suffixes](#) on page 138

## 15.122 VSHRN (by immediate) (A32)

Vector Shift Right, Narrow, by immediate value.

## Syntax

`VSHRN{cond}.datatype Dd, Qm, #imm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I16`, `I32`, or `I64`.

***Dd, Qm***

are the destination vector and the operand vector.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-23: Available immediate ranges in VSHRN (by immediate)**

| datatype         | imm range |
|------------------|-----------|
| <code>I16</code> | 0 to 8    |
| <code>I32</code> | 0 to 16   |
| <code>I64</code> | 0 to 32   |

`VSHRN` with an immediate value of zero is a pseudo-instruction for `VMOVN`.

### Operation

`VSHRN` takes each element in a quadword vector, right shifts them by an immediate value, and places the results in a doubleword vector. The results are truncated.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[VMOVN \(A32\)](#) on page 565

[Condition code suffixes](#) on page 138

## 15.123 VSLI (A32)

Vector Shift Left and Insert.

### Syntax

`VSLI{cond}.size {Qd}, Qm, #imm`

`VSLI{cond}.size {Dd}, Dm, #imm`

where:

***cond***

is an optional condition code.

***size***

must be one of 8, 16, 32, or 64.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

***imm***

is the immediate value specifying the size of the shift, in the range 0 to (*size* - 1).

## Operation

`VSLI` takes each element in a vector, left shifts them by an immediate value, and inserts the results in the destination vector. Bits shifted out of the left of each element are lost. The following figure shows the operation of `VSLI` with two elements and a shift value of one. The least significant bit in each element in the destination vector is unchanged.

**Figure 15-7: Operation of quadword VSLI.64 Qd, Qm, #1**



## Related information

[Condition code suffixes](#) on page 138

## 15.124 VSRA (by immediate) (A32)

Vector Shift Right by immediate value and Accumulate.

### Syntax

```
VSRA{cond}.datatype {Qd}, Qm, #imm
```

```
VSRA{cond}.datatype {Dd}, Dm, #imm
```

where:

***cond***

is an optional condition code.

***datatype***

must be one of s8, s16, s32, s64, u8, u16, u32, or u64.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

***imm***

is the immediate value specifying the size of the shift. The ranges are shown in the following table:

**Table 15-24: Available immediate ranges in VSRA (by immediate)**

| datatype   | imm range |
|------------|-----------|
| S8 or U8   | 1 to 8    |
| S16 or U16 | 1 to 16   |
| S32 or U32 | 1 to 32   |
| S64 or U64 | 1 to 64   |

## Operation

VSRA takes each element in a vector, right shifts them by an immediate value, and accumulates the results into the destination vector. The results are truncated.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.125 VSRI (A32)

Vector Shift Right and Insert.

### Syntax

VSRI{cond}.size {Qd}, Qm, #imm

VSRI{cond}.size {Dd}, Dm, #imm

where:

***cond***

is an optional condition code.

***size***

must be one of 8, 16, 32, or 64.

***Qd, Qm***

are the destination vector and the operand vector, for a quadword operation.

***Dd, Dm***

are the destination vector and the operand vector, for a doubleword operation.

***imm***

is the immediate value specifying the size of the shift, in the range 1 to *size*.

**Operation**

`VSRI` takes each element in a vector, right shifts them by an immediate value, and inserts the results in the destination vector. Bits shifted out of the right of each element are lost. The following figure shows the operation of `VSRI` with a single element and a shift value of two. The two most significant bits in the destination vector are unchanged.

**Figure 15-8: Operation of doubleword VSRI.64 Dd, Dm, #2**

**Related information**

[Condition code suffixes](#) on page 138

## 15.126 VSTM (A32)

Extension register store multiple.

**Syntax**

`VSTMmode{cond} Rn{!}, Registers`

where:

**mode**

must be one of:

**IA**

meaning Increment address After each transfer. IA is the default, and can be omitted.

**DB**

meaning Decrement address Before each transfer.

**EA**

meaning Empty Ascending stack operation. This is the same as IA for stores.

**FD**

meaning Full Descending stack operation. This is the same as DB for stores.

***cond***

is an optional condition code.

***Rn***

is the general-purpose register holding the base address for the transfer.

!

is optional. ! specifies that the updated base address must be written back to *Rn*. If ! is not specified, *mode* must be IA.

***Registers***

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify D or Q registers, but they must not be mixed. The number of registers must not exceed 16 D registers, or 8 Q registers. If Q registers are specified, on disassembly they are shown as D registers.



VPUSH *Registers* is equivalent to vSTMDB sp!, *Registers*.

You can use either form of this instruction. They both disassemble to VPUSH.

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VSTM \(floating-point\) \(A32\)](#) on page 659

## 15.127 VSTn (multiple n-element structures) (A32)

Vector Store multiple n-element structures.

### Syntax

```
VSTn {cond}.datatype, [Rn{@align}]{!}
```

```
VSTn {cond}.datatype, [Rn{@align}], Rm
```

where:

**n**

must be one of 1, 2, 3, or 4.

***cond***

is an optional condition code.

***datatype***

see the following table for options.

**list**

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for options.

**Rn**

is the general-purpose register containing the base address. *Rn* cannot be PC.

**align**

specifies an optional alignment. See the following table for options.

!

if ! is present, *Rn* is updated to (*Rn* + the number of bytes transferred by the instruction). The update occurs after all the stores have taken place.

**Rm**

is a general-purpose register containing an offset from the base address. If *Rm* is present, the instruction updates *Rn* to (*Rn* + *Rm*) after using the address to access memory. *Rm* cannot be SP or PC.

**Operation**

`vstn` stores multiple *n*-element structures to memory from one or more Advanced SIMD registers, with interleaving (unless *n* == 1). Every element of each register is stored.

**Table 15-25: Permitted combinations of parameters for VSTn (multiple n-element structures)**

| <b>n</b> | <b>datatype</b>  | <b>list</b> <sup>32</sup>    | <b>align</b> <sup>33</sup> | <b>alignment</b>            |
|----------|------------------|------------------------------|----------------------------|-----------------------------|
| 1        | 8, 16, 32, or 64 | {Dd}                         | @64                        | 8-byte                      |
| -        | -                | {Dd, D(d+1)}                 | @64 or @128                | 8-byte or 16-byte           |
| -        | -                | {Dd, D(d+1), D(d+2)}         | @64                        | 8-byte                      |
| -        | -                | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| 2        | 8, 16, or 32     | {Dd, D(d+1)}                 | @64, @128                  | 8-byte or 16-byte           |
| -        | -                | {Dd, D(d+2)}                 | @64, @128                  | 8-byte or 16-byte           |
| -        | -                | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| 3        | 8, 16, or 32     | {Dd, D(d+1), D(d+2)}         | @64                        | 8-byte                      |
| -        | -                | {Dd, D(d+2), D(d+4)}         | @64                        | 8-byte                      |
| 4        | 8, 16, or 32     | {Dd, D(d+1), D(d+2), D(d+3)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |
| -        | -                | {Dd, D(d+2), D(d+4), D(d+6)} | @64, @128, or @256         | 8-byte, 16-byte, or 32-byte |

<sup>32</sup> Every register in the list must be in the range D0-D31.

<sup>33</sup> align can be omitted. In this case, standard alignment rules apply.

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

# 15.128 VSTn (single n-element structure to one lane) (A32)

Vector Store single n-element structure to one lane.

## Syntax

`VSTn {cond}.datatype{list}, [Rn{@align}]{!}`

`VSTn {cond}.datatype{list}, [Rn{@align}], Rm`

where:

**n**

must be one of 1, 2, 3, or 4.

**cond**

is an optional condition code.

**datatype**

see the following table.

**list**

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for options.

**Rn**

is the general-purpose register containing the base address. *Rn* cannot be PC.

**align**

specifies an optional alignment. See the following table for options.

**!**

if ! is present, *Rn* is updated to (*Rn* + the number of bytes transferred by the instruction). The update occurs after all the stores have taken place.

**Rm**

is a general-purpose register containing an offset from the base address. If *Rm* is present, the instruction updates *Rn* to (*Rn* + *Rm*) after using the address to access memory. *Rm* cannot be SP or PC.

## Operation

`VSTn` stores one *n*-element structure into memory from one or more Advanced SIMD registers.

**Table 15-26: Permitted combinations of parameters for VSTn (single n-element structure to one lane)**

| <i>n</i> | <i>datatype</i> | <i>list</i> <sup>34</sup>                | <i>align</i> <sup>35</sup> | <i>alignment</i>  |
|----------|-----------------|------------------------------------------|----------------------------|-------------------|
| 1        | 8               | {Dd[x]}                                  | -                          | Standard only     |
| -        | 16              | {Dd[x]}                                  | @16                        | 2-byte            |
| -        | 32              | {Dd[x]}                                  | @32                        | 4-byte            |
| 2        | 8               | {Dd[x], D(d+1)[x]}                       | @16                        | 2-byte            |
| -        | 16              | {Dd[x], D(d+1)[x]}                       | @32                        | 4-byte            |
| -        | -               | {Dd[x], D(d+2)[x]}                       | @32                        | 4-byte            |
| -        | 32              | {Dd[x], D(d+1)[x]}                       | @64                        | 8-byte            |
| -        | -               | {Dd[x], D(d+2)[x]}                       | @64                        | 8-byte            |
| 3        | 8               | {Dd[x], D(d+1)[x], D(d+2)[x]}            | -                          | Standard only     |
| -        | 16 or 32        | {Dd[x], D(d+1)[x], D(d+2)[x]}            | -                          | Standard only     |
| -        | -               | {Dd[x], D(d+2)[x], D(d+4)[x]}            | -                          | Standard only     |
| 4        | 8               | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @32                        | 4-byte            |
| -        | 16              | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @64                        | 8-byte            |
| -        | -               | {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]} | @64                        | 8-byte            |
| -        | 32              | {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]} | @64 or @128                | 8-byte or 16-byte |
| -        | -               | {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]} | @64 or @128                | 8-byte or 16-byte |

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

## 15.129 VSTR (A32)

Extension register store.

### Syntax

VSTR{cond}{.64} Dd, [Rn{, #offset}]

where:

<sup>34</sup> Every register in the list must be in the range D0-D31.

<sup>35</sup> align can be omitted. In this case, standard alignment rules apply.

***cond***

is an optional condition code.

***Dd***

is the extension register to be saved.

***Rn***

is the general-purpose register holding the base address for the transfer.

***offset***

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The value must be a multiple of 4, and lie in the range -1020 to +1020. The value is added to the base address to form the address used for the transfer.

## Operation

The `VSTR` instruction saves the contents of an extension register to memory.

Two words are transferred.

### Related information

[Condition code suffixes](#) on page 138

[Register-relative and PC-relative expressions](#) on page 222

[VSTR \(floating-point\) \(A32\)](#) on page 660

## 15.130 VSTR (post-increment and pre-decrement) (A32)

Pseudo-instruction that stores extension registers with post-increment and pre-decrement forms.



There are also `VLDR` and `VSTR` instructions without post-increment and pre-decrement.

Note

### Syntax

`VSTR{cond}{.64} Dd, [Rn], #offset ; post-increment`

`VSTR{cond}{.64} Dd, [Rn, #-offset]! ; pre-decrement`

where:

***cond***

is an optional condition code.

***Dd***

is the extension register to be saved.

***Rn***

is the general-purpose register holding the base address for the transfer.

***offset***

is a numeric expression that must evaluate to 8 at assembly time.

**Operation**

The post-increment instruction increments the base address in the register by the offset value, after the transfer. The pre-decrement instruction decrements the base address in the register by the offset value, and then performs the transfer using the new address in the register. This pseudo-instruction assembles to a `VSTM` instruction.

**Related information**

[VSTR \(A32\)](#) on page 617

[VSTM \(A32\)](#) on page 613

[Condition code suffixes](#) on page 138

[VSTR \(post-increment and pre-decrement, floating-point\) \(A32\)](#) on page 661

## 15.131 VSUB (A32)

Vector Subtract.

**Syntax**

`VSUB{cond}.datatype {Qd}, Qn, Qm`

`VSUB{cond}.datatype {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***datatype***

must be one of `I8`, `I16`, `I32`, `I64`, or `F32`.

***Qd, Qn, Qm***

are the destination vector, the first operand vector, and the second operand vector, for a quadword operation.

**Operation**

`vsub` subtracts the elements of one vector from the corresponding elements of another vector, and places the results in the destination vector.

**Related information**

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.132 VSUBHN (A32)

Vector Subtract and Narrow, selecting High half.

### Syntax

`VSUBHN{cond}.datatypeDd, Qn, Qm`

where:

**cond**

is an optional condition code.

**datatype**

must be one of I16, I32, or I64.

**Dd, Qn, Qm**

are the destination vector, the first operand vector, and the second operand vector.

### Operation

VSUBHN subtracts the elements of one quadword vector from the corresponding elements of another quadword vector, selects the most significant halves of the results, and places the final results in the destination doubleword vector. Results are truncated.

### Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.133 VSUBL and VSUBW (A32)

Vector Subtract Long, Vector Subtract Wide.

### Syntax

`VSUBL{cond}.datatypeQd, Dn, Dm ; Long operation`

`VSUBW{cond}.datatype {Qd}, Qn, Dm ; Wide operation`

where:

**cond**

is an optional condition code.

**datatype**

must be one of S8, S16, S32, U8, U16, or U32.

**Qd, Dn, Dm**

are the destination vector, the first operand vector, and the second operand vector, for a long operation.

***Qd, Qn, Dm***

are the destination vector, the first operand vector, and the second operand vector, for a wide operation.

## Operation

`VSUBL` subtracts the elements of one doubleword vector from the corresponding elements of another doubleword vector, and places the results in the destination quadword vector.

`VSUBW` subtracts the elements of a doubleword vector from the corresponding elements of a quadword vector, and places the results in the destination quadword vector.

## Related information

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Condition code suffixes](#) on page 138

## 15.134 VSWP (A32)

Vector Swap.

### Syntax

`VSWP{cond} {datatype} Qd, Qm`

`VSWP{cond} {datatype} Dd, Dm`

where:

***cond***

is an optional condition code.

***datatype***

is an optional datatype. The assembler ignores *datatype*.

***Qd, Qm***

specifies the vectors for a quadword operation.

***Dd, Dm***

specifies the vectors for a doubleword operation.

## Operation

`VSWP` exchanges the contents of two vectors. The vectors can be either doubleword or quadword. There is no distinction between data types.

## Related information

[Condition code suffixes](#) on page 138

## 15.135 VTBL and VTBX (A32)

Vector Table Lookup, Vector Table Extension.

### Syntax

`Vop {cond}.8 Dd, list, Dm`

where:

**op**

must be either `TBL` or `TBX`.

**cond**

is an optional condition code.

**Dd**

specifies the destination vector.

**list**

Specifies the vectors containing the table. It must be one of:

- $\{D_n\}$ .
- $\{D_n, D(n+1)\}$ .
- $\{D_n, D(n+1), D(n+2)\}$ .
- $\{D_n, D(n+1), D(n+2), D(n+3)\}$ .
- $\{Q_n, Q(n+1)\}$ .

All the registers in `list` must be in the range `D0-D31` or `Q0-Q15` and must not wrap around the end of the register bank. For example `{D31, D0, D1}` is not permitted. If `list` contains `Q` registers, they disassemble to the equivalent `D` registers.

**Dm**

specifies the index vector.

### Operation

`VTBL` uses byte indexes in a control vector to look up byte values in a table and generate a new vector. Indexes out of range return zero.

`VTBX` works in the same way, except that indexes out of range leave the destination element unchanged.

### Related information

[Condition code suffixes](#) on page 138

## 15.136 VTRN (A32)

Vector Transpose.

### Syntax

`VTRN{cond}.sizeQd, Qm`

`VTRN{cond}.sizeDd, Dm`

where:

**cond**

is an optional condition code.

**size**

must be one of 8, 16, or 32.

**Qd, Qm**

specifies the vectors, for a quadword operation.

**Dd, Dm**

specifies the vectors, for a doubleword operation.

### Operation

`VTRN` treats the elements of its operand vectors as elements of  $2 \times 2$  matrices, and transposes the matrices. The following figures show examples of the operation of `VTRN`:

**Figure 15-9: Operation of doubleword VTRN.8**



**Figure 15-10: Operation of doubleword VTRN.32**



### Related information

[Condition code suffixes](#) on page 138

## 15.137 VTST (A32)

Vector Test bits.

### Syntax

`VTST{cond}.size {Qd}, Qn, Qm`

`VTST{cond}.size {Dd}, Dn, Dm`

where:

**`cond`**

is an optional condition code.

**`size`**

must be one of 8, 16, or 32.

**`Qd, Qn, Qm`**

specifies the destination register, the first operand register, and the second operand register, for a quadword operation.

**`Dd, Dn, Dm`**

specifies the destination register, the first operand register, and the second operand register, for a doubleword operation.

### Operation

`vtst` takes each element in a vector, and bitwise logical ANDs them with the corresponding element of a second vector. If the result is not zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

### Related information

[Condition code suffixes](#) on page 138

## 15.138 VUZP (A32)

Vector Unzip.

### Syntax

`VUZP{cond}.sizeQd, Qm`

`VUZP{cond}.sizeDd, Dm`

where:

**`cond`**

is an optional condition code.

**size**

must be one of 8, 16, or 32.

***Qd, Qm***

specifies the vectors, for a quadword operation.

***Dd, Dm***

specifies the vectors, for a doubleword operation.

The following are all the same instruction:



- `VZIP.32 Dd, Dm.`
- `VUZP.32 Dd, Dm.`
- `VTRN.32 Dd, Dm.`

The instruction is disassembled as `VTRN.32 Dd, Dm.`

## Operation

`VUZP` de-interleaves the elements of two vectors.

De-interleaving is the inverse process of interleaving.

|    | Register state before operation |                |                |                |                |                |                |                | Register state after operation |                |                |                |                |                |                |                |
|----|---------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|--------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Dd | A <sub>7</sub>                  | A <sub>6</sub> | A <sub>5</sub> | A <sub>4</sub> | A <sub>3</sub> | A <sub>2</sub> | A <sub>1</sub> | A <sub>0</sub> | B <sub>6</sub>                 | B <sub>4</sub> | B <sub>2</sub> | B <sub>0</sub> | A <sub>6</sub> | A <sub>4</sub> | A <sub>2</sub> | A <sub>0</sub> |
| Dm | B <sub>7</sub>                  | B <sub>6</sub> | B <sub>5</sub> | B <sub>4</sub> | B <sub>3</sub> | B <sub>2</sub> | B <sub>1</sub> | B <sub>0</sub> | B <sub>7</sub>                 | B <sub>5</sub> | B <sub>3</sub> | B <sub>1</sub> | A <sub>7</sub> | A <sub>5</sub> | A <sub>3</sub> | A <sub>1</sub> |

|    | Register state before operation |                |                |                | Register state after operation |                |                |                |
|----|---------------------------------|----------------|----------------|----------------|--------------------------------|----------------|----------------|----------------|
| Qd | A <sub>3</sub>                  | A <sub>2</sub> | A <sub>1</sub> | A <sub>0</sub> | B <sub>2</sub>                 | B <sub>0</sub> | A <sub>2</sub> | A <sub>0</sub> |
| Qm | B <sub>3</sub>                  | B <sub>2</sub> | B <sub>1</sub> | B <sub>0</sub> | B <sub>3</sub>                 | B <sub>1</sub> | A <sub>3</sub> | A <sub>1</sub> |

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

[VTRN \(A32\)](#) on page 622

[Condition code suffixes](#) on page 138

## 15.139 VZIP (A32)

Vector Zip.

### Syntax

`VZIP{cond}.sizeQd, Qm`

`VZIP{cond}.sizeDd, Dm`

where:

**cond**

is an optional condition code.

**size**

must be one of 8, 16, or 32.

**Qd, Qm**

specifies the vectors, for a quadword operation.

**Dd, Dm**

specifies the vectors, for a doubleword operation.

The following are all the same instruction:



- VZIP.32 Dd, Dm.
- VUZP.32 Dd, Dm.
- VTRN.32 Dd, Dm.

The instruction is disassembled as VTRN.32 Dd, Dm.

## Operation

VZIP interleaves the elements of two vectors.

|    | Register state before operation |                |                |                |                |                |                |                | Register state after operation |                |                |                |                |                |                |                |
|----|---------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|--------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Dd | A <sub>7</sub>                  | A <sub>6</sub> | A <sub>5</sub> | A <sub>4</sub> | A <sub>3</sub> | A <sub>2</sub> | A <sub>1</sub> | A <sub>0</sub> | B <sub>3</sub>                 | A <sub>3</sub> | B <sub>2</sub> | A <sub>2</sub> | B <sub>1</sub> | A <sub>1</sub> | B <sub>0</sub> | A <sub>0</sub> |
| Dm | B <sub>7</sub>                  | B <sub>6</sub> | B <sub>5</sub> | B <sub>4</sub> | B <sub>3</sub> | B <sub>2</sub> | B <sub>1</sub> | B <sub>0</sub> | B <sub>7</sub>                 | A <sub>7</sub> | B <sub>6</sub> | A <sub>6</sub> | B <sub>5</sub> | A <sub>5</sub> | B <sub>4</sub> | A <sub>4</sub> |

|    | Register state before operation |                |                |                |  | Register state after operation |                |                |                |
|----|---------------------------------|----------------|----------------|----------------|--|--------------------------------|----------------|----------------|----------------|
| Qd | A <sub>3</sub>                  | A <sub>2</sub> | A <sub>1</sub> | A <sub>0</sub> |  | B <sub>1</sub>                 | A <sub>1</sub> | B <sub>0</sub> | A <sub>0</sub> |
| Qm | B <sub>3</sub>                  | B <sub>2</sub> | B <sub>1</sub> | B <sub>0</sub> |  | B <sub>3</sub>                 | A <sub>3</sub> | B <sub>2</sub> | A <sub>2</sub> |

## Related information

[Interleaving provided by load and store element and structure instructions](#) on page 508

[VTRN \(A32\)](#) on page 622

[Condition code suffixes](#) on page 138

# 16. Floating-point Instructions (32-bit)

Describes floating-point assembly language instructions.

## 16.1 Summary of floating-point instructions

A summary of the floating-point instructions. Not all of these instructions are available in all floating-point versions.

The following table shows a summary of floating-point instructions that are not available in Advanced SIMD.



Floating-point vector mode is not supported in Arm®v8. Use Advanced SIMD instructions for vector floating-point.

**Table 16-1: Summary of floating-point instructions**

| Mnemonic       | Brief description                                                              |
|----------------|--------------------------------------------------------------------------------|
| VABS           | Absolute value                                                                 |
| VADD           | Add                                                                            |
| VCMP, VCMPE    | Compare                                                                        |
| VCVT           | Convert between single-precision and double-precision                          |
| -              | Convert between floating-point and integer                                     |
| -              | Convert between floating-point and fixed-point                                 |
| -              | Convert floating-point to integer with directed rounding modes                 |
| VCVTB, VCVTT   | Convert between half-precision and single-precision floating-point             |
| -              | Convert between half-precision and double-precision                            |
| VDIV           | Divide                                                                         |
| VFMA, VFMS     | Fused multiply accumulate, Fused multiply subtract                             |
| VFNMA, VFNMS   | Fused multiply accumulate with negation, Fused multiply subtract with negation |
| VJCVT          | Javascript Convert to signed fixed-point, rounding toward Zero                 |
| VLDM           | Extension register load multiple                                               |
| VLDR           | Extension register load                                                        |
| VLLDM          | Floating-point Lazy Load Multiple                                              |
| VLSTM          | Floating-point Lazy Store Multiple                                             |
| VMAXNM, VMINNM | Maximum, Minimum, consistent with IEEE 754-2008                                |
| VMLA           | Multiply accumulate                                                            |
| VMLS           | Multiply subtract                                                              |

| Mnemonic | Brief description                                                                                                                                    |
|----------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| VMOV     | Insert floating-point immediate in single-precision or double-precision register, or copy one FP register into another FP register of the same width |
| VMRS     | Transfer contents from a floating-point system register to a general-purpose register                                                                |
| VMSR     | Transfer contents from a general-purpose register to a floating-point system register                                                                |
| VMUL     | Multiply                                                                                                                                             |
| VNEG     | Negate                                                                                                                                               |
| VNMLA    | Negated multiply accumulate                                                                                                                          |
| VNMLS    | Negated multiply subtract                                                                                                                            |
| VNMUL    | Negated multiply                                                                                                                                     |
| VPOP     | Extension register load multiple                                                                                                                     |
| VPUSH    | Extension register store multiple                                                                                                                    |
| VRINT    | Round to integer                                                                                                                                     |
| VSEL     | Select                                                                                                                                               |
| VSQRT    | Square Root                                                                                                                                          |
| VSTM     | Extension register store multiple                                                                                                                    |
| VSTR     | Extension register store                                                                                                                             |
| VSUB     | Subtract                                                                                                                                             |

## 16.2 VABS (floating-point) (A32)

Floating-point absolute value.

### Syntax

VABS {cond}.F32 *Sd*, *Sm*

VABS {cond}.F64 *Dd*, *Dm*

where:

**cond**

is an optional condition code.

**Sd, Sm**

are the single-precision registers for the result and operand.

**Dd, Dm**

are the double-precision registers for the result and operand.

### Operation

The **VABS** instruction takes the contents of *Sm* or *Dm*, clears the sign bit, and places the result in *Sd* or *Dd*. This gives the absolute value.

If the operand is a NaN, the sign bit is cleared, but no exception is produced.

### Floating-point exceptions

`VABS` instructions do not produce any exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.3 VADD (floating-point) (A32)

Floating-point add.

### Syntax

`VADD{cond}.F32 {Sd}, Sn, Sm`

`VADD{cond}.F64 {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

### Operation

The `VADD` instruction adds the values in the operand registers and places the result in the destination register.

### Floating-point exceptions

The `VADD` instruction can produce Invalid Operation, Overflow, or Inexact exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.4 VCMP, VCMPE (A32)

Floating-point compare.

### Syntax

`VCMP {E}{cond}.F32 Sd, Sm`

`VCMP {E} {cond}.F32 Sd, #0`

`VCMP {E} {cond}.F64 Dd, Dm`

`VCMP {E} {cond}.F64 Dd, #0`

where:

**E**

if present, indicates that the instruction raises an Invalid Operation exception if either operand is a quiet or signaling NaN. Otherwise, it raises the exception only if either operand is a signaling NaN.

**cond**

is an optional condition code.

**Sd, Sm**

are the single-precision registers holding the operands.

**Dd, Dm**

are the double-precision registers holding the operands.

## Operation

The `VCMP{E}` instruction subtracts the value in the second operand register (or 0 if the second operand is #0) from the value in the first operand register, and sets the VFP condition flags based on the result.

## Floating-point exceptions

`VCMP{E}` instructions can produce Invalid Operation exceptions.

## Related information

[Condition code suffixes](#) on page 138

## 16.5 VCVT (between single-precision and double-precision) (A32)

Convert between single-precision and double-precision numbers.

## Syntax

`VCVT{cond}.F64.F32 Dd, Sm`

`VCVT{cond}.F32.F64 Sd, Dm`

where:

**cond**

is an optional condition code.

**Dd**

is a double-precision register for the result.

**Sm**

is a single-precision register holding the operand.

**Sd**

is a single-precision register for the result.

**Dm**

is a double-precision register holding the operand.

## Operation

These instructions convert the single-precision value in *Sm* to double-precision, placing the result in *Dd*, or the double-precision value in *Dm* to single-precision, placing the result in *Sd*.

### Floating-point exceptions

These instructions can produce Invalid Operation, Input Denormal, Overflow, Underflow, or Inexact exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.6 VCVT (between floating-point and integer) (A32)

Convert between floating-point numbers and integers.

### Syntax

`VCVT{R}{cond}.type.F64 Sd, Dm`

`VCVT{R}{cond}.type.F32Sd, Sm`

`VCVT{cond}.F64.type Dd, Sm`

`VCVT{cond}.F32.type Sd, Sm`

where:

**R**

makes the operation use the rounding mode specified by the FPSCR. Otherwise, the operation rounds towards zero.

**cond**

is an optional condition code.

**type**

can be either `U32` (unsigned 32-bit integer) or `S32` (signed 32-bit integer).

***Sd***

is a single-precision register for the result.

***Dd***

is a double-precision register for the result.

***Sm***

is a single-precision register holding the operand.

***Dm***

is a double-precision register holding the operand.

## Operation

The first two forms of this instruction convert from floating-point to integer.

The third and fourth forms convert from integer to floating-point.

### Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.7 VCVT (from floating-point to integer with directed rounding modes) (A32 FP)

Convert from floating-point to signed or unsigned integer with directed rounding modes.



This instruction is supported only in Arm®v8.

Note

### Syntax

`VCVTmode.S32.F64 Sd, Dm`

`VCVTmode.S32.F32 Sd, Sm`

`VCVTmode.U32.F64 Sd, Dm`

`VCVTmode.U32.F32 Sd, Sm`

where:

***mode***

must be one of:

**A**

meaning round to nearest, ties away from zero

**N**

meaning round to nearest, ties to even

**P**

meaning round towards plus infinity

**M**

meaning round towards minus infinity.

**Sd, Sm**

specifies the single-precision registers for the operand and result.

**Sd, Dm**

specifies a single-precision register for the result and double-precision register holding the operand.

**Notes**

You cannot use vcvrt with a directed rounding mode inside an IT block.

**Floating-point exceptions**

These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.

## 16.8 VCVT (between floating-point and fixed-point) (A32)

Convert between floating-point and fixed-point numbers.

**Syntax**

`VCVT{cond}.type.F64 Dd, Dd, #fbits`

`VCVT{cond}.type.F32 Sd, Sd, #fbits`

`VCVT{cond}.F64.type Dd, Dd, #fbits`

`VCVT{cond}.F32.type Sd, Sd, #fbits`

where:

**cond**

is an optional condition code.

**type**

can be any one of:

**S16**

16-bit signed fixed-point number.

***U16***

16-bit unsigned fixed-point number.

***S32***

32-bit signed fixed-point number.

***U32***

32-bit unsigned fixed-point number.

***Sd***

is a single-precision register for the operand and result.

***Dd***

is a double-precision register for the operand and result.

***Fbits***

is the number of fraction bits in the fixed-point number, in the range 0-16 if *type* is *s16* or *u16*, or in the range 1-32 if *type* is *s32* or *u32*.

## Operation

The first two forms of this instruction convert from floating-point to fixed-point.

The third and fourth forms convert from fixed-point to floating-point.

In all cases the fixed-point number is contained in the least significant 16 or 32 bits of the register.

## Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.

## Related information

[Condition code suffixes](#) on page 138

## 16.9 VCVTB, VCVTT (half-precision extension) (A32)

Convert between half-precision and single-precision floating-point numbers.

### Syntax

`VCVTB{cond}.typeSd,Sm`

`VCVTT{cond}.typeSd,Sm`

where:

***Cond***

is an optional condition code.

***Type***

can be any one of:

**F32.F16**

Convert from half-precision to single-precision.

**F16.F32**

Convert from single-precision to half-precision.

***Sd***

is a single word register for the result.

***Sm***

is a single word register for the operand.

**Operation**

`vcvTB` uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision value

`vcvTT` uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision value.

**Architectures**

The instructions are only available in VFPv3 systems with the half-precision extension, and VFPv4.

**Floating-point exceptions**

These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

**Related information**

[Condition code suffixes](#) on page 138

## 16.10 VCVTB, VCVTT (between half-precision and double-precision) (A32 FP)

These instructions convert between half-precision and double-precision floating-point numbers.

The conversion can be done in either of the following ways:

- From half-precision floating-point to double-precision floating-point (`F64.F16`).
- From double-precision floating-point to half-precision floating-point (`F16.F64`).

`vcvTB` uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision value.

`vcvTT` uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision value.



These instructions are supported only in Arm®v8.

**Note**

## Syntax

`VCVTB{cond}.F64.F16 Dd, Sm`

`VCVTB{cond}.F16.F64 Sd, Dm`

`VCVTT{cond}.F64.F16 Dd, Sm`

`VCVTT{cond}.F16.F64 Sd, Dm`

where:

**cond**

is an optional condition code.

**Dd**

is a double-precision register for the result.

**Sm**

is a single word register holding the operand.

**Sd**

is a single word register for the result.

**Dm**

is a double-precision register holding the operand.

## Usage

These instructions convert the half-precision value in *Sm* to double-precision and place the result in *Dd*, or the double-precision value in *Dm* to half-precision and place the result in *Sd*.

## Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

## 16.11 VDIV (A32)

Floating-point divide.

## Syntax

`VDIV{cond}.F32 {Sd}, Sn, Sm`

`VDIV{cond}.F64 {Dd}, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

## Operation

The **vdiv** instruction divides the value in the first operand register by the value in the second operand register, and places the result in the destination register.

### Floating-point exceptions

**vdiv** operations can produce Division by Zero, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.12 VFMA, VFMS, VFNMA, VFNMS (floating-point) (A32)

Fused floating-point multiply accumulate and fused floating-point multiply subtract, with optional negation.

### Syntax

**VF {N} op{cond} .F64 {Dd}, Dn, Dm**

**VF {N} op{cond} .F32 {Sd}, Sn, Sm**

where:

***op***

is one of **MA** or **MS**.

***N***

negates the final result.

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

## Operation

**VFMAs** multiplies the values in the operand registers, adds the value in the destination register, and places the final result in the destination register. The result of the multiply is not rounded before the accumulation.

**VFMS** multiplies the values in the operand registers, subtracts the product from the value in the destination register, and places the final result in the destination register. The result of the multiply is not rounded before the subtraction.

In each case, the final result is negated if the **N** option is used.

## Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

## Related information

[VMUL \(floating-point\) \(A32\)](#) on page 652

[Condition code suffixes](#) on page 138

## 16.13 VJCVT (A32)

Javascript Convert to signed fixed-point, rounding toward Zero.

### Syntax

**VJCVT{q}.S32.F64 Sd, Dm ; A1 FP/SIMD registers (A32)**

**VJCVT{q}.S32.F64 Sd, Dm ; T1 FP/SIMD registers (T32)**

Where:

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Dm**

Is the 64-bit name of the SIMD and FP source register.

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Javascript Convert to signed fixed-point, rounding toward Zero. This instruction converts the double-precision floating-point value in the SIMD and FP source register to a 32-bit signed integer using the Round towards Zero rounding mode, and write the result to the general-purpose

destination register. If the result is too large to be held as a 32-bit signed integer, then the result is the integer modulo  $2^{32}$ , as held in a 32-bit signed integer.

Depending on settings in the CPACR, NSACR, HCPTR, and FPEXC registers, and the security state and mode in which the instruction is executed, an attempt to execute the instruction might be **UNDEFINED**, or trapped to Hyp mode. For more information see *Enabling Advanced SIMD and floating-point support in the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile*.

## Related information

[Summary of floating-point instructions](#) on page 627

## 16.14 VLDM (floating-point) (A32)

Extension register load multiple.

### Syntax

`VLDMmode {cond} Rn{!}, Registers`

where:

**mode**

must be one of:

**IA**

meaning Increment address After each transfer. **IA** is the default, and can be omitted.

**DB**

meaning Decrement address Before each transfer.

**EA**

meaning Empty Ascending stack operation. This is the same as **DB** for loads.

**FD**

meaning Full Descending stack operation. This is the same as **IA** for loads.

**cond**

is an optional condition code.

**Rn**

is the general-purpose register holding the base address for the transfer.

**!**

is optional. **!** specifies that the updated base address must be written back to **Rn**. If **!** is not specified, **mode** must be **IA**.

**Registers**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify s or d registers, but they must not be mixed. The number of registers must not exceed 16 d registers.



`VPOP Registers` is equivalent to `VLDM sp!, Registers`.

You can use either form of this instruction. They both disassemble to `VPOP`.

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

## 16.15 VLDR (floating-point) (A32)

Extension register load.

### Syntax

`VLDR{cond}{.size} Fd, [Rn{, #offset}]`

`VLDR{cond}{.size} Fd, label`

where:

**cond**

is an optional condition code.

**size**

is an optional data size specifier. Must be 32 if `Fd` is an s register, or 64 otherwise.

**Fd**

is the extension register to be loaded, and can be either a d or s register.

**Rn**

is the general-purpose register holding the base address for the transfer.

**offset**

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The value must be a multiple of 4, and lie in the range -1020 to +1020. The value is added to the base address to form the address used for the transfer.

**label**

is a PC-relative expression.

`label` must be aligned on a word boundary within ±1KB of the current instruction.

### Operation

The `VLDR` instruction loads an extension register from memory.

One word is transferred if  $Fd$  is an s register. Two words are transferred otherwise.

There is also a `VLDR` pseudo-instruction.

### Related information

[VLDR pseudo-instruction \(floating-point\) \(A32\)](#) on page 642

[Condition code suffixes](#) on page 138

[Register-relative and PC-relative expressions](#) on page 222

## 16.16 VLDR (post-increment and pre-decrement, floating-point) (A32)

Pseudo-instruction that loads extension registers, with post-increment and pre-decrement forms.



There are also `VLDR` and `VSTR` instructions without post-increment and pre-decrement.

Note

### Syntax

```
VLDR{cond}{.size} Fd, [Rn], #offset ; post-increment
```

```
VLDR{cond}{.size} Fd, [Rn, #-offset]! ; pre-decrement
```

where:

**cond**

is an optional condition code.

**size**

is an optional data size specifier. Must be 32 if  $Fd$  is an s register, or 64 if  $Fd$  is a d register.

**Fd**

is the extension register to load. It can be either a double precision ( $Dd$ ) or a single precision ( $sd$ ) register.

**Rn**

is the general-purpose register holding the base address for the transfer.

**offset**

is a numeric expression that must evaluate to a numeric value at assembly time. The value must be 4 if  $Fd$  is an s register, or 8 if  $Fd$  is a d register.

### Operation

The post-increment instruction increments the base address in the register by the offset value, after the transfer. The pre-decrement instruction decrements the base address in the register by

the offset value, and then performs the transfer using the new address in the register. This pseudo-instruction assembles to a `VLDM` instruction.

### Related information

[VLDM \(floating-point\) \(A32\)](#) on page 639  
[VLDR \(floating-point\) \(A32\)](#) on page 640  
[Condition code suffixes](#) on page 138

## 16.17 VLDR pseudo-instruction (floating-point) (A32)

The `VLDR` pseudo-instruction loads a constant value into a floating-point single-precision or double-precision register.



This description is for the `VLDR` pseudo-instruction only.

Note

### Syntax

`VLDR{cond}.F64Dd,=constant`

`VLDR{cond}.F32Sd,=constant`

where:

**`cond`**

is an optional condition code.

**{`Dd` or `Sd`}**

is the extension register to be loaded.

**`constant`**

is an immediate value of the appropriate type for the extension register width.

### Usage

If an instruction (for example, `vmov`) is available that can generate the constant directly into the register, the assembler uses it. Otherwise, it generates a doubleword literal pool entry containing the constant and loads the constant using a `VLDR` instruction.

### Related information

[VLDR \(floating-point\) \(A32\)](#) on page 640  
[Condition code suffixes](#) on page 138  
[Floating-point data types in A32/T32 instructions](#) on page 182

## 16.18 VLLDM (A32)

Floating-point Lazy Load Multiple.

### Syntax

`VLLDM{c}{q} Rn`

Where:

**c**

Is an optional condition code. See [Condition Codes](#).

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Rn**

Is the general-purpose base register.

### Architectures supported

Supported in Arm®v8-M Main extension only.

### Usage

Floating-point Lazy Load Multiple restores the contents of the Secure floating-point registers that were protected by a `VLSTM` instruction, and marks the floating-point context as active.

If the lazy state preservation set up by a previous `VLSTM` instruction is active (`FPCCR.LSPACT == 1`), this instruction deactivates lazy state preservation and enables access to the Secure floating-point registers.

If lazy state preservation is inactive (`FPCCR.LSPACT == 0`), either because lazy state preservation was not enabled (`FPCCR.LSPEN == 0`) or because a floating-point instruction caused the Secure floating-point register contents to be stored to memory, this instruction loads the stored Secure floating-point register contents back into the floating-point registers.

If Secure floating-point is not in use (`CONTROL_S.SFPA == 0`), this instruction behaves as a NOP.

This instruction is only available in Secure state, and is **UNDEFINED** in Non-secure state.

If the Floating-point Extension is not implemented, this instruction is available in Secure state, but behaves as a NOP.

### Related information

[Summary of floating-point instructions](#) on page 627

## 16.19 VLSTM (A32)

Floating-point Lazy Store Multiple.

### Syntax

`VLSTM{c}{q} Rn`

Where:

**c**

Is an optional condition code. See [Condition Codes](#).

**q**

Is an optional instruction width specifier. See [Instruction width specifiers](#).

**Rn**

Is the general-purpose base register.

### Architectures supported

Supported in Arm®v8-M Main extension only.

### Usage

Floating-point Lazy Store Multiple stores the contents of Secure floating-point registers to a prepared stack frame, and clears the Secure floating-point registers.

If floating-point lazy preservation is enabled (FPCCR.LSPEN == 1), then the next time a floating-point instruction other than `VLSTM` or `VLIDM` is executed:

- The contents of Secure floating-point registers are stored to memory.
- The Secure floating-point registers are cleared.

If Secure floating-point is not in use (CONTROL\_S.SFPA == 0), this instruction behaves as a `NOP`.

This instruction is only available in Secure state, and is **UNDEFINED** in Non-secure state.

If the Floating-point extension is not implemented, this instruction is available in Secure state, but behaves as a NOP.

### Related information

[Summary of floating-point instructions](#) on page 627

## 16.20 VMAXNM, VMINNM (floating-point) (A32)

Vector Minimum, Vector Maximum.



These instructions are supported only in Arm®v8.

**Note**

### Syntax

`V op .F32 Sd, Sn, Sm`

`V op .F64 Dd, Dn, Dm`

where:

**op**

must be either MAXNM or MINNM.

**Sd, Sn, Sm**

are the single-precision destination register, first operand register, and second operand register.

**Dd, Dn, Dm**

are the double-precision destination register, first operand register, and second operand register.

### Operation

VMAXNM compares the values in the operand registers, and copies the larger value into the destination operand register.

VMINNM compares the values in the operand registers, and copies the smaller value into the destination operand register.

If one of the values being compared is a number and the other value is NaN, the number is copied into the destination operand register. This is consistent with the IEEE 754-2008 standard.

### Notes

You cannot use VMAXNM or VMINNM inside an IT block.

### Floating-point exceptions

These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact exceptions.

## 16.21 VMLA (floating-point) (A32)

Floating-point multiply accumulate.

### Syntax

`VMLA{cond}.F32 Sd, Sn, Sm`

`VMLA{cond}.F64 Dd, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

### Operation

The `VMLA` instruction multiplies the values in the operand registers, adds the value in the destination register, and places the final result in the destination register.

### Floating-point exceptions

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.22 VMLS (floating-point) (A32)

Floating-point multiply subtract.

### Syntax

`VMLS{cond}.F32 Sd, Sn, Sm`

`VMLS{cond}.F64 Dd, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

**Dd, Dn, Dm**

are the double-precision registers for the result and operands.

## Operation

The `VMLS` instruction multiplies the values in the operand registers, subtracts the result from the value in the destination register, and places the final result in the destination register.

## Floating-point exceptions

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

## Related information

[Condition code suffixes](#) on page 138

## 16.23 VMOV (floating-point) (A32)

Insert a floating-point immediate value into a single-precision or double-precision register, or copy one register into another register. This instruction is always scalar.

### Syntax

`VMOV{cond}.F32 Sd, #imm`

`VMOV{cond}.F64 Dd, #imm`

`VMOV{cond}.F32 Sd, Sm`

`VMOV{cond}.F64 Dd, Dm`

where:

**cond**

is an optional condition code.

**Sd**

is the single-precision destination register.

**Dd**

is the double-precision destination register.

**imm**

is the floating-point immediate value.

**Sm**

is the single-precision source register.

**Dm**

is the double-precision source register.

## Immediate values

Any number that can be expressed as  $\pm n * 2^{-r}$ , where  $n$  and  $r$  are integers,  $16 \leq n \leq 31$ ,  $0 \leq r \leq 7$ .

## Architectures

The instructions that copy immediate constants are available in VFPv3 and above.

The instructions that copy from registers are available in all VFP systems.

## Related information

[Condition code suffixes](#) on page 138

## 16.24 VMOV (between one general-purpose register and single precision floating-point register) (A32)

Transfer contents between a single-precision floating-point register and a general-purpose register.

### Syntax

`VMOV{cond} Rd, Sn`

`VMOV{cond} Sn, Rd`

where:

**cond**

is an optional condition code.

**Sn**

is the floating-point single-precision register.

**Rd**

is the general-purpose register. *Rd* must not be PC.

### Operation

`VMOV Rd, Sn` transfers the contents of *Sn* into *Rd*.

`VMOV Sn, Rd` transfers the contents of *Rd* into *Sn*.

## Related information

[Condition code suffixes](#) on page 138

## 16.25 VMOV (between two general-purpose registers and one or two extension registers) (A32)

Transfer contents between two general-purpose registers and either one 64-bit register or two consecutive 32-bit registers.

### Syntax

`VMOV{cond} Dm, Rd, Rn`

`VMOV{cond} Rd, Rn, Dm`

`VMOV{cond} Sm, Sm1, Rd, Rn`

`VMOV{cond} Rd, Rn, Sm, Sm1`

where:

***cond***

is an optional condition code.

***Dm***

is a 64-bit extension register.

***Sm***

is a VFP 32-bit register.

***Sm1***

is the next consecutive VFP 32-bit register after *Sm*.

**{*Rd*, *Rn*}**

are the general-purpose registers. *Rd* and *Rn* must not be PC.

### Operation

`VMOV Dm, Rd, Rn` transfers the contents of *Rd* into the low half of *Dm*, and the contents of *Rn* into the high half of *Dm*.

`VMOV Rd, Rn, Dm` transfers the contents of the low half of *Dm* into *Rd*, and the contents of the high half of *Dm* into *Rn*.

`VMOV Rd, Rn, Sm, Sm1` transfers the contents of *Sm* into *Rd*, and the contents of *Sm1* into *Rn*.

`VMOV Sm, Sm1, Rd, Rn` transfers the contents of *Rd* into *Sm*, and the contents of *Rn* into *Sm1*.

### Architectures

The instructions are available in VFPv2 and above.

### Related information

[Condition code suffixes](#) on page 138

## 16.26 VMOV (between a general-purpose register and half a double precision floating-point register) (A32)

Transfer contents between a general-purpose register and half a double precision floating-point register.

### Syntax

`VMOV{cond}{.size} Dn[x], Rd`

`VMOV{cond}{.size} Rd, Dn[x]`

where:

***cond***

is an optional condition code.

***size***

the data size. Must be either 32 or omitted. If omitted, *size* is 32.

***Dn[x]***

is the upper or lower half of a double precision floating-point register.

***Rd***

is the general-purpose register. *Rd* must not be PC.

### Operation

`VMOV Dn[x], Rd` transfers the contents of *Rd* into *Dn[x]*.

`VMOV Rd, Dn[x]` transfers the contents of *Dn[x]* into *Rd*.

### Related information

[Condition code suffixes](#) on page 138

[Floating-point data types in A32/T32 instructions](#) on page 182

## 16.27 VMRS (floating-point) (A32)

Transfer contents from an floating-point system register to a general-purpose register.

### Syntax

`VMRS {cond} Rd, extsysreg`

where:

***cond***

is an optional condition code.

***extsysreg***

is the floating-point system register, usually `FPSCR`, `FPSID`, or `FPEXC`.

***Rd***

is the general-purpose register. *Rd* must not be PC.

It can be `APSR_nzcv`, if *extsysreg* is `FPSCR`. In this case, the floating-point status flags are transferred into the corresponding flags in the special-purpose APSR.

**Usage**

The `VMRS` instruction transfers the contents of *extsysreg* into *Rd*.



The instruction stalls the processor until all current floating-point operations complete.

**Examples**

```
VMRS      r2, FPCID
VMRS      APSR_nzcv, FPSCR      ; transfer FP status register to the
                                ; special-purpose APSR
```

**Related information**

[Condition code suffixes](#) on page 138

## 16.28 VMSR (floating-point) (A32)

Transfer contents of a general-purpose register to a floating-point system register.

**Syntax**

```
VMSR{cond} extsysreg, Rd
```

where:

***cond***

is an optional condition code.

***extsysreg***

is the floating-point system register, usually `FPSCR`, `FPSID`, or `FPEXC`.

***Rd***

is the general-purpose register. *Rd* must not be PC.

It can be `APSR_nzcv`, if *extsysreg* is `FPSCR`. In this case, the floating-point status flags are transferred into the corresponding flags in the special-purpose APSR.

## Usage

The `VMSR` instruction transfers the contents of `Rd` into `extsysreg`.



The instruction stalls the processor until all current floating-point operations complete.

**Note**

## Example

```
VMSR    FPSCR, r4
```

## Related information

[Condition code suffixes](#) on page 138

## 16.29 VMUL (floating-point) (A32)

Floating-point multiply.

### Syntax

`VMUL{cond}.F32 {Sd,} Sn, Sm`

`VMUL{cond}.F64 {Dd,} Dn, Dm`

where:

**cond**

is an optional condition code.

**Sd, Sn, Sm**

are the single-precision registers for the result and operands.

**Dd, Dn, Dm**

are the double-precision registers for the result and operands.

### Operation

The `VMUL` operation multiplies the values in the operand registers and places the result in the destination register.

### Floating-point exceptions

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

## Related information

[Condition code suffixes](#) on page 138

## 16.30 VNEG (floating-point) (A32)

Floating-point negate.

### Syntax

`VNEG{cond}.F32Sd, Sm`

`VNEG{cond}.F64Dd, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sm***

are the single-precision registers for the result and operand.

***Dd, Dm***

are the double-precision registers for the result and operand.

### Operation

The `VNEG` instruction takes the contents of `Sm` or `Dm`, changes the sign bit, and places the result in `Sd` or `Dd`. This gives the negation of the value.

If the operand is a NaN, the sign bit is changed, but no exception is produced.

### Floating-point exceptions

`VNEG` instructions do not produce any exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.31 VNMLA (floating-point) (A32)

Floating-point multiply accumulate with negation.

### Syntax

`VNMLA{cond}.F32Sd, Sn, Sm`

`VNMLA{cond}.F64Dd, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

**Operation**

The `VNMLA` instruction multiplies the values in the operand registers, adds the value to the destination register, and places the negated final result in the destination register.

**Floating-point exceptions**

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

**Related information**

[Condition code suffixes](#) on page 138

## 16.32 VNMLS (floating-point) (A32)

Floating-point multiply subtract with negation.

**Syntax**

`VNMLS{cond}.F32 Sd, Sn, Sm`

`VNMLS{cond}.F64 Dd, Dn, Dm`

where:

***cond***

is an optional condition code.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

**Operation**

The `VNMLS` instruction multiplies the values in the operand registers, subtracts the result from the value in the destination register, and places the negated final result in the destination register.

**Floating-point exceptions**

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

**Related information**

[Condition code suffixes](#) on page 138

## 16.33 VNMUL (floating-point) (A32)

Floating-point multiply with negation.

### Syntax

`VNMUL{cond}.F32 {Sd,} Sn, Sm`

`VNMUL{cond}.F64 {Dd,} Dn, Dm`

where:

**cond**

is an optional condition code.

**Sd, Sn, Sm**

are the single-precision registers for the result and operands.

**Dd, Dn, Dm**

are the double-precision registers for the result and operands.

### Operation

The `VNMUL` instruction multiplies the values in the operand registers and places the negated result in the destination register.

### Floating-point exceptions

This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.34 VPOP (floating-point) (A32)

Pop extension registers from the stack.

### Syntax

`VPOP{cond} Registers`

where:

**cond**

is an optional condition code.

**Registers**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify s or d registers, but they must not be mixed. The number of registers must not exceed 16 d registers.



`VPOP Registers` is equivalent to `VLDM sp!, Registers`.

You can use either form of this instruction. They both disassemble to `vpop`.

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VPUSH \(floating-point\) \(A32\)](#) on page 656

## 16.35 VPUSH (floating-point) (A32)

Push extension registers onto the stack.

### Syntax

`VPUSH{cond} Registers`

where:

**cond**

is an optional condition code.

**Registers**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify s or d registers, but they must not be mixed. The number of registers must not exceed 16 d registers.



`VPUSH Registers` is equivalent to `VSTMDB sp!, Registers`.

You can use either form of this instruction. They both disassemble to `vpush`.

## Related information

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

[VPOP \(floating-point\) \(A32\)](#) on page 655

## 16.36 VRINT (floating-point) (A32)

Rounds a floating-point number to integer and places the result in the destination register. The resulting integer is represented in floating-point format.



This instruction is supported only in Arm®v8.

**Note**

### Syntax

`VRINT mode {cond}.F64.F64 Dd, Dm`

`VRINT mode {cond}.F32.F32 Sd, Sm`

where:

**mode**

must be one of:

**Z**

meaning round towards zero.

**R**

meaning use the rounding mode specified in the FPSCR.

**X**

meaning use the rounding mode specified in the FPSCR, generating an Inexact exception if the result is not exact.

**A**

meaning round to nearest, ties away from zero.

**N**

meaning round to nearest, ties to even.

**P**

meaning round towards plus infinity.

**M**

meaning round towards minus infinity.

**cond**

is an optional condition code. This can only be used when `mode` is Z, R or X.

**Sd, Sm**

specifies the destination and operand registers, for a word operation.

**Dd, Dm**

specifies the destination and operand registers, for a doubleword operation.

## Notes

You cannot use `VRINT` with a rounding mode of `A`, `N`, `P` or `M` inside an IT block.

## Floating-point exceptions

These instructions cannot produce any exceptions, except `VRINTX` which can generate an Inexact exception.

## Related information

[Condition code suffixes](#) on page 138

## 16.37 VSEL (A32)

Floating-point select.



This instruction is supported only in Arm®v8.

Note

### Syntax

`VSEL cond.F32 Sd, Sn, Sm`

`VSEL cond.F64 Dd, Dn, Dm`

where:

***cond***

must be one of `GE`, `GT`, `EQ`, `VS`.

***Sd, Sn, Sm***

are the single-precision registers for the result and operands.

***Dd, Dn, Dm***

are the double-precision registers for the result and operands.

### Usage

The `VSEL` instruction compares the values in the operand registers. If the condition is true, it copies the value in the first operand register into the destination operand register. Otherwise, it copies the value in the second operand register.

You cannot use `VSEL` inside an IT block.

## Floating-point exceptions

`VSEL` instructions cannot produce any exceptions.

## Related information

[Comparison of condition code meanings in integer and floating-point code](#) on page 140  
[Condition code suffixes](#) on page 138

## 16.38 VSQRT (A32)

Floating-point square root.

### Syntax

`VSQRT{cond} .F32 Sd, Sm`

`VSQRT{cond} .F64 Dd, Dm`

where:

**cond**

is an optional condition code.

**Sd, Sm**

are the single-precision registers for the result and operand.

**Dd, Dm**

are the double-precision registers for the result and operand.

### Operation

The `VSQRT` instruction takes the square root of the contents of `Sm` or `Dm`, and places the result in `Sd` or `Dd`.

### Floating-point exceptions

`VSQRT` instructions can produce Invalid Operation or Inexact exceptions.

### Related information

[Condition code suffixes](#) on page 138

## 16.39 VSTM (floating-point) (A32)

Extension register store multiple.

### Syntax

`VSTMmode{cond} Rn{!}, Registers`

where:

**mode**

must be one of:

**IA**

meaning Increment address After each transfer. **IA** is the default, and can be omitted.

**DB**

meaning Decrement address Before each transfer.

**EA**

meaning Empty Ascending stack operation. This is the same as **IA** for stores.

**FD**

meaning Full Descending stack operation. This is the same as **DB** for stores.

***cond***

is an optional condition code.

**Rn**

is the general-purpose register holding the base address for the transfer.

!

is optional. ! specifies that the updated base address must be written back to **Rn**. If ! is not specified, **mode** must be **IA**.

**Registers**

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.

You can specify s or d registers, but they must not be mixed. The number of registers must not exceed 16 d registers.



`V PUSH Registers` is equivalent to `V STMDB sp!, Registers`.

You can use either form of this instruction. They both disassemble to `V PUSH`.

**Related information**

[Stack implementation using LDM and STM](#) on page 119

[Condition code suffixes](#) on page 138

## 16.40 VSTR (floating-point) (A32)

Extension register store.

**Syntax**

`V STR{cond}{.size} Fd, [Rn{, #offset}]`

where:

***cond***

is an optional condition code.

***size***

is an optional data size specifier. Must be 32 if *Fd* is an s register, or 64 otherwise.

***Fd***

is the extension register to be saved. It can be either a D or s register.

***Rn***

is the general-purpose register holding the base address for the transfer.

***offset***

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The value must be a multiple of 4, and lie in the range -1020 to +1020. The value is added to the base address to form the address used for the transfer.

## Operation

The `vSTR` instruction saves the contents of an extension register to memory.

One word is transferred if *Fd* is an s register. Two words are transferred otherwise.

### Related information

[VLDR pseudo-instruction \(floating-point\) \(A32\)](#) on page 642

[Condition code suffixes](#) on page 138

[Register-relative and PC-relative expressions](#) on page 222

## 16.41 VSTR (post-increment and pre-decrement, floating-point) (A32)

Pseudo-instruction that stores extension registers with post-increment and pre-decrement forms.



There are also `VLDR` and `vSTR` instructions without post-increment and pre-decrement.

### Syntax

`VSTR{cond}{.size} Fd, [Rn], #offset ; post-increment`

`VSTR{cond}{.size} Fd, [Rn, #-offset]! ; pre-decrement`

where:

***cond***

is an optional condition code.

**size**

is an optional data size specifier. Must be 32 if *Fd* is an s register, or 64 if *Fd* is a d register.

**Fd**

is the extension register to be saved. It can be either a double precision (*Dd*) or a single precision (*Sd*) register.

**Rn**

is the general-purpose register holding the base address for the transfer.

**offset**

is a numeric expression that must evaluate to a numeric value at assembly time. The value must be 4 if *Fd* is an s register, or 8 if *Fd* is a d register.

## Operation

The post-increment instruction increments the base address in the register by the offset value, after the transfer. The pre-decrement instruction decrements the base address in the register by the offset value, and then performs the transfer using the new address in the register. This pseudo-instruction assembles to a `VSTM` instruction.

### Related information

[VSTR \(floating-point\) \(A32\)](#) on page 660

[VSTM \(floating-point\) \(A32\)](#) on page 659

[Condition code suffixes](#) on page 138

## 16.42 VSUB (floating-point) (A32)

Floating-point subtract.

### Syntax

`VSUB{cond}.F32 {Sd}, Sn, Sm`

`VSUB{cond}.F64 {Dd}, Dn, Dm`

where:

**cond**

is an optional condition code.

**Sd, Sn, Sm**

are the single-precision registers for the result and operands.

**Dd, Dn, Dm**

are the double-precision registers for the result and operands.

### Operation

The `vsub` instruction subtracts the value in the second operand register from the value in the first operand register, and places the result in the destination register.

## Floating-point exceptions

The `vsub` instruction can produce Invalid Operation, Overflow, or Inexact exceptions.

## Related information

[Condition code suffixes](#) on page 138

# 17. A64 General Instructions

Describes the A64 general instructions.

## 17.1 A64 instructions in alphabetical order

A summary of the A64 instructions and pseudo-instructions that are supported.

**Table 17-1: Summary of A64 general instructions**

| Mnemonic                                  | Brief description                             | See                                                             |
|-------------------------------------------|-----------------------------------------------|-----------------------------------------------------------------|
| ADC                                       | Add with Carry                                | <a href="#">ADC (A64)</a>                                       |
| ADCS                                      | Add with Carry, setting flags                 | <a href="#">ADCS (A64)</a>                                      |
| ADD (extended register)                   | Add (extended register)                       | <a href="#">ADD (extended register) (A64)</a>                   |
| ADD (immediate)                           | Add (immediate)                               | <a href="#">ADD (immediate) (A64)</a>                           |
| ADD (shifted register)                    | Add (shifted register)                        | <a href="#">ADD (shifted register) (A64)</a>                    |
| ADDS (extended register)                  | Add (extended register), setting flags        | <a href="#">ADDS (extended register) (A64)</a>                  |
| ADDS (immediate)                          | Add (immediate), setting flags                | <a href="#">ADDS (immediate) (A64)</a>                          |
| ADDS (shifted register)                   | Add (shifted register), setting flags         | <a href="#">ADDS (shifted register) (A64)</a>                   |
| ADR                                       | Form PC-relative address                      | <a href="#">ADR (A64)</a>                                       |
| ADRL pseudo-instruction                   | Load a PC-relative address into a register    | <a href="#">ADRL pseudo-instruction (A64)</a>                   |
| ADRP                                      | Form PC-relative address to 4KB page          | <a href="#">ADRP (A64)</a>                                      |
| AND (immediate)                           | Bitwise AND (immediate)                       | <a href="#">AND (immediate) (A64)</a>                           |
| AND (shifted register)                    | Bitwise AND (shifted register)                | <a href="#">AND (shifted register) (A64)</a>                    |
| ANDS (immediate)                          | Bitwise AND (immediate), setting flags        | <a href="#">ANDS (immediate) (A64)</a>                          |
| ANDS (shifted register)                   | Bitwise AND (shifted register), setting flags | <a href="#">ANDS (shifted register) (A64)</a>                   |
| ASR (register)                            | Arithmetic Shift Right (register)             | <a href="#">ASR (register) (A64)</a>                            |
| ASR (immediate)                           | Arithmetic Shift Right (immediate)            | <a href="#">ASR (immediate) (A64)</a>                           |
| ASRV                                      | Arithmetic Shift Right Variable               | <a href="#">ASRV (A64)</a>                                      |
| AT                                        | Address Translate                             | <a href="#">AT (A64)</a>                                        |
| AUTDA, AUTDZA                             | Authenticate Data address, using key A        | <a href="#">AUTDA, AUTDZA (A64)</a>                             |
| AUTDB, AUTDZB                             | Authenticate Data address, using key B        | <a href="#">AUTDB, AUTDZB (A64)</a>                             |
| AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ | Authenticate Instruction address, using key A | <a href="#">AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ (A64)</a> |
| AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ | Authenticate Instruction address, using key B | <a href="#">AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ (A64)</a> |
| B.cond                                    | Branch conditionally                          | <a href="#">B.cond</a>                                          |
| B                                         | Branch                                        | <a href="#">B (A64)</a>                                         |
| BFC                                       | Bitfield Clear, leaving other bits unchanged  | <a href="#">BFC (A64)</a>                                       |
| BFI                                       | Bitfield Insert                               | <a href="#">BFI (A64)</a>                                       |
| BFM                                       | Bitfield Move                                 | <a href="#">BFM (A64)</a>                                       |
| BFXIL                                     | Bitfield extract and insert at low end        | <a href="#">BFXIL (A64)</a>                                     |

| Mnemonic                           | Brief description                                                                                                  | See                                                      |
|------------------------------------|--------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| BIC (shifted register)             | Bitwise Bit Clear (shifted register)                                                                               | <a href="#">BIC (shifted register) (A64)</a>             |
| BICS (shifted register)            | Bitwise Bit Clear (shifted register), setting flags                                                                | <a href="#">BICS (shifted register) (A64)</a>            |
| BL                                 | Branch with Link                                                                                                   | <a href="#">BL (A64)</a>                                 |
| BLR                                | Branch with Link to Register                                                                                       | <a href="#">BLR (A64)</a>                                |
| BLRAA, BLRAAZ, BLRAB, BLRABZ       | Branch with Link to Register, with pointer authentication                                                          | <a href="#">BLRAA, BLRAAZ, BLRAB, BLRABZ (A64)</a>       |
| BR                                 | Branch to Register                                                                                                 | <a href="#">BR (A64)</a>                                 |
| BRAA, BRAAZ, BRAB, BRABZ           | Branch to Register, with pointer authentication                                                                    | <a href="#">BRAA, BRAAZ, BRAB, BRABZ (A64)</a>           |
| BRK                                | Breakpoint instruction                                                                                             | <a href="#">BRK (A64)</a>                                |
| CBNZ                               | Compare and Branch on Nonzero                                                                                      | <a href="#">CBNZ (A64)</a>                               |
| CBZ                                | Compare and Branch on Zero                                                                                         | <a href="#">CBZ (A64)</a>                                |
| CCMN (immediate)                   | Conditional Compare Negative (immediate)                                                                           | <a href="#">CCMN (immediate) (A64)</a>                   |
| CCMN (register)                    | Conditional Compare Negative (register)                                                                            | <a href="#">CCMN (register) (A64)</a>                    |
| CCMP (immediate)                   | Conditional Compare (immediate)                                                                                    | <a href="#">CCMP (immediate) (A64)</a>                   |
| CCMP (register)                    | Conditional Compare (register)                                                                                     | <a href="#">CCMP (register) (A64)</a>                    |
| CINC                               | Conditional Increment                                                                                              | <a href="#">CINC (A64)</a>                               |
| CINV                               | Conditional Invert                                                                                                 | <a href="#">CINV (A64)</a>                               |
| CLREX                              | Clear Exclusive                                                                                                    | <a href="#">CLREX (A64)</a>                              |
| CLS                                | Count leading sign bits                                                                                            | <a href="#">CLS (A64)</a>                                |
| CLZ                                | Count leading zero bits                                                                                            | <a href="#">CLZ (A64)</a>                                |
| CMN (extended register)            | Compare Negative (extended register)                                                                               | <a href="#">CMN (extended register) (A64)</a>            |
| CMN (immediate)                    | Compare Negative (immediate)                                                                                       | <a href="#">CMN (immediate) (A64)</a>                    |
| CMN (shifted register)             | Compare Negative (shifted register)                                                                                | <a href="#">CMN (shifted register) (A64)</a>             |
| CMP (extended register)            | Compare (extended register)                                                                                        | <a href="#">CMP (extended register) (A64)</a>            |
| CMP (immediate)                    | Compare (immediate)                                                                                                | <a href="#">CMP (immediate) (A64)</a>                    |
| CMP (shifted register)             | Compare (shifted register)                                                                                         | <a href="#">CMP (shifted register) (A64)</a>             |
| CNEG                               | Conditional Negate                                                                                                 | <a href="#">CNEG (A64)</a>                               |
| CRC32B, CRC32H, CRC32W, CRC32X     | CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register  | <a href="#">CRC32B, CRC32H, CRC32W, CRC32X (A64)</a>     |
| CRC32CB, CRC32CH, CRC32CW, CRC32CX | CRC32C checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register | <a href="#">CRC32CB, CRC32CH, CRC32CW, CRC32CX (A64)</a> |
| CSEL                               | Conditional Select                                                                                                 | <a href="#">CSEL (A64)</a>                               |
| CSET                               | Conditional Set                                                                                                    | <a href="#">CSET (A64)</a>                               |
| CSETM                              | Conditional Set Mask                                                                                               | <a href="#">CSETM (A64)</a>                              |
| CSINC                              | Conditional Select Increment                                                                                       | <a href="#">CSINC (A64)</a>                              |
| CSINV                              | Conditional Select Invert                                                                                          | <a href="#">CSINV (A64)</a>                              |
| CSNEG                              | Conditional Select Negation                                                                                        | <a href="#">CSNEG (A64)</a>                              |
| DC                                 | Data Cache operation                                                                                               | <a href="#">DC (A64)</a>                                 |

| Mnemonic                      | Brief description                                                             | See                                                 |
|-------------------------------|-------------------------------------------------------------------------------|-----------------------------------------------------|
| DCPS1                         | Debug Change PE State to EL1                                                  | <a href="#">DCPS1 (A64)</a>                         |
| DCPS2                         | Debug Change PE State to EL2                                                  | <a href="#">DCPS2 (A64)</a>                         |
| DCPS3                         | Debug Change PE State to EL3                                                  | <a href="#">DCPS3 (A64)</a>                         |
| DMB                           | Data Memory Barrier                                                           | <a href="#">DMB (A64)</a>                           |
| DRPS                          | Debug restore process state                                                   | <a href="#">DRPS (A64)</a>                          |
| DSB                           | Data Synchronization Barrier                                                  | <a href="#">DSB (A64)</a>                           |
| EON (shifted register)        | Bitwise Exclusive OR NOT (shifted register)                                   | <a href="#">EON (shifted register) (A64)</a>        |
| EOR (immediate)               | Bitwise Exclusive OR (immediate)                                              | <a href="#">EOR (immediate) (A64)</a>               |
| EOR (shifted register)        | Bitwise Exclusive OR (shifted register)                                       | <a href="#">EOR (shifted register) (A64)</a>        |
| ERET                          | Returns from an exception                                                     | <a href="#">ERET (A64)</a>                          |
| ERETAA, ERETAB                | Exception Return, with pointer authentication                                 | <a href="#">ERETAA, ERETAB (A64)</a>                |
| ESB                           | Error Synchronization Barrier                                                 | <a href="#">ESB (A64)</a>                           |
| EXTR                          | Extract register                                                              | <a href="#">EXTR (A64)</a>                          |
| HINT                          | Hint instruction                                                              | <a href="#">HINT (A64)</a>                          |
| HLT                           | Halt instruction                                                              | <a href="#">HLT (A64)</a>                           |
| HVC                           | Hypervisor call to allow OS code to call the Hypervisor                       | <a href="#">HVC (A64)</a>                           |
| IC                            | Instruction Cache operation                                                   | <a href="#">IC (A64)</a>                            |
| ISB                           | Instruction Synchronization Barrier                                           | <a href="#">ISB (A64)</a>                           |
| LSL (register)                | Logical Shift Left (register)                                                 | <a href="#">LSL (register) (A64)</a>                |
| LSL (immediate)               | Logical Shift Left (immediate)                                                | <a href="#">LSL (immediate) (A64)</a>               |
| LSLV                          | Logical Shift Left Variable                                                   | <a href="#">LSLV (A64)</a>                          |
| LSR (register)                | Logical Shift Right (register)                                                | <a href="#">LSR (register) (A64)</a>                |
| LSR (immediate)               | Logical Shift Right (immediate)                                               | <a href="#">LSR (immediate) (A64)</a>               |
| LSRV                          | Logical Shift Right Variable                                                  | <a href="#">LSRV (A64)</a>                          |
| MADD                          | Multiply-Add                                                                  | <a href="#">MADD (A64)</a>                          |
| MNEG                          | Multiply-Negate                                                               | <a href="#">MNEG (A64)</a>                          |
| MOV (to or from SP)           | Move between register and stack pointer                                       | <a href="#">MOV (to or from SP) (A64)</a>           |
| MOV (inverted wide immediate) | Move (inverted wide immediate)                                                | <a href="#">MOV (inverted wide immediate) (A64)</a> |
| MOV (wide immediate)          | Move (wide immediate)                                                         | <a href="#">MOV (wide immediate) (A64)</a>          |
| MOV (bitmask immediate)       | Move (bitmask immediate)                                                      | <a href="#">MOV (bitmask immediate) (A64)</a>       |
| MOV (register)                | Move (register)                                                               | <a href="#">MOV (register) (A64)</a>                |
| MOVK                          | Move wide with keep                                                           | <a href="#">MOVK (A64)</a>                          |
| MOVL pseudo-instruction       | Load a register with either a 32-bit or 64-bit immediate value or any address | <a href="#">MOVL pseudo-instruction (A64)</a>       |
| MOVN                          | Move wide with NOT                                                            | <a href="#">MOVN (A64)</a>                          |
| MOVZ                          | Move wide with zero                                                           | <a href="#">MOVZ (A64)</a>                          |
| MRS                           | Move System Register                                                          | <a href="#">MRS (A64)</a>                           |
| MSR (immediate)               | Move immediate value to Special Register                                      | <a href="#">MSR (immediate) (A64)</a>               |

| Mnemonic                                  | Brief description                                                | See                                                             |
|-------------------------------------------|------------------------------------------------------------------|-----------------------------------------------------------------|
| MSR (register)                            | Move general-purpose register to System Register                 | <a href="#">MSR (register) (A64)</a>                            |
| MSUB                                      | Multiply-Subtract                                                | <a href="#">MSUB (A64)</a>                                      |
| MUL                                       | Multiply                                                         | <a href="#">MUL (A64)</a>                                       |
| MVN                                       | Bitwise NOT                                                      | <a href="#">MVN (A64)</a>                                       |
| NEG (shifted register)                    | Negate (shifted register)                                        | <a href="#">NEG (shifted register) (A64)</a>                    |
| NEGS                                      | Negate, setting flags                                            | <a href="#">NEGS (A64)</a>                                      |
| NGC                                       | Negate with Carry                                                | <a href="#">NGC (A64)</a>                                       |
| NGCS                                      | Negate with Carry, setting flags                                 | <a href="#">NGCS (A64)</a>                                      |
| NOP                                       | No Operation                                                     | <a href="#">NOP (A64)</a>                                       |
| ORN (shifted register)                    | Bitwise OR NOT (shifted register)                                | <a href="#">ORN (shifted register) (A64)</a>                    |
| ORR (immediate)                           | Bitwise OR (immediate)                                           | <a href="#">ORR (immediate) (A64)</a>                           |
| ORR (shifted register)                    | Bitwise OR (shifted register)                                    | <a href="#">ORR (shifted register) (A64)</a>                    |
| PACDA, PACDZA                             | Pointer Authentication Code for Data address, using key A        | <a href="#">PACDA, PACDZA (A64)</a>                             |
| PACDB, PACDZB                             | Pointer Authentication Code for Data address, using key B        | <a href="#">PACDB, PACDZB (A64)</a>                             |
| PACGA                                     | Pointer Authentication Code, using Generic key                   | <a href="#">PACGA (A64)</a>                                     |
| PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ | Pointer Authentication Code for Instruction address, using key A | <a href="#">PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ (A64)</a> |
| PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ | Pointer Authentication Code for Instruction address, using key B | <a href="#">PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ (A64)</a> |
| PSB                                       | Profiling Synchronization Barrier                                | <a href="#">PSB (A64)</a>                                       |
| RBIT                                      | Reverse Bits                                                     | <a href="#">RBIT (A64)</a>                                      |
| RET                                       | Return from subroutine                                           | <a href="#">RET (A64)</a>                                       |
| RETAAC, RETAB                             | Return from subroutine, with pointer authentication              | <a href="#">RETAAC, RETAB (A64)</a>                             |
| REV16                                     | Reverse bytes in 16-bit halfwords                                | <a href="#">REV16 (A64)</a>                                     |
| REV32                                     | Reverse bytes in 32-bit words                                    | <a href="#">REV32 (A64)</a>                                     |
| REV64                                     | Reverse Bytes                                                    | <a href="#">REV64 (A64)</a>                                     |
| REV                                       | Reverse Bytes                                                    | <a href="#">REV (A64)</a>                                       |
| ROR (immediate)                           | Rotate right (immediate)                                         | <a href="#">ROR (immediate) (A64)</a>                           |
| ROR (register)                            | Rotate Right (register)                                          | <a href="#">ROR (register) (A64)</a>                            |
| RORV                                      | Rotate Right Variable                                            | <a href="#">RORV (A64)</a>                                      |
| SBC                                       | Subtract with Carry                                              | <a href="#">SBC (A64)</a>                                       |
| SBCS                                      | Subtract with Carry, setting flags                               | <a href="#">SBCS (A64)</a>                                      |
| SBFIZ                                     | Signed Bitfield Insert in Zero                                   | <a href="#">SBFIZ (A64)</a>                                     |
| SBFM                                      | Signed Bitfield Move                                             | <a href="#">SBFM (A64)</a>                                      |
| SBFX                                      | Signed Bitfield Extract                                          | <a href="#">SBFX (A64)</a>                                      |
| SDIV                                      | Signed Divide                                                    | <a href="#">SDIV (A64)</a>                                      |
| SEV                                       | Send Event                                                       | <a href="#">SEV (A64)</a>                                       |

| Mnemonic                 | Brief description                                                         | See                                            |
|--------------------------|---------------------------------------------------------------------------|------------------------------------------------|
| SEVL                     | Send Event Local                                                          | <a href="#">SEVL (A64)</a>                     |
| SMADDL                   | Signed Multiply-Add Long                                                  | <a href="#">SMADDL (A64)</a>                   |
| SMC                      | Supervisor call to allow OS or Hypervisor code to call the Secure Monitor | <a href="#">SMC (A64)</a>                      |
| SMNEGL                   | Signed Multiply-Negate Long                                               | <a href="#">SMNEGL (A64)</a>                   |
| SMSUBL                   | Signed Multiply-Subtract Long                                             | <a href="#">SMSUBL (A64)</a>                   |
| SMULH                    | Signed Multiply High                                                      | <a href="#">SMULH (A64)</a>                    |
| SMULL                    | Signed Multiply Long                                                      | <a href="#">SMULL (A64)</a>                    |
| SUB (extended register)  | Subtract (extended register)                                              | <a href="#">SUB (extended register) (A64)</a>  |
| SUB (immediate)          | Subtract (immediate)                                                      | <a href="#">SUB (immediate) (A64)</a>          |
| SUB (shifted register)   | Subtract (shifted register)                                               | <a href="#">SUB (shifted register) (A64)</a>   |
| SUBS (extended register) | Subtract (extended register), setting flags                               | <a href="#">SUBS (extended register) (A64)</a> |
| SUBS (immediate)         | Subtract (immediate), setting flags                                       | <a href="#">SUBS (immediate) (A64)</a>         |
| SUBS (shifted register)  | Subtract (shifted register), setting flags                                | <a href="#">SUBS (shifted register) (A64)</a>  |
| SVC                      | Supervisor call to allow application code to call the OS                  | <a href="#">SVC (A64)</a>                      |
| SXTB                     | Signed Extend Byte                                                        | <a href="#">SXTB (A64)</a>                     |
| SXTH                     | Sign Extend Halfword                                                      | <a href="#">SXTH (A64)</a>                     |
| SXTW                     | Sign Extend Word                                                          | <a href="#">SXTW (A64)</a>                     |
| SYS                      | System instruction                                                        | <a href="#">SYS (A64)</a>                      |
| SYSL                     | System instruction with result                                            | <a href="#">SYSL (A64)</a>                     |
| TBNZ                     | Test bit and Branch if Nonzero                                            | <a href="#">TBNZ (A64)</a>                     |
| TBZ                      | Test bit and Branch if Zero                                               | <a href="#">TBZ (A64)</a>                      |
| TLBI                     | TLB Invalidate operation                                                  | <a href="#">TLBI (A64)</a>                     |
| TST (immediate)          | , setting the condition flags and discarding the result                   | <a href="#">TST (immediate) (A64)</a>          |
| TST (shifted register)   | Test (shifted register)                                                   | <a href="#">TST (shifted register) (A64)</a>   |
| UBFIZ                    | Unsigned Bitfield Insert in Zero                                          | <a href="#">UBFIZ (A64)</a>                    |
| UBFM                     | Unsigned Bitfield Move                                                    | <a href="#">UBFM (A64)</a>                     |
| UBFX                     | Unsigned Bitfield Extract                                                 | <a href="#">UBFX (A64)</a>                     |
| UDIV                     | Unsigned Divide                                                           | <a href="#">UDIV (A64)</a>                     |
| UMADDL                   | Unsigned Multiply-Add Long                                                | <a href="#">UMADDL (A64)</a>                   |
| UMNEGL                   | Unsigned Multiply-Negate Long                                             | <a href="#">UMNEGL (A64)</a>                   |
| UMSUBL                   | Unsigned Multiply-Subtract Long                                           | <a href="#">UMSUBL (A64)</a>                   |
| UMULH                    | Unsigned Multiply High                                                    | <a href="#">UMULH (A64)</a>                    |
| UMULL                    | Unsigned Multiply Long                                                    | <a href="#">UMULL (A64)</a>                    |
| UXTB                     | Unsigned Extend Byte                                                      | <a href="#">UXTB (A64)</a>                     |
| UXTH                     | Unsigned Extend Halfword                                                  | <a href="#">UXTH (A64)</a>                     |
| WFE                      | Wait For Event                                                            | <a href="#">WFE (A64)</a>                      |
| WFI                      | Wait For Interrupt                                                        | <a href="#">WFI (A64)</a>                      |
| XPACD, XPACI, XPAACLRI   | Strip Pointer Authentication Code                                         | <a href="#">XPACD, XPACI, XPAACLRI (A64)</a>   |

| Mnemonic | Brief description | See                         |
|----------|-------------------|-----------------------------|
| YIELD    | YIELD             | <a href="#">YIELD (A64)</a> |

## 17.2 Register restrictions for A64 instructions

In A64 instructions, the general-purpose integer registers are W0-W30 for 32-bit registers and X0-X30 for 64-bit registers.

You cannot refer to register 31 by number. In a few instructions, you can refer to it using one of the following names:

**WSP**

the current stack pointer in a 32-bit context.

**SP**

the current stack pointer in a 64-bit context.

**WZR**

the zero register in a 32-bit context.

**XZR**

the zero register in a 64-bit context.

You can only use one of these names if it is mentioned in the Syntax section for the instruction.

You cannot refer to the Program Counter (PC) explicitly by name or by number.

## 17.3 ADC (A64)

Add with Carry.

### Syntax

ADC *Wd*, *Wn*, *Wm* ; 32-bit

ADC *Xd*, *Xn*, *Xm* ; 64-bit

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the first general-purpose source register.

**Wm**

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

## Operation

Add with Carry adds two register values and the Carry flag value, and writes the result to the destination register.

$Rd = Rn + Rm + C$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.4 ADCS (A64)

Add with Carry, setting flags.

### Syntax

ADCS *Wd*, *Wn*, *Wm* ; 32-bit

ADCS *Xd*, *Xn*, *Xm* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

## Operation

Add with Carry, setting flags, adds two register values and the Carry flag value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn + Rm + C$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.5 ADD (extended register) (A64)

Add (extended register).

### Syntax

`ADD Rd|WSP, Rn|WSP, Rm, {extend #{}amount} ; 32-bit`

`ADD Xd|SP, Xn|SP, Rm, {extend #{}amount} ; 64-bit`

Where:

#### **Rd | WSP**

Is the 32-bit name of the destination general-purpose register or stack pointer.

#### **Rn | WSP**

Is the 32-bit name of the first source general-purpose register or stack pointer.

#### **Rm**

Is the 32-bit name of the second general-purpose source register.

#### **extend**

Is the extension to be applied to the second source operand:

#### **32-bit general registers**

Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

If  $Rd$  or  $Rn$  is WSP then LSL is preferred rather than UXTW, and can be omitted when  $amount$  is 0. In all other cases extend is required and must be UXTW rather than LSL.

#### **64-bit general registers**

Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.

If  $Rd$  or  $Rn$  is SP then LSL is preferred rather than UXTX, and can be omitted when  $amount$  is 0. In all other cases extend is required and must be UXTX rather than LSL.

#### **Xd | SP**

Is the 64-bit name of the destination general-purpose register or stack pointer.

#### **Xn | SP**

Is the 64-bit name of the first source general-purpose register or stack pointer.

**R**

Is a width specifier, and can be either w or x.

**m**

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

**amount**

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when *extend* is absent, is required when *extend* is LSL, and is optional when *extend* is present but not LSL.

**Operation**

Add (extended register) adds a register value and a sign or zero-extended register value, followed by an optional left shift amount, and writes the result to the destination register. The argument that is extended from the *Rm* register can be a byte, halfword, word, or doubleword.

*Rd* = *Rn* + LSL(*extend(Rm)*, *amount*), where *R* is either w or x.

**Usage**

**Table 17-2: ADD (64-bit general registers) specifier combinations**

| <b>R</b> | <b>extend</b> |
|----------|---------------|
| W        | SXTB          |
| W        | SXTH          |
| W        | SXTW          |
| W        | UXTB          |
| W        | UXTH          |
| W        | UXTW          |
| X        | LSL   UXTX    |
| X        | SXTX          |

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.6 ADD (immediate) (A64)

Add (immediate).

This instruction is used by the alias mov (to or from SP).

**Syntax**

ADD *Wd|WSP, Wn|WSP, #imm, {shift}* ; 32-bit

ADD *Xd|SP, Xn|SP, #imm, {shift}* ; 64-bit

Where:

***Wd | WSP***

Is the 32-bit name of the destination general-purpose register or stack pointer.

***Wn | WSP***

Is the 32-bit name of the source general-purpose register or stack pointer.

***Xd | SP***

Is the 64-bit name of the destination general-purpose register or stack pointer.

***Xn | SP***

Is the 64-bit name of the source general-purpose register or stack pointer.

***imm***

Is an unsigned immediate, in the range 0 to 4095.

***shift***

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

## Operation

Add (immediate) adds a register value and an optionally-shifted immediate value, and writes the result to the destination register.

$Rd = Rn + \text{shift}(\text{imm})$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[MOV \(to or from SP\) \(A64\)](#) on page 748

[A64 instructions in alphabetical order](#) on page 664

## 17.7 ADD (shifted register) (A64)

Add (shifted register).

### Syntax

`ADD Wd, Wn, Wm, {shift #amount} ; 32-bit`

`ADD Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

## Operation

Add (shifted register) adds a register value and an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn + shift(Rm, amount)$ , where  $R$  is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.8 ADDS (extended register) (A64)

Add (extended register), setting flags.

This instruction is used by the alias CMN (extended register).

### Syntax

```
ADDS Wd, Wn|WSP, Wm, {extend #{}amount} ; 32-bit
```

```
ADDS Xd, Xn|SP, Rm, {extend #{}amount} ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn | WSP***

Is the 32-bit name of the first source general-purpose register or stack pointer.

***Rm***

Is the 32-bit name of the second general-purpose source register.

***extend***

Is the extension to be applied to the second source operand:

**32-bit general registers**

Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is WSP then LSL is preferred rather than UXTW, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTW rather than LSL.

**64-bit general registers**

Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is SP then LSL is preferred rather than UXTX, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTX rather than LSL.

***Rd***

Is the 64-bit name of the general-purpose destination register.

***Rn | SP***

Is the 64-bit name of the first source general-purpose register or stack pointer.

***R***

Is a width specifier, and can be either w or x.

***m***

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

***amount***

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when *extend* is absent, is required when *extend* is LSL, and is optional when *extend* is present but not LSL.

**Operation**

Add (extended register), setting flags, adds a register value and a sign or zero-extended register value, followed by an optional left shift amount, and writes the result to the destination register. The argument that is extended from the *Rm* register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result.

*Rd* = *Rn* + LSL(*extend(Rm)*, *amount*), where *R* is either w or x.

**Usage**

**Table 17-3: ADDS (64-bit general registers) specifier combinations**

| <b><i>R</i></b> | <b><i>extend</i></b> |
|-----------------|----------------------|
| W               | SXTB                 |
| W               | SXTH                 |
| W               | SXTW                 |

| <b>R</b> | <b>extend</b> |
|----------|---------------|
| W        | UXTB          |
| W        | UXTH          |
| W        | UXTW          |
| X        | LSL   UXTX    |
| X        | SXTX          |

## Related information

[CMN \(extended register\) \(A64\)](#) on page 710

[A64 instructions in alphabetical order](#) on page 664

## 17.9 ADDS (immediate) (A64)

Add (immediate), setting flags.

This instruction is used by the alias `CMN` (immediate).

### Syntax

`ADDS Rd, Rn|RSP, #imm, {shift} ; 32-bit`

`ADDS Xd, Xn|SP, #imm, {shift} ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn|RSP**

Is the 32-bit name of the source general-purpose register or stack pointer.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn|SP**

Is the 64-bit name of the source general-purpose register or stack pointer.

**imm**

Is an unsigned immediate, in the range 0 to 4095.

**shift**

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

### Operation

Add (immediate), setting flags, adds a register value and an optionally-shifted immediate value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn + shift(imm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[CMN \(immediate\) \(A64\)](#) on page 711

[A64 instructions in alphabetical order](#) on page 664

## 17.10 ADDS (shifted register) (A64)

Add (shifted register), setting flags.

This instruction is used by the alias `CMN` (shifted register).

### Syntax

```
ADDS Wd, Wn, Wm, {shift #amount} ; 32-bit
```

```
ADDS Xd, Xn, Xm, {shift #amount} ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

### Operation

Add (shifted register), setting flags, adds a register value and an optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn + \text{shift}(Rm, amount)$ , where  $R$  is either w or x.

### Related information

[CMN \(shifted register\) \(A64\)](#) on page 712

[A64 instructions in alphabetical order](#) on page 664

## 17.11 ADR (A64)

Form PC-relative address.

### Syntax

`ADR Rd, label`

Where:

**Rd**

Is the 64-bit name of the general-purpose destination register.

**label**

Is the program label whose address is to be calculated. Its offset from the address of this instruction, in the range  $\pm 1\text{MB}$ .

### Usage

Form PC-relative address adds an immediate value to the PC value to form a PC-relative address, and writes the result to the destination register.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.12 ADRL pseudo-instruction (A64)

Load a PC-relative address into a register. It is similar to the `ADR` instruction but `ADRL` can load a wider range of addresses than `ADR` because it generates two data processing instructions.

### Syntax

`ADRL Rd, label`

`ADRL Xd, label`

where:

**Rd**

Is the register to load with a 32-bit address.

**Xd**

Is the register to load with a 64-bit address.

**label**

Is a PC-relative expression.

**Usage**

ADRL assembles to two instructions, an ADRP followed by ADD.

If the assembler cannot construct the address in two instructions, it generates a relocation. The linker then generates the correct offsets.

ADRL produces position-independent code, because the address is calculated relative to PC.

**Example**

```
ADRL x0, mylabel ; loads address of mylabel into x0
```

**Related information**

[Register-relative and PC-relative expressions](#) on page 222

[A-Profile Architectures](#)

## 17.13 ADRP (A64)

Form PC-relative address to 4KB page.

**Syntax**

```
ADRP Xd, label
```

Where:

**Xd**

Is the 64-bit name of the general-purpose destination register.

**label**

Is the program label whose 4KB page address is to be calculated. Its offset from the page address of this instruction, in the range ±4GB.

**Usage**

Form PC-relative address to 4KB page adds an immediate value that is shifted left by 12 bits, to the PC value to form a PC-relative address, with the bottom 12 bits masked out, and writes the result to the destination register.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.14 AND (immediate) (A64)

Bitwise AND (immediate).

### Syntax

`AND Wd|WSP, Wn, #imm ; 32-bit`

`AND Xd|SP, Xn, #imm ; 64-bit`

Where:

#### **Wd|WSP**

Is the 32-bit name of the destination general-purpose register or stack pointer.

#### **Wn**

Is the 32-bit name of the general-purpose source register.

#### **imm**

The bitmask immediate.

#### **Xd|SP**

Is the 64-bit name of the destination general-purpose register or stack pointer.

#### **Xn**

Is the 64-bit name of the general-purpose source register.

### Operation

Bitwise AND (immediate) performs a bitwise AND of a register value and an immediate value, and writes the result to the destination register.

`Rd = Rn AND imm`, where `R` is either `w` or `x`.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.15 AND (shifted register) (A64)

Bitwise AND (shifted register).

### Syntax

`AND Wd, Wn, Wm, {shift #amount} ; 32-bit`

`AND Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

#### **Wd**

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

## Operation

Bitwise AND (shifted register) performs a bitwise AND of a register value and an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ AND } \text{shift}(Rm, amount)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.16 ANDS (immediate) (A64)

Bitwise AND (immediate), setting flags.

This instruction is used by the alias `TST` (immediate).

## Syntax

`ANDS Wd, Wn, #imm ; 32-bit`

`ANDS Xd, Xn, #imm ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***imm***

The bitmask immediate.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Operation

Bitwise AND (immediate), setting flags, performs a bitwise AND of a register value and an immediate value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn \text{ AND } imm$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[TST \(immediate\) \(A64\)](#) on page 804

[A64 instructions in alphabetical order](#) on page 664

## 17.17 ANDS (shifted register) (A64)

Bitwise AND (shifted register), setting flags.

This instruction is used by the alias `tst` (shifted register).

## Syntax

```
ANDS Wd, Wn, Wm, {shift #amount} ; 32-bit
```

```
ANDS Xd, Xn, Xm, {shift #amount} ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

## Operation

Bitwise AND (shifted register), setting flags, performs a bitwise AND of a register value and an optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn \text{ AND } \text{shift}(Rm, amount)$ , where  $R$  is either w or x.

## Related information

[TST \(shifted register\) \(A64\)](#) on page 805

[A64 instructions in alphabetical order](#) on page 664

## 17.18 ASR (register) (A64)

Arithmetic Shift Right (register).

This instruction is an alias of ASRV.

The equivalent instruction is ASRV  $wd, wn, wm$ .

## Syntax

ASR  $wd, wn, wm ; 32\text{-bit}$

ASR  $xd, xn, xm ; 64\text{-bit}$

Where:

***wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in copies of its sign bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

*Rd* = ASR (*Rn*, *Rm*), where *R* is either *w* or *x*.

## Related information

[ASRV \(A64\)](#) on page 685

[A64 instructions in alphabetical order](#) on page 664

## 17.19 ASR (immediate) (A64)

Arithmetic Shift Right (immediate).

This instruction is an alias of SBFM.

The equivalent instruction is SBFM *wd*, *wn*, #*shift*, #31.

## Syntax

ASR *wd*, *wn*, #*shift* ; 32-bit

ASR *xd*, *xn*, #*shift* ; 64-bit

Where:

***wd***

Is the 32-bit name of the general-purpose destination register.

***wn***

Is the 32-bit name of the general-purpose source register.

***shift***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31.

**64-bit general registers** Is the shift amount, in the range 0 to 63.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Operation

Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in copies of the sign bit in the upper bits and zeros in the lower bits, and writes the result to the destination register.

*Rd* = ASR(*Rn*, *shift*), where *R* is either *w* or *x*.

## Related information

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

## 17.20 ASRV (A64)

Arithmetic Shift Right Variable.

This instruction is used by the alias `ASR` (register).

### Syntax

`ASRV Wd, Wn, Wm ; 32-bit general registers`

`ASRV Xd, Xn, Xm ; 64-bit general registers`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Arithmetic Shift Right Variable shifts a register value right by a variable number of bits, shifting in copies of its sign bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

$Rd = \text{ASR}(Rn, Rm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[ASR \(register\) \(A64\)](#) on page 683

[A64 instructions in alphabetical order](#) on page 664

## 17.21 AT (A64)

Address Translate.

This instruction is an alias of `sys`.

The equivalent instruction is `sys #op1, c7, cm, #op2, xt`.

## Syntax

`AT at_op, xt`

Where:

***at\_op***

Is an AT instruction name, as listed for the AT system instruction group, and can be one of the values shown in Usage.

***op1***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***Cm***

Is a name `cm`, with `m` in the range 0 to 15.

***op2***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***xt***

Is the 64-bit name of the general-purpose source register.

## Usage

Address Translate. For more information, see *A64 system instructions for address translation* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The following table shows the valid specifier combinations:

**Table 17-4: SYS parameter values corresponding to AT operations**

| <b>at_op</b> | <b>op1</b> | <b>Cm</b> | <b>op2</b> |
|--------------|------------|-----------|------------|
| S12EOR       | 4          | 0         | 6          |
| S12EOW       | 4          | 0         | 7          |
| S12E1R       | 4          | 0         | 4          |
| S12E1W       | 4          | 0         | 5          |
| S1EOR        | 0          | 0         | 2          |
| S1EOW        | 0          | 0         | 3          |
| S1E1R        | 0          | 0         | 0          |
| S1E1RP       | 0          | 1         | 0          |
| S1E1W        | 0          | 0         | 1          |
| S1E1WP       | 0          | 1         | 1          |
| S1E2R        | 4          | 0         | 0          |
| S1E2W        | 4          | 0         | 1          |
| S1E3R        | 6          | 0         | 0          |
| S1E3W        | 6          | 0         | 1          |

## Related information

[SYS \(A64\)](#) on page 800

[A64 instructions in alphabetical order](#) on page 664

## 17.22 AUTDA, AUTDZA (A64)

Authenticate Data address, using key A.

### Syntax

`AUTDA Xd, Xn|SP ; AUTDA general registers`

`AUTDZA Xd ; AUTDZA general registers`

Where:

**Xn | SP**

Is the 64-bit name of the general-purpose source register or stack pointer.

**Xd**

Is the 64-bit name of the general-purpose destination register.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Authenticate Data address, using key A. This instruction authenticates a data address, using a modifier and key A.

The address is in the general-purpose register that is specified by *xd*.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *Xn|SP* for AUTDA.
- The value zero, for AUTDZA.

If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.23 AUTDB, AUTDZB (A64)

Authenticate Data address, using key B.

### Syntax

AUTDB *Xd, Xn|SP ; AUTDB general registers*

AUTDZB *Xd ; AUTDZB general registers*

Where:

***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

***Xd***

Is the 64-bit name of the general-purpose destination register.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Authenticate Data address, using key B. This instruction authenticates a data address, using a modifier and key B.

The address is in the general-purpose register that is specified by *xd*.

The modifier is:

- In the general-purpose register or stack pointer that is specified by  $xn|sp$  for AUTDB.
- The value zero, for AUTDZB.

If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ (A64)

Authenticate Instruction address, using key A.

## Syntax

**AUTIA**  $Xd$ ,  $Xn|sp$

**AUTIZA**  $Xd$

**AUTIA1716**

**AUTIASP**

**AUTIAZ**

Where:

**$Xd$**

Is the 64-bit name of the general-purpose destination register.

**$Xn|sp$**

Is the 64-bit name of the general-purpose source register or stack pointer.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Authenticate Instruction address, using key A. This instruction authenticates an instruction address, using a modifier and key A.

The address is:

- In the general-purpose register that is specified by  $xd$  for **AUTIA** and **AUTIZA**.
- In X17, for **AUTIA1716**.

- In X30, for `AUTIASP` and `AUTIAZ`.

The modifier is:

- In the general-purpose register or stack pointer that is specified by `Xn|SP` for `AUTIA`.
- The value zero, for `AUTIZA` and `AUTIAZ`.
- In X16, for `AUTIA1716`.
- In SP, for `AUTIASP`.

If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ (A64)

Authenticate Instruction address, using key B.

## Syntax

`AUTIB Xd, Xn|SP`

`AUTIZB Xd`

`AUTIB1716`

`AUTIBSP`

`AUTIBZ`

Where:

### ***Xd***

Is the 64-bit name of the general-purpose destination register.

### ***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Authenticate Instruction address, using key B. This instruction authenticates an instruction address, using a modifier and key B.

The address is:

- In the general-purpose register that is specified by  $x_d$  for `AUTIB` and `AUTIZB`.
- In  $X17$ , for `AUTIB1716`.
- In  $X30$ , for `AUTIBSP` and `AUTIBZ`.

The modifier is:

- In the general-purpose register or stack pointer that is specified by  $x_n | SP$  for `AUTIB`.
- The value zero, for `AUTIZB` and `AUTIBZ`.
- In  $X16$ , for `AUTIB1716`.
- In  $SP$ , for `AUTIBSP`.

If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.26 B.cond

Branch conditionally.

### Syntax

`B.condlabel`

Where:

**`cond`**

Is one of the standard conditions.

**`label`**

Is the program label to be conditionally branched to. Its offset from the address of this instruction, in the range  $\pm 1\text{MB}$ .

### Usage

Branch conditionally to a label at a PC-relative offset, with a hint that this is not a subroutine call or return.

### Related information

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.27 B (A64)

Branch.

### Syntax

`B label`

Where:

#### **label**

Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in the range ±128MB. The branch can be forward or backward within 128MB.

### Usage

Branch causes an unconditional branch to a label at a PC-relative offset, with a hint that this is not a subroutine call or return.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.28 BFC (A64)

Bitfield Clear, leaving other bits unchanged.

This instruction is an alias of `BFM`.

The equivalent instruction is `BFM Wd, WZR, #(-lsb MOD 32), #(width-1)`.

### Syntax

`BFC Wd, #lsb, #width ; 32-bit`

`BFC Xd, #lsb, #width ; 64-bit`

Where:

#### **Wd**

Is the 32-bit name of the general-purpose destination register.

#### **lsb**

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.

***width***

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*lsb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*lsb*.

***xd***

Is the 64-bit name of the general-purpose destination register.

**Architectures supported**

Supported in the Arm®v8.2 architecture and later.

**Related information**

[BFM \(A64\)](#) on page 694

[A64 instructions in alphabetical order](#) on page 664

## 17.29 BFI (A64)

Bitfield Insert.

This instruction is an alias of `BFM`.

The equivalent instruction is `BFI Wd, Wn, #(-lsb MOD 32), #(width-1)`.

**Syntax**

`BFI Wd, Wn, #lsb, #width ; 32-bit`

`BFI Xd, Xn, #lsb, #width ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***lsb***

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.

***width***

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*lsb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*lsb*.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Bitfield Insert copies any number of low-order bits from a source register into the same number of adjacent bits at any position in the destination register, leaving other bits unchanged.

**Related information**

[BFM \(A64\)](#) on page 694

[A64 instructions in alphabetical order](#) on page 664

## 17.30 BFM (A64)

Bitfield Move.

This instruction is used by the aliases:

- BFC.
- BFI.
- BFXIL.

**Syntax**

BFM *Wd*, *Wn*, #<immr>, #<imms> ; 32-bit

BFM *Xd*, *Xn*, #<immr>, #<imms> ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

**<immr>**

Depends on the instruction variant:

**32-bit general registers** Is the right rotate amount, in the range 0 to 31.

**64-bit general registers** Is the right rotate amount, in the range 0 to 63.

**<imms>**

Depends on the instruction variant:

**32-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 31.

**64-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 63.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Usage

Bitfield Move copies any number of low-order bits from a source register into the same number of adjacent bits at any position in the destination register, leaving other bits unchanged.

## Related information

[BFC \(A64\)](#) on page 692

[BFI \(A64\)](#) on page 693

[BFXIL \(A64\)](#) on page 695

[A64 instructions in alphabetical order](#) on page 664

## 17.31 BFXIL (A64)

Bitfield extract and insert at low end.

This instruction is an alias of `BFM`.

The equivalent instruction is `BFM Wd, Wn, #lsb, #(lsb+width-1)`.

## Syntax

`BFXIL Wd, Wn, #lsb, #width ; 32-bit`

`BFXIL Xd, Xn, #lsb, #width ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the general-purpose source register.

***1sb***

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 63.

***width***

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to  $32\text{-}1sb$ .

**64-bit general registers** Is the width of the bitfield, in the range 1 to  $64\text{-}1sb$ .

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Bitfield extract and insert at low end copies any number of low-order bits from a source register into the same number of adjacent bits at the low end in the destination register, leaving other bits unchanged.

**Related information**

[BFM \(A64\)](#) on page 694

[A64 instructions in alphabetical order](#) on page 664

## 17.32 BIC (shifted register) (A64)

Bitwise Bit Clear (shifted register).

**Syntax**

```
BIC Wd, Wn, Wm, {shift #amount} ; 32-bit
```

```
BIC Xd, Xn, Xm, {shift #amount} ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, OR ROR.

## Operation

Bitwise Bit Clear (shifted register) performs a bitwise AND of a register value and the complement of an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ AND NOT } shift(Rm, amount)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.33 BICS (shifted register) (A64)

Bitwise Bit Clear (shifted register), setting flags.

### Syntax

BICS *Xd*, *Wn*, *Wm*, {*shift #amount*} ; 32-bit

BICS *Xd*, *Xn*, *Xm*, {*shift #amount*} ; 64-bit

Where:

***Xd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Rm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Rd***

Is the 64-bit name of the general-purpose destination register.

***Rn***

Is the 64-bit name of the first general-purpose source register.

***Rm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, OR ROR.

## Operation

Bitwise Bit Clear (shifted register), setting flags, performs a bitwise AND of a register value and the complement of an optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn \text{ AND NOT } shift(Rm, amount)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.34 BL (A64)

Branch with Link.

### Syntax

`BL label`

Where:

***label***

Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in the range  $\pm 128\text{MB}$ . The branch can be forward or backward within 128MB.

## Usage

Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint that this is a subroutine call.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.35 BLR (A64)

Branch with Link to Register.

### Syntax

```
BLR Xn
```

Where:

**Xn**

Is the 64-bit name of the general-purpose register holding the address to be branched to.

### Usage

Branch with Link to Register calls a subroutine at an address in a register, setting register X30 to PC +4.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.36 BLRAA, BLRAAZ, BLRAB, BLRABZ (A64)

Branch with Link to Register, with pointer authentication.

### Syntax

```
BLRAA Xn, Xm|SP ; BLRAA general registers
```

```
BLRAAZ Xn ; BLRAAZ general registers
```

```
BLRAB Xn, Xm|SP ; BLRAB general registers
```

```
BLRABZ Xn ; BLRABZ general registers
```

Where:

**Xn**

Is the 64-bit name of the general-purpose register holding the address to be branched to.

***Xm|SP***

Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Branch with Link to Register, with pointer authentication. This instruction authenticates the address in the general-purpose register that is specified by *Xn*, using a modifier and the specified key, and calls a subroutine at the authenticated address, setting register X30 to PC+4.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *Xm|SP* for `BLRAA` and `BLRAB`.
- The value zero, for `BLRAAZ` and `BLRABZ`.

Key A is used for `BLRAA` and `BLRAAZ`, and key B is used for `BLRAB` and `BLRABZ`.

If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a Translation fault is generated.

The authenticated address is not written back to the general-purpose register.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.37 BR (A64)

Branch to Register.

### Syntax

`BR Xm`

Where:

***Xn***

Is the 64-bit name of the general-purpose register holding the address to be branched to.

## Usage

Branch to Register branches unconditionally to an address in a register, with a hint that this is not a subroutine return.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.38 BRAA, BRAAZ, BRAB, BRABZ (A64)

Branch to Register, with pointer authentication.

### Syntax

`BRAA Xn, Xm|SP ; BRAA general registers`

`BRAAZ Xn ; BRAAZ general registers`

`BRAB Xn, Xm|SP ; BRAB general registers`

`BRABZ Xn ; BRABZ general registers`

Where:

#### **Xn**

Is the 64-bit name of the general-purpose register holding the address to be branched to.

#### **Xm|SP**

Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier.

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Branch to Register, with pointer authentication. This instruction authenticates the address in the general-purpose register that is specified by `Xn`, using a modifier and the specified key, and branches to the authenticated address.

The modifier is:

- In the general-purpose register or stack pointer that is specified by `Xm|SP` for `BRAA` and `BRAB`.
- The value zero, for `BRAAZ` and `BRABZ`.

Key A is used for `BRAA` and `BRAAZ`, and key B is used for `BRAB` and `BRABZ`.

If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a Translation fault is generated.

The authenticated address is not written back to the general-purpose register.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.39 BRK (A64)

Breakpoint instruction.

### Syntax

```
BRK #imm
```

Where:

**imm**

Is a 16-bit unsigned immediate, in the range 0 to 65535.

### Usage

Breakpoint instruction generates a Breakpoint Instruction exception. The PE records the exception in ESR\_ELx, using the EC value 0x3c, and captures the value of the immediate argument in ESR\_ELx.ISS.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.40 CBNZ (A64)

Compare and Branch on Nonzero.

### Syntax

```
CBNZ Wt, label ; 32-bit
```

```
CBNZ Xt, label ; 64-bit
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be tested.

**Xt**

Is the 64-bit name of the general-purpose register to be tested.

**label**

Is the program label to be conditionally branched to. Its offset from the address of this instruction, in the range ±1MB.

### Usage

Compare and Branch on Nonzero compares the value in a register with zero, and conditionally branches to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine call or return. This instruction does not affect the condition flags.

**Related information**[A64 instructions in alphabetical order](#) on page 664

## 17.41 CBZ (A64)

Compare and Branch on Zero.

**Syntax**

```
CBZ Wt, label ; 32-bit
```

```
CBZ Xt, label ; 64-bit
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be tested.

***Xt***

Is the 64-bit name of the general-purpose register to be tested.

***label***

Is the program label to be conditionally branched to. Its offset from the address of this instruction, in the range ±1MB.

**Usage**

Compare and Branch on Zero compares the value in a register with zero, and conditionally branches to a label at a PC-relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction does not affect condition flags.

**Related information**[A64 instructions in alphabetical order](#) on page 664

## 17.42 CCMN (immediate) (A64)

Conditional Compare Negative (immediate).

**Syntax**

```
CCMN Wn, #imm, #nzcv, cond ; 32-bit
```

```
CCMN Xn, #imm, #nzcv, cond ; 64-bit
```

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***imm***

Is a five bit unsigned immediate.

***nzcv***

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

***cond***

Is one of the standard conditions.

## Operation

Conditional Compare Negative (immediate) sets the value of the condition flags to the result of the comparison of a register value and a negated immediate value if the condition is TRUE, and an immediate value otherwise.

```
flags = if cond then compare(Rn, #-imm) else #nzcv, where R is either w or x.
```

## Related information

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.43 CCMN (register) (A64)

Conditional Compare Negative (register).

### Syntax

```
CCMN Wn, Wm, #nzcv, cond ; 32-bit
```

```
CCMN Xn, Xm, #nzcv, cond ; 64-bit
```

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***nzcv***

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

***cond***

Is one of the standard conditions.

## Operation

Conditional Compare Negative (register) sets the value of the condition flags to the result of the comparison of a register value and the inverse of another register value if the condition is TRUE, and an immediate value otherwise.

```
flags = if cond then compare(Rn, -Rm) else #nzcv, where R is either w or x.
```

## Related information

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.44 CCMP (immediate) (A64)

Conditional Compare (immediate).

### Syntax

```
CCMP Wn, #imm, #nzcv, cond ; 32-bit
```

```
CCMP Xn, #imm, #nzcv, cond ; 64-bit
```

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***imm***

Is a five bit unsigned immediate.

***nzcv***

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

***cond***

Is one of the standard conditions.

## Operation

Conditional Compare (immediate) sets the value of the condition flags to the result of the comparison of a register value and an immediate value if the condition is TRUE, and an immediate value otherwise.

`flags = if cond then compare(Rn, #imm) else #nzcv`, where R is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.45 CCMP (register) (A64)

Conditional Compare (register).

### Syntax

`CCMP Wn, Wm, #nzcv, cond ; 32-bit`

`CCMP Xn, Xm, #nzcv, cond ; 64-bit`

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***nzcv***

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

***cond***

Is one of the standard conditions.

## Operation

Conditional Compare (register) sets the value of the condition flags to the result of the comparison of two registers if the condition is TRUE, and an immediate value otherwise.

`flags = if cond then compare(Rn, Rm) else #nzcv`, where R is either w or x.

## Related information

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.46 CINC (A64)

Conditional Increment.

This instruction is an alias of `CSINC`.

The equivalent instruction is `CSINC Rd, Rn, cond`.

### Syntax

`CINC Rd, Rn, cond ; 32-bit`

`CINC Xd, Xn, cond ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

**cond**

Is one of the standard conditions, excluding AL and NV.

### Operation

Conditional Increment returns, in the destination register, the value of the source register incremented by 1 if the condition is TRUE, and otherwise returns the value of the source register.

$Rd = \text{if } cond \text{ then } Rn+1 \text{ else } Rn$ , where  $R$  is either  $w$  or  $x$ .

### Related information

[CSINC \(A64\)](#) on page 722

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.47 CINV (A64)

Conditional Invert.

This instruction is an alias of `CSINV`.

The equivalent instruction is `CSINV Rd, Rn, Rn, invert(cond)`.

### Syntax

`CINV Rd, Rn, cond ; 32-bit`

`CINV Xd, Xn, cond ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

**cond**

Is one of the standard conditions, excluding AL and NV.

### Operation

Conditional Invert returns, in the destination register, the bitwise inversion of the value of the source register if the condition is TRUE, and otherwise returns the value of the source register.

$Rd = \text{if } cond \text{ then NOT}(Rn) \text{ else } Rn$ , where R is either w or x.

### Related information

[CSINV \(A64\)](#) on page 723

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.48 CLREX (A64)

Clear Exclusive.

### Syntax

`CLREX # {imm}`

Where:

***imm***

Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15.

### Usage

Clear Exclusive clears the local monitor of the executing PE.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.49 CLS (A64)

Count leading sign bits.

### Syntax

`CLS Rd, Rn ; 32-bit`

`CLS Xd, Xn ; 64-bit`

Where:

***Rd***

Is the 32-bit name of the general-purpose destination register.

***Rn***

Is the 32-bit name of the general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

### Operation

$Rd = \text{CLS}(Rn)$ , where  $R$  is either  $w$  or  $x$ .

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.50 CLZ (A64)

Count leading zero bits.

### Syntax

`CLZ Rd, Rn ; 32-bit`

`CLZ Rd, Xn ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Operation

`Rd = CLZ (Rn)`, where R is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.51 CMN (extended register) (A64)

Compare Negative (extended register).

This instruction is an alias of ADDS (extended register).

The equivalent instruction is `ADDS WZR, Wn|WSP, Wm, {extend #amount}`.

## Syntax

`CMN Wn|WSP, Wm, {extend #amount} ; 32-bit`

`CMN Xn|SP, Rm, {extend #amount} ; 64-bit`

Where:

**Wn | WSP**

Is the 32-bit name of the first source general-purpose register or stack pointer.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**extend**

Is the extension to be applied to the second source operand:

### 32-bit general registers

Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW OR SXTX.

If  $Rn$  is WSP then `LSL` is preferred rather than `UXTW`, and can be omitted when `amount` is 0. In all other cases `extend` is required and must be `UXTW` rather than `LSL`.

### 64-bit general registers

Can be one of `UXTB`, `UXTH`, `UXTW`, `LSL|UXTX`, `SXTB`, `SXTH`, `SXTW` or `SXTX`.

If  $Rn$  is SP then `LSL` is preferred rather than `UXTX`, and can be omitted when `amount` is 0. In all other cases `extend` is required and must be `UXTX` rather than `LSL`.

#### **$Xn | SP$**

Is the 64-bit name of the first source general-purpose register or stack pointer.

#### **$R$**

Is a width specifier, and can be either `w` or `x`.

#### **$m$**

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

#### **`amount`**

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when `extend` is absent, is required when `extend` is `LSL`, and is optional when `extend` is present but not `LSL`.

## Operation

Compare Negative (extended register) adds a register value and a sign or zero-extended register value, followed by an optional left shift amount. The argument that is extended from the  $Rm$  register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result, and discards the result.

$Rn + LSL(extend(Rm), amount)$ , where  $R$  is either `w` or `x`.

## Usage

**Table 17-5: CMN (64-bit general registers) specifier combinations**

| <b><math>R</math></b> | <b><code>extend</code></b> |
|-----------------------|----------------------------|
| W                     | <code>SXTB</code>          |
| W                     | <code>SXTH</code>          |
| W                     | <code>SXTW</code>          |
| W                     | <code>UXTB</code>          |
| W                     | <code>UXTH</code>          |
| W                     | <code>UXTW</code>          |
| X                     | <code>LSL UXTX</code>      |
| X                     | <code>SXTX</code>          |

## Related information

[ADDS \(extended register\) \(A64\)](#) on page 674

[A64 instructions in alphabetical order](#) on page 664

## 17.52 CMN (immediate) (A64)

Compare Negative (immediate).

This instruction is an alias of `ADDS` (immediate).

The equivalent instruction is `ADDS WZR, Wn|WSP, #imm, {shift}`.

### Syntax

`CMN Wn|WSP, #imm, {shift} ; 32-bit`

`CMN Xn|SP, #imm, {shift} ; 64-bit`

Where:

***Wn|WSP***

Is the 32-bit name of the source general-purpose register or stack pointer.

***Xn|SP***

Is the 64-bit name of the source general-purpose register or stack pointer.

***imm***

Is an unsigned immediate, in the range 0 to 4095.

***shift***

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

### Operation

Compare Negative (immediate) adds a register value and an optionally-shifted immediate value. It updates the condition flags based on the result, and discards the result.

$Rn + shift(imm)$ , where  $R$  is either  $w$  or  $x$ .

### Related information

[ADDS \(immediate\) \(A64\)](#) on page 676

[A64 instructions in alphabetical order](#) on page 664

## 17.53 CMN (shifted register) (A64)

Compare Negative (shifted register).

This instruction is an alias of `ADDS` (shifted register).

The equivalent instruction is `ADDS WZR, Wn, Wm, {shift #amount}`.

### Syntax

`CMN Wn, Wm, {shift #amount} ; 32-bit`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

`CMN Xn, Xm, {shift #amount} ; 64-bit`

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

## Operation

Compare Negative (shifted register) adds a register value and an optionally-shifted register value. It updates the condition flags based on the result, and discards the result.

`Rn + shift(Rm, amount)`, where *R* is either *w* or *x*.

## Related information

[ADDS \(shifted register\) \(A64\)](#) on page 677

[A64 instructions in alphabetical order](#) on page 664

## 17.54 CMP (extended register) (A64)

Compare (extended register).

This instruction is an alias of `SUBS` (extended register).

The equivalent instruction is `SUBS WZR, Wn|WSP, Wm, {extend #{amount}}`.

## Syntax

`CMP Wn|WSP, Wm, {extend #{amount}} ; 32-bit`

`CMP Xn|SP, Rm, {extend #{amount}} ; 64-bit`

Where:

***Rn | WSP***

Is the 32-bit name of the first source general-purpose register or stack pointer.

***Rm***

Is the 32-bit name of the second general-purpose source register.

***extend***

Is the extension to be applied to the second source operand:

**32-bit general registers**

Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is WSP then LSL is preferred rather than UXTW, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTW rather than LSL.

**64-bit general registers**

Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is SP then LSL is preferred rather than UXTX, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTX rather than LSL.

***Xn | SP***

Is the 64-bit name of the first source general-purpose register or stack pointer.

***R***

Is a width specifier, and can be either w or x.

***m***

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

***amount***

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when *extend* is absent, is required when *extend* is LSL, and is optional when *extend* is present but not LSL.

**Operation**

Compare (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift amount, from a register value. The argument that is extended from the *Rm* register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result, and discards the result.

*Rn* - LSL(*extend(Rm)*, *amount*), where *R* is either w or x.

**Usage**

**Table 17-6: CMP (64-bit general registers) specifier combinations**

| <b><i>R</i></b> | <b><i>extend</i></b> |
|-----------------|----------------------|
| W               | SXTB                 |
| W               | SXTH                 |

| <b>R</b> | <b>extend</b> |
|----------|---------------|
| W        | SXTW          |
| W        | UXTB          |
| W        | UXTH          |
| W        | UXTW          |
| X        | LSL   UXTX    |
| X        | SXTX          |

### Related information

[SUBS \(extended register\) \(A64\)](#) on page 793

[A64 instructions in alphabetical order](#) on page 664

## 17.55 CMP (immediate) (A64)

Compare (immediate).

This instruction is an alias of `SUBS` (immediate).

The equivalent instruction is `SUBS WZR, Wn|WSP, #imm , {shift}`.

### Syntax

`CMP Wn|WSP, #imm, {shift} ; 32-bit`

`CMP Xn|SP, #imm, {shift} ; 64-bit`

Where:

#### **Wn | WSP**

Is the 32-bit name of the source general-purpose register or stack pointer.

#### **Xn | SP**

Is the 64-bit name of the source general-purpose register or stack pointer.

#### **imm**

Is an unsigned immediate, in the range 0 to 4095.

#### **shift**

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

### Operation

Compare (immediate) subtracts an optionally-shifted immediate value from a register value. It updates the condition flags based on the result, and discards the result.

$Rn - shift(imm)$ , where  $R$  is either `w` or `x`.

## Related information

[SUBS \(immediate\) \(A64\)](#) on page 795

[A64 instructions in alphabetical order](#) on page 664

## 17.56 CMP (shifted register) (A64)

Compare (shifted register).

This instruction is an alias of `SUBS` (shifted register).

The equivalent instruction is `SUBS WZR, Rn, Rm, {shift #amount}`.

### Syntax

`CMP Rn, Rm, {shift #amount} ; 32-bit`

`CMP Xn, Xm, {shift #amount} ; 64-bit`

Where:

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**shift**

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

### Operation

Compare (shifted register) subtracts an optionally-shifted register value from a register value. It updates the condition flags based on the result, and discards the result.

$Rn - shift(Rm, amount)$ , where R is either w or x.

## Related information

[SUBS \(shifted register\) \(A64\)](#) on page 796

[A64 instructions in alphabetical order](#) on page 664

## 17.57 CNEG (A64)

Conditional Negate.

This instruction is an alias of `CSNEG`.

The equivalent instruction is `CSNEG Rd, Rn, cond ; invert(cond)`.

### Syntax

`CNEG Rd, Rn, cond ; 32-bit`

`CNEG Xd, Xn, cond ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

**cond**

Is one of the standard conditions, excluding AL and NV.

### Operation

Conditional Negate returns, in the destination register, the negated value of the source register if the condition is TRUE, and otherwise returns the value of the source register.

$Rd = \text{if } cond \text{ then } -Rn \text{ else } Rn$ , where R is either w or x.

## Related information

[CSNEG \(A64\)](#) on page 724

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.58 CRC32B, CRC32H, CRC32W, CRC32X (A64)

CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.

### Syntax

`CRC32B Wd, Wn, Wm ; Wd = CRC32 (Wn, Rm[<7:0>])`

`CRC32H Wd, Wn, Wm ; Wd = CRC32 (Wn, Rm[<15:0>])`

`CRC32W Wd, Wn, Wm ; Wd = CRC32 (Wn, Rm[<31:0>])`

`CRC32X Wd, Wn, Xm ; Wd = CRC32 (Wn, Rm[<63:0>])`

Where:

***Wm***

Is the 32-bit name of the general-purpose data source register.

***Xm***

Is the 64-bit name of the general-purpose data source register.

***Wd***

Is the 32-bit name of the general-purpose accumulator output register.

***Wn***

Is the 32-bit name of the general-purpose accumulator input register.

### Operation

This instruction takes an input CRC value in the first source operand, performs a CRC on the input value in the second source operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x04C11DB7 is used for the CRC calculation.



ID\_AA64ISAR0\_EL1.CRC32 indicates whether this instruction is supported. See [ID\\_AA64ISAR0\\_EL1](#) in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

---

`Wd = CRC32 (Wn, Rm<n:0>) // n = 7, 15, 31, 63.`

### Architectures supported

Supported in architecture Arm®v8.1 and later. Optionally supported in Armv8-A.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX (A64)

CRC32C checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.

### Syntax

`CRC32CB Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<7:0>])`

`CRC32CH Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<15:0>])`

`CRC32CW Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<31:0>])`

`CRC32CX Wd, Wn, Xm ; Wd = CRC32C(Wn, Rm[<63:0>])`

Where:

**Wm**

Is the 32-bit name of the general-purpose data source register.

**Xm**

Is the 64-bit name of the general-purpose data source register.

**Wd**

Is the 32-bit name of the general-purpose accumulator output register.

**Wn**

Is the 32-bit name of the general-purpose accumulator input register.

### Operation

This instruction takes an input CRC value in the first source operand, performs a CRC on the input value in the second source operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x1EDC6F41 is used for the CRC calculation.



ID\_AA64ISAR0\_EL1.CRC32 indicates whether this instruction is supported. See [ID\\_AA64ISAR0\\_EL1](#) in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

---

`Wd = CRC32C(Wn, Rm<n:0>) // n = 7, 15, 31, 63.`

### Architectures supported

Supported in architecture Arm®v8.1 and later. Optionally supported in Armv8-A.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.60 CSEL (A64)

Conditional Select.

### Syntax

`CSEL Rd, Rn, Rm, cond ; 32-bit`

`CSEL Xd, Xn, Xm, cond ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**cond**

Is one of the standard conditions.

### Operation

Conditional Select returns, in the destination register, the value of the first source register if the condition is TRUE, and otherwise returns the value of the second source register.

`Rd = if cond then Rn else Rm`, where R is either w or x.

### Related information

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.61 CSET (A64)

Conditional Set.

This instruction is an alias of CSINC.

The equivalent instruction is `CSINC Rd, RZR, WZR, invert(cond)`.

## Syntax

CSET *Wd*, *cond* ; 32-bit

CSET *Xd*, *cond* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***cond***

Is one of the standard conditions, excluding AL and NV.

## Operation

Conditional Set sets the destination register to 1 if the condition is TRUE, and otherwise sets it to 0.

*Rd* = if *cond* then 1 else 0, where *R* is either *w* or *x*.

## Related information

[CSINC \(A64\)](#) on page 722

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.62 CSETM (A64)

Conditional Set Mask.

This instruction is an alias of `CSINV`.

The equivalent instruction is `CSINV Wd, WZR, WZR, invert(cond)`.

## Syntax

CSETM *Wd*, *cond* ; 32-bit

CSETM *Xd*, *cond* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***cond***

Is one of the standard conditions, excluding AL and NV.

**Operation**

Conditional Set Mask sets all bits of the destination register to 1 if the condition is TRUE, and otherwise sets all bits to 0.

$Rd = \text{if } cond \text{ then } -1 \text{ else } 0$ , where  $R$  is either  $w$  or  $x$ .

**Related information**

[CSINV \(A64\)](#) on page 723

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.63 CSINC (A64)

Conditional Select Increment.

This instruction is used by the aliases:

- CINC.
- CSET.

**Syntax**

CSINC  $Wd, Wn, Wm, cond ;$  32-bit

CSINC  $Xd, Xn, Xm, cond ;$  64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***cond***

Is one of the standard conditions.

## Operation

Conditional Select Increment returns, in the destination register, the value of the first source register if the condition is TRUE, and otherwise returns the value of the second source register incremented by 1.

$Rd = \text{if } cond \text{ then } Rn \text{ else } (Rm + 1)$ , where  $R$  is either w or x.

## Related information

[CINC \(A64\)](#) on page 707

[CSET \(A64\)](#) on page 720

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.64 CSINV (A64)

Conditional Select Invert.

This instruction is used by the aliases:

- CINV.
- CSETM.

## Syntax

CSINV *Wd*, *Wn*, *Wm*, *cond* ; 32-bit

CSINV *Xd*, *Xn*, *Xm*, *cond* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**cond**

Is one of the standard conditions.

## Operation

Conditional Select Invert returns, in the destination register, the value of the first source register if the condition is TRUE, and otherwise returns the bitwise inversion value of the second source register.

`Rd = if cond then Rn else NOT (Rm)`, where R is either w or x.

## Related information

[CINV \(A64\)](#) on page 707

[CSETM \(A64\)](#) on page 721

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.65 CSNEG (A64)

Conditional Select Negation.

This instruction is used by the alias CNEG.

## Syntax

`CSNEG Wd, Wn, Wm, cond ; 32-bit`

`CSNEG Xd, Xn, Xm, cond ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the first general-purpose source register.

**Wm**

Is the 32-bit name of the second general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

***cond***

Is one of the standard conditions.

**Operation**

Conditional Select Negation returns, in the destination register, the value of the first source register if the condition is TRUE, and otherwise returns the negated value of the second source register.

*Rd* = if *cond* then *Rn* else  $-R_m$ , where *R* is either *w* or *x*.

**Related information**

[CNEG \(A64\)](#) on page 717

[Condition code suffixes](#) on page 138

[A64 instructions in alphabetical order](#) on page 664

## 17.66 DC (A64)

Data Cache operation.

This instruction is an alias of *sys*.

The equivalent instruction is *sys #op1, c7, cm, #op2, xt*.

**Syntax**

`DC <dc_op>, xt`

Where:

**<dc\_op>**

Is a DC instruction name, as listed for the DC system instruction group, and can be one of the values shown in Usage.

**op1**

Is a 3-bit unsigned immediate, in the range 0 to 7.

**cm**

Is a name *cm*, with *m* in the range 0 to 15.

**op2**

Is a 3-bit unsigned immediate, in the range 0 to 7.

**xt**

Is the 64-bit name of the general-purpose source register.

**Usage**

Data Cache operation. For more information, see *A64 system instructions for cache maintenance* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The following table shows the valid specifier combinations:

**Table 17-7: SYS parameter values corresponding to DC operations**

| <dc_op> | <i>op1</i> | <i>Cm</i> | <i>op2</i> |
|---------|------------|-----------|------------|
| CISW    | 0          | 14        | 2          |
| CIVAC   | 3          | 14        | 1          |
| CSW     | 0          | 10        | 2          |
| CVAC    | 3          | 10        | 1          |
| CVAP    | 3          | 12        | 1          |
| CVAU    | 3          | 11        | 1          |
| ISW     | 0          | 6         | 2          |
| IVAC    | 0          | 6         | 1          |
| ZVA     | 3          | 4         | 1          |

### Related information

[SYS \(A64\)](#) on page 800

[A64 instructions in alphabetical order](#) on page 664

## 17.67 DCPS1 (A64)

Debug Change PE State to EL1.

### Syntax

DCPS1 #*{imm}*

Where:

*imm*

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.

### Usage

Debug Change PE State to EL1, when executed in Debug state:

- If executed at EL0 changes the current Exception level and SP to EL1 using SP\_EL1.
- Otherwise, if executed at ELx, selects SP\_ELx.

The target exception level of a DCPS1 instruction is:

- EL1 if the instruction is executed at EL0.
- Otherwise, the Exception level at which the instruction is executed.

When the target Exception level of a DCPS1 instruction is ELx, on executing this instruction:

- *ELR\_ELx* becomes **UNKNOWN**.
- *SPSR\_ELx* becomes **UNKNOWN**.
- *ESR\_ELx* becomes **UNKNOWN**.

- $DLR\_EL0$  and  $DSPSR\_EL0$  become **UNKNOWN**.
- The endianness is set according to  $SCTRLR\_ELx.EE$ .

This instruction is **UNDEFINED** at EL0 in Non-secure state if EL2 is implemented and  $HCR\_EL2.TGE == 1$ .

This instruction is always undefined in Non-debug state.

For more information on the operation of the DCPSn instructions, see `DCPS` in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.68 DCPS2 (A64)

Debug Change PE State to EL2.

### Syntax

`DCPS2 # {imm}`

Where:

#### **imm**

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.

### Usage

Debug Change PE State to EL2, when executed in Debug state:

- If executed at EL0 or EL1 changes the current Exception level and SP to EL2 using  $SP\_EL2$ .
- Otherwise, if executed at  $ELx$ , selects  $SP\_ELx$ .

The target exception level of a DCPS2 instruction is:

- EL2 if the instruction is executed at an exception level that is not EL3.
- EL3 if the instruction is executed at EL3.

When the target Exception level of a DCPS2 instruction is  $ELx$ , on executing this instruction:

- $ELR\_ELx$  becomes **UNKNOWN**.
- $SPSR\_ELx$  becomes **UNKNOWN**.
- $ESR\_ELx$  becomes **UNKNOWN**.
- $DLR\_EL0$  and  $DSPSR\_EL0$  become **UNKNOWN**.
- The endianness is set according to  $SCTRLR\_ELx.EE$ .

This instruction is undefined at the following exception levels:

- All exception levels if EL2 is not implemented.
- At EL0 and EL1 in Secure state if EL2 is implemented.

This instruction is always undefined in Non-debug state.

For more information on the operation of the DCPSn instructions, see DCPS in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.69 DCPS3 (A64)

Debug Change PE State to EL3.

### Syntax

DCPS3 #*imm*

Where:

*imm*

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.

### Usage

Debug Change PE State to EL3, when executed in Debug state:

- If executed at EL3 selects SP\_EL3.
- Otherwise, changes the current Exception level and SP to EL3 using SP\_EL3.

The target exception level of a DCPS3 instruction is EL3.

On executing a DCPS3 instruction:

- *ELR\_EL3* becomes **UNKNOWN**.
- *SPSR\_EL3* becomes **UNKNOWN**.
- *ESR\_EL3* becomes **UNKNOWN**.
- *DLR\_EL0* and *DSPSR\_EL0* become **UNKNOWN**.
- The endianness is set according to *SCTRLR\_EL3.EE*.

This instruction is undefined at all exception levels if either:

- *EDSCR.SDD == 1*.
- EL3 is not implemented.

This instruction is always undefined in Non-debug state.

For more information on the operation of the DCPSn instructions, see DCPS in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.70 DMB (A64)

Data Memory Barrier.

### Syntax

`DMB option | #imm`

Where:

#### **option**

Specifies the limitation on the barrier operation. Values are:

##### **SY**

Full system is the required shareability domain, reads and writes are the required access types in both Group A and Group B. This option is referred to as the full system DMB.

##### **ST**

Full system is the required shareability domain, writes are the required access type in both Group A and Group B.

##### **LD**

Full system is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

##### **ISH**

Inner Shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

##### **ISHST**

Inner Shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

##### **ISHLD**

Inner Shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

##### **NSH**

Non-shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

##### **NSHST**

Non-shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

**NSHLD**

Non-shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

**OSH**

Outer Shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

**OSHST**

Outer Shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

**OSHLD**

Outer Shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

***imm***

Is a 4-bit unsigned immediate, in the range 0 to 15.

## Usage

Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses, see *Data Memory Barrier* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.71 DRPS (A64)

Debug restore process state.

### Syntax

DRPS

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.72 DSB (A64)

Data Synchronization Barrier.

### Syntax

DSB *option*|#*imm*

Where:

***option***

Specifies the limitation on the barrier operation. Values are:

**SY**

Full system is the required shareability domain, reads and writes are the required access types in both Group A and Group B. This option is referred to as the full system DMB.

**ST**

Full system is the required shareability domain, writes are the required access type in both Group A and Group B.

**LD**

Full system is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

**ISH**

Inner Shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

**ISHST**

Inner Shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

**ISHLD**

Inner Shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

**NSH**

Non-shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

**NSHST**

Non-shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

**NSHLD**

Non-shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

**OSH**

Outer Shareable is the required shareability domain, reads and writes are the required access types in both Group A and Group B.

**OSHST**

Outer Shareable is the required shareability domain, writes are the required access type in both Group A and Group B.

**OSHLD**

Outer Shareable is the required shareability domain, reads are the required access type in Group A, and reads and writes are the required access types in Group B.

***imm***

Is a 4-bit unsigned immediate, in the range 0 to 15.

## Usage

Data Synchronization Barrier is a memory barrier that ensures the completion of memory accesses, see *Data Synchronization Barrier* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.73 EON (shifted register) (A64)

Bitwise Exclusive OR NOT (shifted register).

### Syntax

EON *Wd*, *Wn*, *Wm*, {*shift #amount*} ; 32-bit

EON *Xd*, *Xn*, *Xm*, {*shift #amount*} ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

## Operation

Bitwise Exclusive OR NOT (shifted register) performs a bitwise Exclusive OR NOT of a register value and an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ EOR NOT } shift(Rm, amount)$ , where  $R$  is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.74 EOR (immediate) (A64)

Bitwise Exclusive OR (immediate).

### Syntax

EOR *Wd|WSP, Wn, #imm* ; 32-bit

EOR *Xd|SP, Xn, #imm* ; 64-bit

Where:

***Wd|WSP***

Is the 32-bit name of the destination general-purpose register or stack pointer.

***Wn***

Is the 32-bit name of the general-purpose source register.

***imm***

The bitmask immediate.

***Xd|SP***

Is the 64-bit name of the destination general-purpose register or stack pointer.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Operation

Bitwise Exclusive OR (immediate) performs a bitwise Exclusive OR of a register value and an immediate value, and writes the result to the destination register.

$Rd = Rn \text{ EOR } imm$ , where  $R$  is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.75 EOR (shifted register) (A64)

Bitwise Exclusive OR (shifted register).

### Syntax

`EOR Wd, Wn, Wm, {shift #amount} ; 32-bit`

`EOR Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the first general-purpose source register.

**`Wm`**

Is the 32-bit name of the second general-purpose source register.

**`amount`**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Xn`**

Is the 64-bit name of the first general-purpose source register.

**`Xm`**

Is the 64-bit name of the second general-purpose source register.

**`shift`**

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

### Operation

Bitwise Exclusive OR (shifted register) performs a bitwise Exclusive OR of a register value and an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ EOR } shift(Rm, amount)$ , where  $R$  is either `w` or `x`.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.76 ERET (A64)

Returns from an exception. It restores the processor state based on SPSR\_ELn and branches to ELR\_ELn, where n is the current exception level..

### Syntax

`ERET`

### Usage

Exception Return using the ELR and SPSR for the current Exception level. When executed, the PE restores PSTATE from the SPSR, and branches to the address held in the ELR.

The PE checks the SPSR for the current Exception level for an illegal return event. See *Illegal return events from AArch64 state* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

`ERET` is undefined at EL0.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.77 ERETAAC, ERETAB (A64)

Exception Return, with pointer authentication.

### Syntax

`EREAA`

`ERETAB`

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Exception Return, with pointer authentication. This instruction authenticates the address in ELR, using SP as the modifier and the specified key, the PE restores PSTATE from the SPSR for the current Exception level, and branches to the authenticated address.

Key A is used for `EREAA`, and key B is used for `ERETAB`.

If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a Translation fault is generated.

The authenticated address is not written back to ELR.

The PE checks the SPSR for the current Exception level for an illegal return event. See *Illegal return events from AArch64 state* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

`ERET` is undefined at ELO.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.78 ESB (A64)

Error Synchronization Barrier.

### Syntax

`ESB`

### Architectures supported

Supported in the Arm®v8.2 architecture and later.

### Usage

Error Synchronization Barrier.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.79 EXTR (A64)

Extract register.

This instruction is used by the alias `ROR` (immediate).

### Syntax

`EXTR Wd, Wn, Wm, #1sb ; 32-bit`

`EXTR Xd, Xn, Xm, #1sb ; 64-bit`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***lsb***

Depends on the instruction variant:

**32-bit general registers** Is the least significant bit position from which to extract, in the range 0 to 31.

**64-bit general registers** Is the least significant bit position from which to extract, in the range 0 to 63.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

**Usage**

Extract register extracts a register from a pair of registers.

**Related information**

[ROR \(immediate\) \(A64\)](#) on page 776

[A64 instructions in alphabetical order](#) on page 664

## 17.80 HINT (A64)

Hint instruction.

**Syntax**

```
HINT #imm ; Hints 6 and 7
```

```
HINT #imm ; Hints 8 to 15, and 24 to 127
```

```
HINT #imm ; Hints 17 to 23
```

Where:

***imm***

Is a 7-bit unsigned immediate, in the range 0 to 127, but excludes the following:

**0**

NOP.

**1**

YIELD.

**2**  
WFE.  
**3**  
WFI.  
**4**  
SEV.  
**5**  
SEVL.

## Usage

Hint instruction is for the instruction set space that is reserved for architectural hint instructions.

The encoding variants described here are unallocated in this revision of the architecture, and behave as a `nop`. These encodings might be allocated to other hint functionality in future revisions of the architecture.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.81 HLT (A64)

Halt instruction.

### Syntax

`HLT #imm`

Where:

***imm***

Is a 16-bit unsigned immediate, in the range 0 to 65535.

## Usage

Halt instruction generates a Halt Instruction debug event.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.82 HVC (A64)

Hypervisor call to allow OS code to call the Hypervisor. It generates an exception targeting exception level 2 (EL2).

### Syntax

`HVC #imm`

Where:

***imm***

Is a 16-bit unsigned immediate, in the range 0 to 65535. This value is made available to the handler in the Exception Syndrome Register.

### Usage

Hypervisor Call causes an exception to EL2. Non-secure software executing at EL1 can use this instruction to call the hypervisor to request a service.

The `hvc` instruction is undefined:

- At EL0, and Secure EL1.
- When SCR\_EL3.HCE is set to 0.

On executing an `hvc` instruction, the PE records the exception as a Hypervisor Call exception in `ESR_ELx`, using the EC value 0x16, and the value of the immediate argument.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.83 IC (A64)

Instruction Cache operation.

This instruction is an alias of `sys`.

The equivalent instruction is `sys #op1, c7, cm, #op2, {xt}`.

### Syntax

`IC <ic_op>, {xt}`

Where:

***<ic\_op>***

Is an IC instruction name, as listed for the IC system instruction pages, and can be one of the values shown in Usage.

***op1***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***Cm***

Is a name *Cm*, with *m* in the range 0 to 15.

***op2***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***xt***

Is the 64-bit name of the optional general-purpose source register, defaulting to 31.

**Usage**

Instruction Cache operation. For more information, see *A64 system instructions for cache maintenance* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The following table shows the valid specifier combinations:

**Table 17-8: SYS parameter values corresponding to IC operations**

| <ic_op> | <i>op1</i> | <i>Cm</i> | <i>op2</i> |
|---------|------------|-----------|------------|
| IALLU   | 0          | 5         | 0          |
| IALLUIS | 0          | 1         | 0          |
| IVAU    | 3          | 5         | 1          |

**Related information**

[SYS \(A64\)](#) on page 800

[A64 instructions in alphabetical order](#) on page 664

## 17.84 ISB (A64)

Instruction Synchronization Barrier.

**Syntax**

`ISB {option|#imm}`

Where:

***option***

Specifies an optional limitation on the barrier operation. Values are:

***SY***

Full system barrier operation. Can be omitted.

***imm***

Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15.

## Usage

Instruction Synchronization Barrier flushes the pipeline in the PE, so that all instructions following the `ISB` are fetched from cache or memory, after the instruction has been completed. It ensures that the effects of context changing operations executed before the `ISB` instruction are visible to the instructions fetched after the `ISB`. Context changing operations include changing the ASID, TLB maintenance instructions, and all changes to the System registers. In addition, any branches that appear in program order after the `ISB` instruction are written into the branch prediction logic with the context that is visible after the `ISB` instruction. This is needed to ensure correct execution of the instruction stream. For more information, see *Instruction Synchronization Barrier (ISB)* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.85 LSL (register) (A64)

Logical Shift Left (register).

This instruction is an alias of `LSLV`.

The equivalent instruction is `LSLV Wd, Wn, Wm`.

### Syntax

`LSL Wd, Wn, Wm ; 32-bit`

`LSL Xd, Xn, Xm ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the first general-purpose source register.

**Wm**

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Logical Shift Left (register) shifts a register value left by a variable number of bits, shifting in zeros, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is left-shifted.

$Rd = LSL(Rn, Rm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[LSLV \(A64\)](#) on page 743

[A64 instructions in alphabetical order](#) on page 664

## 17.86 LSL (immediate) (A64)

Logical Shift Left (immediate).

This instruction is an alias of `UBFM`.

The equivalent instruction is `UBFM Wd, Wn, #(-shift MOD 32), #(31-shift)`.

### Syntax

`LSL Wd, Wn, #shift ; 32-bit`

`LSL Xd, Xn, #shift ; 64-bit`

Where:

#### **Wd**

Is the 32-bit name of the general-purpose destination register.

#### **Wn**

Is the 32-bit name of the general-purpose source register.

#### **shift**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31.

**64-bit general registers** Is the shift amount, in the range 0 to 63.

#### **Xd**

Is the 64-bit name of the general-purpose destination register.

#### **Xn**

Is the 64-bit name of the general-purpose source register.

## Operation

Logical Shift Left (immediate) shifts a register value left by an immediate number of bits, shifting in zeros, and writes the result to the destination register.

$Rd = LSL(Rn, shift)$ , where  $R$  is either w or x.

## Related information

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

## 17.87 LSLV (A64)

Logical Shift Left Variable.

This instruction is used by the alias `LSL` (register).

### Syntax

`LSLV Rd, Rn, Rm ; 32-bit`

`LSLV Xd, Xn, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Logical Shift Left Variable shifts a register value left by a variable number of bits, shifting in zeros, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is left-shifted.

$Rd = LSL(Rn, Rm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[LSL \(register\) \(A64\)](#) on page 741

[A64 instructions in alphabetical order](#) on page 664

## 17.88 LSR (register) (A64)

Logical Shift Right (register).

This instruction is an alias of `LSRV`.

The equivalent instruction is `LSRV Rd, Rn, Rm`.

### Syntax

`LSR Rd, Rn, Rm ; 32-bit`

`LSR Xd, Xn, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

### Operation

Logical Shift Right (register) shifts a register value right by a variable number of bits, shifting in zeros, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

$Rd = LSR(Rn, Rm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[LSRV \(A64\)](#) on page 746

[A64 instructions in alphabetical order](#) on page 664

## 17.89 LSR (immediate) (A64)

Logical Shift Right (immediate).

This instruction is an alias of `UBFM`.

The equivalent instruction is `UBFM Rd, Rn, #shift, #31`.

### Syntax

`LSR Rd, Rn, #shift ; 32-bit`

`LSR Xd, Xn, #shift ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

**shift**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31.

**64-bit general registers** Is the shift amount, in the range 0 to 63.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

### Operation

Logical Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in zeros, and writes the result to the destination register.

$Rd = \text{LSR}(Rn, shift)$ , where  $R$  is either `w` or `x`.

### Related information

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

## 17.90 LSRV (A64)

Logical Shift Right Variable.

This instruction is used by the alias `LSR` (register).

### Syntax

`LSRV Rd, Rn, Rm ; 32-bit`

`LSRV Xd, Xn, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

### Operation

Logical Shift Right Variable shifts a register value right by a variable number of bits, shifting in zeros, and writes the result to the destination register. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

$Rd = LSR(Rn, Rm)$ , where  $R$  is either `w` or `x`.

### Related information

[LSR \(register\) \(A64\)](#) on page 744

[A64 instructions in alphabetical order](#) on page 664

## 17.91 MADD (A64)

Multiply-Add.

This instruction is used by the alias `MUL`.

### Syntax

`MADD Rd, Rn, Rm, Ra ; 32-bit`

`MADD Xd, Xn, Xm, Xa ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**Rm**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**Ra**

Is the 32-bit name of the third general-purpose source register holding the addend.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

**Xm**

Is the 64-bit name of the second general-purpose source register holding the multiplier.

**Xa**

Is the 64-bit name of the third general-purpose source register holding the addend.

### Operation

Multiply-Add multiplies two register values, adds a third register value, and writes the result to the destination register.

$Rd = Ra + Rn * Rm$ , where  $R$  is either `w` or `x`.

### Related information

[MUL \(A64\)](#) on page 759

[A64 instructions in alphabetical order](#) on page 664

## 17.92 MNEG (A64)

Multiply-Negate.

This instruction is an alias of `MSUB`.

The equivalent instruction is `MSUB Rd, Rn, Rm, WZR`.

### Syntax

`MNEG Rd, Rn, Rm ; 32-bit`

`MNEG Xd, Xn, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**Rm**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

**Xm**

Is the 64-bit name of the second general-purpose source register holding the multiplier.

### Operation

Multiply-Negate multiplies two register values, negates the product, and writes the result to the destination register.

$Rd = - (Rn * Rm)$ , where R is either w or x.

### Related information

[MSUB \(A64\)](#) on page 758

[A64 instructions in alphabetical order](#) on page 664

## 17.93 MOV (to or from SP) (A64)

Move between register and stack pointer.

This instruction is an alias of `ADD` (immediate).

The equivalent instruction is ADD *Wd|WSP*, *Wn|WSP*, #0.

## Syntax

MOV *Wd|WSP*, *Wn|WSP* ; 32-bit

MOV *Xd|SP*, *Xn|SP* ; 64-bit

Where:

***Wd|WSP***

Is the 32-bit name of the destination general-purpose register or stack pointer.

***Wn|WSP***

Is the 32-bit name of the source general-purpose register or stack pointer.

***Xd|SP***

Is the 64-bit name of the destination general-purpose register or stack pointer.

***Xn|SP***

Is the 64-bit name of the source general-purpose register or stack pointer.

## Operation

Rd = Rn, where R is either W or X.

## Related information

[ADD \(immediate\) \(A64\)](#) on page 672

[A64 instructions in alphabetical order](#) on page 664

## 17.94 MOV (inverted wide immediate) (A64)

Move (inverted wide immediate).

This instruction is an alias of MOVN.

The equivalent instruction is MOVN *Wd*, #*imm16*, LSL #*shift*.

## Syntax

MOV *Wd*, #*imm* ; 32-bit

MOV *Xd*, #*imm* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***imm***

Depends on the instruction variant:

**32-bit general registers** Is a 32-bit immediate.

**64-bit general registers** Is a 64-bit immediate.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

### Operation

Move (inverted wide immediate) moves an inverted 16-bit immediate value to a register.

$Rd = imm$ , where  $R$  is either w or x.

### Related information

[MOVN \(A64\)](#) on page 754

[A64 instructions in alphabetical order](#) on page 664

## 17.95 MOV (wide immediate) (A64)

Move (wide immediate).

This instruction is an alias of `MOVZ`.

The equivalent instruction is `MOVZ Rd, #imm16, LSL #shift`.

### Syntax

`MOV Rd, #imm ; 32-bit`

`MOV Xd, #imm ; 64-bit`

Where:

#### ***Rd***

Is the 32-bit name of the general-purpose destination register.

#### ***imm***

Depends on the instruction variant:

**32-bit general registers** Is a 32-bit immediate.

**64-bit general registers** Is a 64-bit immediate.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

### Operation

Move (wide immediate) moves a 16-bit immediate value to a register.

$Rd = imm$ , where  $R$  is either w or x.

## Related information

[MOVZ \(A64\)](#) on page 755

[A64 instructions in alphabetical order](#) on page 664

## 17.96 MOV (bitmask immediate) (A64)

Move (bitmask immediate).

This instruction is an alias of ORR (immediate).

The equivalent instruction is ORR *wd|wsp*, WZR, #*imm*.

### Syntax

MOV *wd|wsp*, #*imm* ; 32-bit

MOV *xd|sp*, #*imm* ; 64-bit

Where:

***wd|wsp***

Is the 32-bit name of the destination general-purpose register or stack pointer.

***imm***

The bitmask immediate but excluding values which could be encoded by MOVZ or MOVN.

***xd|sp***

Is the 64-bit name of the destination general-purpose register or stack pointer.

### Operation

Move (bitmask immediate) writes a bitmask immediate value to a register.

*Rd* = *imm*, where *R* is either w or x.

## Related information

[ORR \(immediate\) \(A64\)](#) on page 766

[A64 instructions in alphabetical order](#) on page 664

## 17.97 MOV (register) (A64)

Move (register).

This instruction is an alias of ORR (shifted register).

The equivalent instruction is ORR *wd*, WZR, *wm*.

## Syntax

`MOV Wd, Wm ; 32-bit`

`MOV Xd, Xm ; 64-bit`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Wm`**

Is the 32-bit name of the general-purpose source register.

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Xm`**

Is the 64-bit name of the general-purpose source register.

## Operation

Move (register) copies the value in a source register to the destination register.

`Rd = Rm`, where `R` is either `w` or `x`.

## Related information

[ORR \(shifted register\) \(A64\)](#) on page 766

[A64 instructions in alphabetical order](#) on page 664

## 17.98 MOVK (A64)

Move wide with keep.

## Syntax

`MOVK Wd, #imm, LSL #{shift} ; 32-bit`

`MOVK Xd, #imm, LSL #{shift} ; 64-bit`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`shift`**

Depends on the instruction variant:

**32-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default) or 16.

**64-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default), 16, 32 or 48.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***imm***

Is the 16-bit unsigned immediate, in the range 0 to 65535.

## Operation

Move wide with keep moves an optionally-shifted 16-bit immediate value into a register, keeping other bits unchanged.

`Rd<shift+15:shift> = imm16`, where `R` is either `w` or `x`.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.99 MOVL pseudo-instruction (A64)

Load a register with either a 32-bit or 64-bit immediate value or any address.

`MOVL` generates either two or four instructions. If a `wd` register is specified, `MOVL` generates a `MOV`, `MOVK` pair. If an `xd` register is specified, `MOVL` generates a `MOV` followed by three `MOVK` instructions. If the assembler can load the register using a single `MOV` instruction, it additionally generates either one or three `NOP`s.

## Syntax

`MOVL wd, expr`

`MOVL xd, expr`

where:

***wd***

Is the register to load with a 32-bit value.

***xd***

Is the register to load with a 64-bit value.

***expr***

Can be any one of the following:

***symbol***

A label in this or another program area.

**#constant**

Any 32-bit or 64-bit immediate value.

***symbol + constant***

A label plus a 32-bit or 64-bit immediate value.

## Usage

Use the `MOVL` pseudo-instruction to:

- Generate literal constants when an immediate value cannot be generated in a single instruction.
- Load a PC-relative or external address into a register. The address remains valid regardless of where the linker places the ELF section containing the `MOVL`.



An address loaded in this way is fixed at link time, so the code is not position-independent.

## Examples

```
MOVL w3, #0xABCD12      ; loads 0xABCD12 into w3
MOVL x1, Trigger+12     ; loads the address that is 12 bytes higher than
                         ; the address Trigger into x1
```

## Related information

[ORR \(A32\)](#) on page 352

[Register-relative and PC-relative expressions](#) on page 222

[Numeric local labels](#) on page 225

[Load immediate values using LDR Rd, =const](#) on page 109

[A-Profile Architectures](#)

## 17.100 MOVN (A64)

Move wide with NOT.

This instruction is used by the alias `MOV` (inverted wide immediate).

### Syntax

```
MOVN Wd, #imm, LSL #shift ; 32-bit
```

```
MOVN Xd, #imm, LSL #shift ; 64-bit
```

Where:

#### ***Wd***

Is the 32-bit name of the general-purpose destination register.

#### ***shift***

Depends on the instruction variant:

**32-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default) or 16.

**64-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default), 16, 32 or 48.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***imm***

Is the 16-bit unsigned immediate, in the range 0 to 65535.

## Operation

Move wide with NOT moves the inverse of an optionally-shifted 16-bit immediate value to a register.

*Rd* = NOT (LSL (*imm16*, *shift*)), where *R* is either w or x.

## Related information

[MOV \(inverted wide immediate\) \(A64\)](#) on page 749

[A64 instructions in alphabetical order](#) on page 664

## 17.101 MOVZ (A64)

Move wide with zero.

This instruction is used by the alias `mov` (wide immediate).

### Syntax

`MOVZ Rd, #imm, LSL #{shift} ; 32-bit`

`MOVZ Xd, #imm, LSL #{shift} ; 64-bit`

Where:

***Rd***

Is the 32-bit name of the general-purpose destination register.

***shift***

Depends on the instruction variant:

**32-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default) or 16.

**64-bit general registers** Is the amount by which to shift the immediate left, either 0 (the default), 16, 32 or 48.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***imm***

Is the 16-bit unsigned immediate, in the range 0 to 65535.

**Operation**

Move wide with zero moves an optionally-shifted 16-bit immediate value to a register.

$Rd = LSL (imm16, shift)$ , where  $R$  is either w or x.

**Related information**

[MOV \(wide immediate\) \(A64\)](#) on page 750

[A64 instructions in alphabetical order](#) on page 664

## 17.102 MRS (A64)

Move System Register.

**Syntax**

`MRS Xt, (systemreg|Sop0_op1_Cn_Cm_op2)`

Where:

**Xt**

Is the 64-bit name of the general-purpose destination register.

**systemreg**

Is a System register name.

**op0**

Is an unsigned immediate, and can be either 2 or 3.

**op1**

Is a 3-bit unsigned immediate, in the range 0 to 7.

**Cn**

Is a name  $c_n$ , with  $n$  in the range 0 to 15.

**Cm**

Is a name  $c_m$ , with  $m$  in the range 0 to 15.

**op2**

Is a 3-bit unsigned immediate, in the range 0 to 7.

**Usage**

Move System Register allows the PE to read an AArch64 System register into a general-purpose register.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.103 MSR (immediate) (A64)

Move immediate value to Special Register.

### Syntax

```
MSR pstatefield, #imm
```

Where:

***pstatefield***

Is a PSTATE field name, and can be one of UAO, PAN, SPSel, DAIFSet or DAIFClr.

***imm***

Is a 4-bit unsigned immediate, in the range 0 to 15.

### Usage

Move immediate value to Special Register moves an immediate value to selected bits of the PSTATE. For more information, see *Process state, PSTATE* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The bits that can be written are D, A, I, F, and SP. This set of bits is expanded in extensions to the architecture as follows:

- Arm®v8.1 adds the PAN bit.
- Armv8.2 adds the UAO bit.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.104 MSR (register) (A64)

Move general-purpose register to System Register.

### Syntax

```
MSR (systemreg|Sop0_op1_Cn_Cm_op2), Xt
```

Where:

***systemreg***

Is a System register name.

***op0***

Is an unsigned immediate, and can be either 2 or 3.

***op1***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***Cn***

Is a name *Cn*, with *n* in the range 0 to 15.

***Cm***

Is a name *Cm*, with *m* in the range 0 to 15.

***op2***

Is a 3-bit unsigned immediate, in the range 0 to 7.

***xt***

Is the 64-bit name of the general-purpose source register.

**Usage**

Move general-purpose register to System Register allows the PE to write an AArch64 System register from a general-purpose register.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.105 MSUB (A64)

Multiply-Subtract.

This instruction is used by the alias `MNEG`.

**Syntax**

`MSUB Wd, Wn, Wm, Wa ; 32-bit`

`MSUB Xd, Xn, Xm, Xa ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

***Wm***

Is the 32-bit name of the second general-purpose source register holding the multiplier.

***Wa***

Is the 32-bit name of the third general-purpose source register holding the minuend.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

**Xm**

Is the 64-bit name of the second general-purpose source register holding the multiplier.

**Xa**

Is the 64-bit name of the third general-purpose source register holding the minuend.

**Operation**

Multiply-Subtract multiplies two register values, subtracts the product from a third register value, and writes the result to the destination register.

$Rd = Ra - Rn * Rm$ , where  $R$  is either w or x.

**Related information**

[MNEG \(A64\)](#) on page 747

[A64 instructions in alphabetical order](#) on page 664

## 17.106 MUL (A64)

Multiply.

This instruction is an alias of `MADD`.

The equivalent instruction is `MADD Rd, Rn, Rm, RZR`.

**Syntax**

`MUL Rd, Rn, Rm ; 32-bit`

`MUL Xd, Xn, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**Rm**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

**Xm**

Is the 64-bit name of the second general-purpose source register holding the multiplier.

## Operation

$Rd = Rn * Rm$ , where  $R$  is either w or x.

## Related information

[MADD \(A64\)](#) on page 746

[A64 instructions in alphabetical order](#) on page 664

## 17.107 MVN (A64)

Bitwise NOT.

This instruction is an alias of ORN (shifted register).

The equivalent instruction is ORN  $Wd, WZR, Wm, \{shift\#amount\}$ .

## Syntax

MVN  $Wd, Wm, \{shift\#amount\}$  ; 32-bit

MVN  $Xd, Xm, \{shift\#amount\}$  ; 64-bit

Where:

**$Wd$**

Is the 32-bit name of the general-purpose destination register.

**$Wm$**

Is the 32-bit name of the general-purpose source register.

**$amount$**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**$Xd$**

Is the 64-bit name of the general-purpose destination register.

**$Xm$**

Is the 64-bit name of the general-purpose source register.

**$shift$**

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

## Operation

Bitwise NOT writes the bitwise inverse of a register value to the destination register.

$Rd = \text{NOT } shift(Rm, amount)$ , where  $R$  is either w or x.

## Related information

[ORN \(shifted register\) \(A64\)](#) on page 765  
[A64 instructions in alphabetical order](#) on page 664

## 17.108 NEG (shifted register) (A64)

Negate (shifted register).

This instruction is an alias of `SUB` (shifted register).

The equivalent instruction is `SUB Rd, WZR, Wm, {shift #amount}`.

### Syntax

```
NEG Rd, Wm, {shift #amount} ; 32-bit
```

```
NEG Xd, Xm, {shift #amount} ; 64-bit
```

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Wm**

Is the 32-bit name of the general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xm**

Is the 64-bit name of the general-purpose source register.

**shift**

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

### Operation

Negate (shifted register) negates an optionally-shifted register value, and writes the result to the destination register.

$Rd = 0 - \text{shift}(Rm, amount)$ , where R is either w or x.

## Related information

[SUB \(shifted register\) \(A64\)](#) on page 792

[A64 instructions in alphabetical order](#) on page 664

## 17.109 NEGS (A64)

Negate, setting flags.

This instruction is an alias of `SUBS` (shifted register).

The equivalent instruction is `SUBS Rd, WZR, Rm, {shift #amount}`.

### Syntax

`NEGS Rd, Rm, {shift #amount} ; 32-bit`

`NEGS Xd, Xm, {shift #amount} ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rm**

Is the 32-bit name of the general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xm**

Is the 64-bit name of the general-purpose source register.

**shift**

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

### Operation

Negate, setting flags, negates an optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = 0 - \text{shift}(Rm, amount)$ , where  $R$  is either w or x.

## Related information

[SUBS \(shifted register\) \(A64\)](#) on page 796  
[A64 instructions in alphabetical order](#) on page 664

## 17.110 NGC (A64)

Negate with Carry.

This instruction is an alias of `SBC`.

The equivalent instruction is `SBC Rd, Rm`, `WZR, Wm`.

### Syntax

`NGC Rd, Rm ; 32-bit`

`NGC Xd, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rm**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xm**

Is the 64-bit name of the general-purpose source register.

### Operation

Negate with Carry negates the sum of a register value and the value of NOT (Carry flag), and writes the result to the destination register.

$Rd = 0 - Rm - 1 + c$ , where  $R$  is either `w` or `x`.

### Related information

[SBC \(A64\)](#) on page 779  
[A64 instructions in alphabetical order](#) on page 664

## 17.111 NGCS (A64)

Negate with Carry, setting flags.

This instruction is an alias of `SBCS`.

The equivalent instruction is `SBCS Rd, WZR, Wm`.

## Syntax

`NGCS Rd, Wm ; 32-bit`

`NGCS Xd, Xm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Wm**

Is the 32-bit name of the general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xm**

Is the 64-bit name of the general-purpose source register.

## Operation

Negate with Carry, setting flags, negates the sum of a register value and the value of NOT (Carry flag), and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = 0 - Rm - 1 + c$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[SBCS \(A64\)](#) on page 780

[A64 instructions in alphabetical order](#) on page 664

## 17.112 NOP (A64)

No Operation.

### Usage

No Operation does nothing, other than advance the value of the program counter by 4. This instruction can be used for instruction alignment purposes.



The timing effects of including a `NOP` instruction in a program are not guaranteed. It can increase execution time, leave it unchanged, or even reduce it. Therefore, `NOP` instructions are not suitable for timing loops.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.113 ORN (shifted register) (A64)

Bitwise OR NOT (shifted register).

This instruction is used by the alias `MVN`.

### Syntax

`ORN Rd, Rn, Rm, {shift #amount} ; 32-bit`

`ORN Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**shift**

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of `LSL`, `LSR`, `ASR`, or `ROB`.

### Operation

Bitwise OR NOT (shifted register) performs a bitwise (inclusive) OR of a register value and the complement of an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ OR NOT } \text{shift}(Rm, amount)$ , where  $R$  is either `w` or `x`.

### Related information

[MVN \(A64\)](#) on page 760

[A64 instructions in alphabetical order](#) on page 664

## 17.114 ORR (immediate) (A64)

Bitwise OR (immediate).

This instruction is used by the alias `mov` (bitmask immediate).

### Syntax

`ORR Wd|WSP, Wn, #imm ; 32-bit`

`ORR Xd|SP, Xn, #imm ; 64-bit`

Where:

**`Wd|WSP`**

Is the 32-bit name of the destination general-purpose register or stack pointer.

**`Wn`**

Is the 32-bit name of the general-purpose source register.

**`imm`**

The bitmask immediate.

**`Xd|SP`**

Is the 64-bit name of the destination general-purpose register or stack pointer.

**`Xn`**

Is the 64-bit name of the general-purpose source register.

### Operation

Bitwise OR (immediate) performs a bitwise (inclusive) OR of a register value and an immediate register value, and writes the result to the destination register.

$Rd = Rn \text{ OR } imm$ , where  $R$  is either `w` or `x`.

### Related information

[MOV \(bitmask immediate\) \(A64\)](#) on page 751

[A64 instructions in alphabetical order](#) on page 664

## 17.115 ORR (shifted register) (A64)

Bitwise OR (shifted register).

This instruction is used by the alias `mov` (register).

## Syntax

`ORR Rd, Rn, Rm, {shift #amount} ; 32-bit`

`ORR Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**shift**

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL, LSR, ASR, or ROR.

## Operation

Bitwise OR (shifted register) performs a bitwise (inclusive) OR of a register value and an optionally-shifted register value, and writes the result to the destination register.

$Rd = Rn \text{ OR } \text{shift}(Rm, amount)$ , where R is either w or x.

## Related information

[MOV \(register\) \(A64\)](#) on page 751

[A64 instructions in alphabetical order](#) on page 664

## 17.116 PACDA, PACDZA (A64)

Pointer Authentication Code for Data address, using key A.

### Syntax

```
PACDA Xd, Xn|SP ; PACDA general registers

PACDZA Xd ; PACDZA general registers
```

Where:

#### ***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Pointer Authentication Code for Data address, using key A. This instruction computes and inserts a pointer authentication code for a data address, using a modifier and key A.

The address is in the general-purpose register that is specified by *xd*.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *Xn|SP* for **PACDA**.
- The value zero, for **PACDZA**.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.117 PACDB, PACDZB (A64)

Pointer Authentication Code for Data address, using key B.

### Syntax

```
PACDB Xd, Xn|SP ; PACDB general registers

PACDZB Xd ; PACDZB general registers
```

Where:

#### ***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

***xd***

Is the 64-bit name of the general-purpose destination register.

**Architectures supported**

Supported in the Arm®v8.3-A architecture and later.

**Usage**

Pointer Authentication Code for Data address, using key B. This instruction computes and inserts a pointer authentication code for a data address, using a modifier and key B.

The address is in the general-purpose register that is specified by *xd*.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *xn|sp* for `PACDB`.
- The value zero, for `PACDZB`.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.118 PACGA (A64)

Pointer Authentication Code, using Generic key.

**Syntax**

`PACGA Xd, Xn, Xm|SP`

Where:

***xd***

Is the 64-bit name of the general-purpose destination register.

***xn***

Is the 64-bit name of the first general-purpose source register.

***Xm|SP***

Is the 64-bit name of the second general-purpose source register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.3-A architecture and later.

**Usage**

Pointer Authentication Code, using Generic key. This instruction computes the pointer authentication code for an address in the first source register, using a modifier in the second source register, and the Generic key. The computed pointer authentication code is returned in the upper 32 bits of the destination register.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.119 PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ (A64)

Pointer Authentication Code for Instruction address, using key A.

## Syntax

PACIA *Xd*, *Xn|SP* ; PACIA general registers

PACIZA *Xd* ; PACIZA general registers

PACIA1716

PACIASP

PACIAZ

Where:

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Pointer Authentication Code for Instruction address, using key A. This instruction computes and inserts a pointer authentication code for an instruction address, using a modifier and key A.

The address is:

- In the general-purpose register that is specified by *xd* for **PACIA** and **PACIZA**.
- In X17, for **PACIA1716**.
- In X30, for **PACIASP** and **PACIAZ**.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *Xn|SP* for **PACIA**.
- The value zero, for **PACIZA** and **PACIAZ**.
- In X16, for **PACIA1716**.
- In SP, for **PACIASP**.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.120 PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ (A64)

Pointer Authentication Code for Instruction address, using key B.

## Syntax

PACIB *Xd*, *Xn|SP* ; PACIB general registers

PACIZB *Xd* ; PACIZB general registers

PACIB1716

PACIBSP

PACIBZ

Where:

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn|SP***

Is the 64-bit name of the general-purpose source register or stack pointer.

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Pointer Authentication Code for Instruction address, using key B. This instruction computes and inserts a pointer authentication code for an instruction address, using a modifier and key B.

The address is:

- In the general-purpose register that is specified by *Xd* for **PACIB** and **PACIZB**.
- In X17, for **PACIB1716**.
- In X30, for **PACIBSP** and **PACIBZ**.

The modifier is:

- In the general-purpose register or stack pointer that is specified by *Xn|SP* for **PACIB**.
- The value zero, for **PACIZB** and **PACIBZ**.
- In X16, for **PACIB1716**.
- In SP, for **PACIBSP**.

**Related information**[A64 instructions in alphabetical order](#) on page 664

## 17.121 PSB (A64)

Profiling Synchronization Barrier.

**Syntax**

```
PSB CSYNC
```

**Architectures supported**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Profiling Synchronization Barrier. This instruction is a barrier that ensures that all existing profiling data for the current PE has been formatted, and profiling buffer addresses have been translated such that all writes to the profiling buffer have been initiated. A following `DSB` instruction completes when the writes to the profiling buffer have completed.

**Related information**[A64 instructions in alphabetical order](#) on page 664

## 17.122 RBIT (A64)

Reverse Bits.

**Syntax**

```
RBIT Wd, Wn ; 32-bit
```

```
RBIT Xd, Xn ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Usage

Reverse Bits reverses the bit order in a register.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.123 RET (A64)

Return from subroutine.

## Syntax

`RET {Xn}`

Where:

### **Xn**

Is the 64-bit name of the general-purpose register holding the address to be branched to.  
Defaults to X30 if absent.

## Usage

Return from subroutine branches unconditionally to an address in a register, with a hint that this is a subroutine return.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.124 RETAA, RETAB (A64)

Return from subroutine, with pointer authentication.

## Syntax

`RETAA`

`RETAB`

## Architectures supported

Supported in the Arm®v8.3-A architecture and later.

## Usage

Return from subroutine, with pointer authentication. This instruction authenticates the address that is held in LR, using SP as the modifier and the specified key, branches to the authenticated address, with a hint that this instruction is a subroutine return.

Key A is used for `RETAA`, and key B is used for `RETAB`.

If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a Translation fault is generated.

The authenticated address is not written back to LR.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.125 REV16 (A64)

Reverse bytes in 16-bit halfwords.

### Syntax

`REV16 Wd, Wn ; 32-bit`

`REV16 Xd, Xn ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

### Usage

Reverse bytes in 16-bit halfwords reverses the byte order in each 16-bit halfword of a register.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.126 REV32 (A64)

Reverse bytes in 32-bit words.

### Syntax

`REV32 Xd, Xn`

Where:

***xd***

Is the 64-bit name of the general-purpose destination register.

***xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Reverse bytes in 32-bit words reverses the byte order in each 32-bit word of a register.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.127 REV64 (A64)

Reverse Bytes.

This instruction is an alias of `REV`.

The equivalent instruction is `REV xd, xn`.

**Syntax**

`REV64 xd, xn`

Where:

***xd***

Is the 64-bit name of the general-purpose destination register.

***xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Reverse Bytes reverses the byte order in a 64-bit general-purpose register.

When assembling for Arm®v8.2, an assembler must support this pseudo-instruction. It is optional whether an assembler supports this pseudo-instruction when assembling for an architecture earlier than Armv8.2.

**Related information**

[REV \(A64\)](#) on page 775

[A64 instructions in alphabetical order](#) on page 664

## 17.128 REV (A64)

Reverse Bytes.

This instruction is used by the alias `REV64`.

### Syntax

```
REV Wd, Wn ; 32-bit
```

```
REV Xd, Xn ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

### Usage

Reverse Bytes reverses the byte order in a register.

### Related information

[REV64 \(A64\)](#) on page 775

[A64 instructions in alphabetical order](#) on page 664

## 17.129 ROR (immediate) (A64)

Rotate right (immediate).

This instruction is an alias of `EXTR`.

The equivalent instruction is `EXTR Wd, Ws, Ws, #shift`.

### Syntax

```
ROR Wd, Ws, #shift ; 32-bit
```

```
ROR Xd, Xs, #shift ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Ws***

Is the 32-bit name of the general-purpose source register.

***shift***

Depends on the instruction variant:

**32-bit general registers** Is the amount by which to rotate, in the range 0 to 31.

**64-bit general registers** Is the amount by which to rotate, in the range 0 to 63.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xs***

Is the 64-bit name of the general-purpose source register.

## Operation

Rotate right (immediate) provides the value of the contents of a register rotated by a variable number of bits. The bits that are rotated off the right end are inserted into the vacated bit positions on the left.

$Rd = ROR(Rs, shift)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[EXTR \(A64\)](#) on page 736

[A64 instructions in alphabetical order](#) on page 664

## 17.130 ROR (register) (A64)

Rotate Right (register).

This instruction is an alias of `RORV`.

The equivalent instruction is `RORV Wd, Wn, Wm`.

## Syntax

`ROR Wd, Wn, Wm ; 32-bit`

`ROR Xd, Xn, Xm ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Rotate Right (register) provides the value of the contents of a register rotated by a variable number of bits. The bits that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

*Rd* = ROR (*Rn*, *Rm*), where *R* is either *w* or *x*.

## Related information

[RORV \(A64\)](#) on page 778

[A64 instructions in alphabetical order](#) on page 664

## 17.131 RORV (A64)

Rotate Right Variable.

This instruction is used by the alias `ROR` (register).

## Syntax

`RORV Xd, Wn, Wm ; 32-bit`

`RORV Xd, Xn, Xm ; 64-bit`

Where:

***Xd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in its bottom 5 bits.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in its bottom 6 bits.

## Operation

Rotate Right Variable provides the value of the contents of a register rotated by a variable number of bits. The bits that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by dividing the second source register by the data size defines the number of bits by which the first source register is right-shifted.

$Rd = ROR(Rn, Rm)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[ROR \(register\) \(A64\)](#) on page 777

[A64 instructions in alphabetical order](#) on page 664

## 17.132 SBC (A64)

Subtract with Carry.

This instruction is used by the alias `NGC`.

## Syntax

`SBC Wd, Wn, Wm ; 32-bit`

`SBC Xd, Xn, Xm ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

## Operation

Subtract with Carry subtracts a register value and the value of NOT (Carry flag) from a register value, and writes the result to the destination register.

$Rd = Rn - Rm - 1 + c$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[NGC \(A64\)](#) on page 763

[A64 instructions in alphabetical order](#) on page 664

## 17.133 SBCS (A64)

Subtract with Carry, setting flags.

This instruction is used by the alias `NGCS`.

## Syntax

`SBCS Wd, Wn, Wm ; 32-bit`

`SBCS Xd, Xn, Xm ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

## Operation

Subtract with Carry, setting flags, subtracts a register value and the value of NOT (Carry flag) from a register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn - Rm - 1 + c$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[NGCS \(A64\)](#) on page 763

[A64 instructions in alphabetical order](#) on page 664

## 17.134 SBFIZ (A64)

Signed Bitfield Insert in Zero.

This instruction is an alias of `SBFM`.

The equivalent instruction is `SBFM Rd, Rn, #lsb, #width ; 32-bit`

`SBFIZ Rd, Rn, #lsb, #width ; 32-bit`

Where:

### **Rd**

Is the 32-bit name of the general-purpose destination register.

### **Rn**

Is the 32-bit name of the general-purpose source register.

### **lsb**

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.

### **width**

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*lsb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*lsb*.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Usage

Signed Bitfield Insert in Zero zeros the destination register and copies any number of contiguous bits from a source register into any position in the destination register, sign-extending the most significant bit of the transferred value.

## Related information

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

## 17.135 SBFM (A64)

Signed Bitfield Move.

This instruction is used by the aliases:

- ASR (immediate).
- SBFIZ.
- SBFX.
- SXTB.
- SXTH.
- SXTW.

## Syntax

SBFM *Wd*, *Wn*, #<immr>, #<imms> ; 32-bit

SBFM *Xd*, *Xn*, #<immr>, #<imms> ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

**<immr>**

Depends on the instruction variant:

**32-bit general registers** Is the right rotate amount, in the range 0 to 31.

**64-bit general registers** Is the right rotate amount, in the range 0 to 63.

**<imms>**

Depends on the instruction variant:

**32-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 31.

**64-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 63.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Usage

Signed Bitfield Move copies any number of low-order bits from a source register into the same number of adjacent bits at any position in the destination register, shifting in copies of the sign bit in the upper bits and zeros in the lower bits.

## Related information

[ASR \(immediate\) \(A64\)](#) on page 684

[SBFIZ \(A64\)](#) on page 781

[SBFX \(A64\)](#) on page 783

[SXTB \(A64\)](#) on page 797

[SXTH \(A64\)](#) on page 798

[SXTW \(A64\)](#) on page 799

[A64 instructions in alphabetical order](#) on page 664

## 17.136 SBFX (A64)

Signed Bitfield Extract.

This instruction is an alias of `SBFM`.

The equivalent instruction is `SBFM wd, wn, #lsb, #(lsb+width-1)`.

## Syntax

`SBFX wd, wn, #lsb, #width ; 32-bit`

`SBFX Xd, Xn, #lsb, #width ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***1sb***

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 63.

***width***

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*1sb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*1sb*.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Signed Bitfield Extract extracts any number of adjacent bits at any position from a register, sign-extends them to the size of the register, and writes the result to the destination register.

**Related information**

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

## 17.137 SDIV (A64)

Signed Divide.

**Syntax**

SDIV *Wd*, *Wn*, *Wm* ; 32-bit

SDIV *Xd*, *Xn*, *Xm* ; 64-bit

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the first general-purpose source register.

**Wm**

Is the 32-bit name of the second general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

## Operation

Signed Divide divides a signed integer register value by another signed integer register value, and writes the result to the destination register. The condition flags are not affected.

$Rd = Rn / Rm$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.138 SEV (A64)

Send Event.

### Usage

Send Event is a hint instruction. It causes an event to be signaled to all PEs in the multiprocessor system. For more information, see *Wait for Event mechanism and Send event* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.139 SEVL (A64)

Send Event Local.

### Usage

Send Event Local is a hint instruction. It causes an event to be signaled locally without the requirement to affect other PEs in the multiprocessor system. It can prime a wait-loop which starts with a `WFE` instruction.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.140 SMADDL (A64)

Signed Multiply-Add Long.

This instruction is used by the alias `SMULL`.

### Syntax

`SMADDL Xd, Wn, Wm, Xa`

Where:

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**`Wm`**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**`Xa`**

Is the 64-bit name of the third general-purpose source register holding the addend.

### Operation

Signed Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to the 64-bit destination register.

$Xd = Xa + Wn * Wm$ .

### Related information

[SMULL \(A64\)](#) on page 789

[A64 instructions in alphabetical order](#) on page 664

## 17.141 SMC (A64)

Supervisor call to allow OS or Hypervisor code to call the Secure Monitor. It generates an exception targeting exception level 3 (EL3).

### Syntax

`SMC #imm`

Where:

***imm***

Is a 16-bit unsigned immediate, in the range 0 to 65535. This value is made available to the handler in the Exception Syndrome Register.

**Usage**

Secure Monitor Call causes an exception to EL3.

`smc` is available only for software executing at EL1 or higher. It is undefined in EL0.

If the values of HCR\_EL2.TSC and SCR\_EL3.SMD are both 0, execution of an `smc` instruction at EL1 or higher generates a Secure Monitor Call exception, recording it in ESR\_ELx, using the EC value 0x17, that is taken to EL3.

If the value of HCR\_EL2.TSC is 1, execution of an `smc` instruction in a Non-secure EL1 state generates an exception that is taken to EL2, regardless of the value of SCR\_EL3.SMD. For more information, see *Traps to EL2 of Non-secure EL1 execution of SMC instructions* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

If the value of HCR\_EL2.TSC is 0 and the value of SCR\_EL3.SMD is 1, the SMC instruction is undefined.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.142 SMNEGL (A64)

Signed Multiply-Negate Long.

This instruction is an alias of `SMSUBL`.

The equivalent instruction is `SMSUBL Xd, Wn, Wm, XZR`.

**Syntax**

`SMNEGL Xd, Wn, Wm`

Where:

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

***Wm***

Is the 32-bit name of the second general-purpose source register holding the multiplier.

## Operation

Signed Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the 64-bit destination register.

$$Xd = -(Wn * Wm).$$

## Related information

[SMSUBL \(A64\)](#) on page 788

[A64 instructions in alphabetical order](#) on page 664

## 17.143 SMSUBL (A64)

Signed Multiply-Subtract Long.

This instruction is used by the alias `SMNEGL`.

## Syntax

`SMSUBL Xd, Wn, Wm, Xa`

Where:

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**`Wm`**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**`Xa`**

Is the 64-bit name of the third general-purpose source register holding the minuend.

## Operation

Signed Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register value, and writes the result to the 64-bit destination register.

$$Xd = Xa - Wn * Wm.$$

## Related information

[SMNEGL \(A64\)](#) on page 788

[A64 instructions in alphabetical order](#) on page 664

## 17.144 SMULH (A64)

Signed Multiply High.

### Syntax

`SMULH Xd, Xn, Xm`

Where:

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

**Xm**

Is the 64-bit name of the second general-purpose source register holding the multiplier.

### Operation

Signed Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit destination register.

`Xd = bits<127:64> of Xn * Xm.`

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.145 SMULL (A64)

Signed Multiply Long.

This instruction is an alias of `SMADDL`.

The equivalent instruction is `SMADDL Xd, Wn, Wm, XZR`.

### Syntax

`SMULL Xd, Wn, Wm`

Where:

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**Wm**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

## Operation

Signed Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.

$Xd = Wn * Wm$ .

## Related information

[SMADDL \(A64\)](#) on page 786

[A64 instructions in alphabetical order](#) on page 664

## 17.146 SUB (extended register) (A64)

Subtract (extended register).

### Syntax

`SUB Rd|WSP, Rn|WSP, Rm, {extend #{amount}} ; 32-bit`

`SUB Xd|SP, Xn|SP, Rm, {extend #{amount}} ; 64-bit`

Where:

**Rd|WSP**

Is the 32-bit name of the destination general-purpose register or stack pointer.

**Rn|WSP**

Is the 32-bit name of the first source general-purpose register or stack pointer.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**extend**

Is the extension to be applied to the second source operand:

### 32-bit general registers

Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

If  $Rd$  or  $Rn$  is WSP then LSL is preferred rather than UXTW, and can be omitted when  $amount$  is 0. In all other cases *extend* is required and must be UXTW rather than LSL.

### 64-bit general registers

Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.

If  $Rd$  or  $Rn$  is SP then LSL is preferred rather than UXTX, and can be omitted when  $amount$  is 0. In all other cases *extend* is required and must be UXTX rather than LSL.

**Xd|SP**

Is the 64-bit name of the destination general-purpose register or stack pointer.

**Xn|SP**

Is the 64-bit name of the first source general-purpose register or stack pointer.

**R**

Is a width specifier, and can be either w or x.

**m**

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

**amount**

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when *extend* is absent, is required when *extend* is LSL, and is optional when *extend* is present but not LSL.

## Operation

Subtract (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift amount, from a register value, and writes the result to the destination register. The argument that is extended from the *Rm* register can be a byte, halfword, word, or doubleword.

$Rd = Rn - \text{LSL}(\text{extend}(Rm), \text{amount})$ , where *R* is either w or x.

## Usage

**Table 17-9: SUB (64-bit general registers) specifier combinations**

| <b>R</b> | <b>extend</b> |
|----------|---------------|
| W        | SXTB          |
| W        | SXTH          |
| W        | SXTW          |
| W        | UXTB          |
| W        | UXTH          |
| W        | UXTW          |
| X        | LSL UXTX      |
| X        | SXTX          |

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.147 SUB (immediate) (A64)

Subtract (immediate).

### Syntax

SUB *Wd|WSP, Wn|WSP, #imm, {shift}* ; 32-bit

SUB *Xd|SP, Xn|SP, #imm, {shift}* ; 64-bit

Where:

***Wd | WSP***

Is the 32-bit name of the destination general-purpose register or stack pointer.

***Wn | WSP***

Is the 32-bit name of the source general-purpose register or stack pointer.

***Xd | SP***

Is the 64-bit name of the destination general-purpose register or stack pointer.

***Xn | SP***

Is the 64-bit name of the source general-purpose register or stack pointer.

***imm***

Is an unsigned immediate, in the range 0 to 4095.

***shift***

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

## Operation

Subtract (immediate) subtracts an optionally-shifted immediate value from a register value, and writes the result to the destination register.

$Rd = Rn - \text{shift}(\text{imm})$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.148 SUB (shifted register) (A64)

Subtract (shifted register).

This instruction is used by the alias `NEG` (shifted register).

## Syntax

```
SUB Wd, Wn, Wm, {shift #amount} ; 32-bit
```

```
SUB Xd, Xn, Xm, {shift #amount} ; 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of LSL, LSR, or ASR.

## Operation

Subtract (shifted register) subtracts an optionally-shifted register value from a register value, and writes the result to the destination register.

$Rd = Rn - \text{shift}(Rm, amount)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[NEG \(shifted register\) \(A64\)](#) on page 761

[A64 instructions in alphabetical order](#) on page 664

## 17.149 SUBS (extended register) (A64)

Subtract (extended register), setting flags.

This instruction is used by the alias `CMP` (extended register).

## Syntax

```
SUBS Xd, Wn|WSP, Wm, {extend #amount} ; 32-bit
```

```
SUBS Xd, Xn|SP, Rm, {extend #amount} ; 64-bit
```

Where:

***Xd***

Is the 32-bit name of the general-purpose destination register.

***Rn | WSP***

Is the 32-bit name of the first source general-purpose register or stack pointer.

***Rm***

Is the 32-bit name of the second general-purpose source register.

***extend***

Is the extension to be applied to the second source operand:

**32-bit general registers**

Can be one of UXTH, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is WSP then LSL is preferred rather than UXTW, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTW rather than LSL.

**64-bit general registers**

Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.

If *Rn* is SP then LSL is preferred rather than UXTX, and can be omitted when *amount* is 0. In all other cases *extend* is required and must be UXTX rather than LSL.

***Rd***

Is the 64-bit name of the general-purpose destination register.

***Rn | SP***

Is the 64-bit name of the first source general-purpose register or stack pointer.

***R***

Is a width specifier, and can be either w or x.

***m***

Is the number [0-30] of the second general-purpose source register or the name ZR (31).

***amount***

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must be absent when *extend* is absent, is required when *extend* is LSL, and is optional when *extend* is present but not LSL.

**Operation**

Subtract (extended register), setting flags, subtracts a sign or zero-extended register value, followed by an optional left shift amount, from a register value, and writes the result to the destination register. The argument that is extended from the *Rm* register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result.

*Rd* = *Rn* - LSL(*extend(Rm)*, *amount*), where *R* is either w or x.

**Usage**

**Table 17-10: SUBS (64-bit general registers) specifier combinations**

| <b><i>R</i></b> | <b><i>extend</i></b> |
|-----------------|----------------------|
| W               | SXTB                 |

| <b>R</b> | <b>extend</b> |
|----------|---------------|
| W        | SXTH          |
| W        | SXTW          |
| W        | UXTB          |
| W        | UXTH          |
| W        | UXTW          |
| X        | LSL   UXTX    |
| X        | SXTX          |

## Related information

[CMP \(extended register\) \(A64\)](#) on page 713

[A64 instructions in alphabetical order](#) on page 664

## 17.150 SUBS (immediate) (A64)

Subtract (immediate), setting flags.

This instruction is used by the alias `CMP` (immediate).

### Syntax

`SUBS Wd, Wn|WSP, #imm, {shift} ; 32-bit`

`SUBS Xd, Xn|SP, #imm, {shift} ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn | WSP**

Is the 32-bit name of the source general-purpose register or stack pointer.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn | SP**

Is the 64-bit name of the source general-purpose register or stack pointer.

**imm**

Is an unsigned immediate, in the range 0 to 4095.

**shift**

Is the optional left shift to apply to the immediate, defaulting to `LSL #0`, and can be either `LSL #0` or `LSL #12`.

## Operation

Subtract (immediate), setting flags, subtracts an optionally-shifted immediate value from a register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn - shift(imm)$ , where  $R$  is either w or x.

## Related information

[CMP \(immediate\) \(A64\)](#) on page 715

[A64 instructions in alphabetical order](#) on page 664

## 17.151 SUBS (shifted register) (A64)

Subtract (shifted register), setting flags.

This instruction is used by the aliases:

- [CMP \(shifted register\)](#).
- [NEGS](#).

## Syntax

`SUBS Rd, Rn, Rm, {shift #amount} ; 32-bit`

`SUBS Xd, Xn, Xm, {shift #amount} ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**amount**

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

***Xm***

Is the 64-bit name of the second general-purpose source register.

***shift***

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can be one of `LSL`, `LSR`, or `ASR`.

**Operation**

Subtract (shifted register), setting flags, subtracts an optionally-shifted register value from a register value, and writes the result to the destination register. It updates the condition flags based on the result.

$Rd = Rn - shift(Rm, amount)$ , where  $R$  is either `w` or `x`.

**Related information**

[CMP \(shifted register\) \(A64\)](#) on page 716

[NEGS \(A64\)](#) on page 762

[A64 instructions in alphabetical order](#) on page 664

## 17.152 SVC (A64)

Supervisor call to allow application code to call the OS. It generates an exception targeting exception level 1 (EL1).

**Syntax**

`SVC #imm`

Where:

***imm***

Is a 16-bit unsigned immediate, in the range 0 to 65535. This value is made available to the handler in the Exception Syndrome Register.

**Usage**

Supervisor Call causes an exception to be taken to EL1.

On executing an `svc` instruction, the PE records the exception as a Supervisor Call exception in `ESR_ELx`, using the EC value 0x15, and the value of the immediate argument.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.153 SXTB (A64)

Signed Extend Byte.

This instruction is an alias of `SBFM`.

The equivalent instruction is `SBFM Rd, Rn, #0, #7`.

### Syntax

`SXTB Rd, Rn ; 32-bit`

`SXTB Xd, Rn ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the general-purpose source register.

### Operation

Signed Extend Byte extracts an 8-bit value from a register, sign-extends it to the size of the register, and writes the result to the destination register.

`Rd = SignExtend(Rn<7:0>), where R is either w or x.`

### Related information

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

## 17.154 SXTH (A64)

Sign Extend Halfword.

This instruction is an alias of `SBFM`.

The equivalent instruction is `SBFM Rd, Rn, #0, #15`.

### Syntax

`SXTH Rd, Rn ; 32-bit`

`SXTH Xd, Rn ; 64-bit`

Where:

***wd***

Is the 32-bit name of the general-purpose destination register.

***xd***

Is the 64-bit name of the general-purpose destination register.

***wn***

Is the 32-bit name of the general-purpose source register.

## Operation

Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the destination register.

$Rd = \text{SignExtend}(Wn<15:0>)$ , where  $R$  is either  $w$  or  $x$ .

## Related information

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

## 17.155 SXTW (A64)

Sign Extend Word.

This instruction is an alias of `SBFM`.

The equivalent instruction is `SBFM xd, xn, #0, #31`.

## Syntax

`SXTW xd, wn`

Where:

***xd***

Is the 64-bit name of the general-purpose destination register.

***xn***

Is the 64-bit name of the general-purpose source register.

***wn***

Is the 32-bit name of the general-purpose source register.

## Operation

Sign Extend Word sign-extends a word to the size of the register, and writes the result to the destination register.

$Xd = \text{SignExtend}(Wn<31:0>)$

## Related information

[SBFM \(A64\)](#) on page 782

[A64 instructions in alphabetical order](#) on page 664

# 17.156 SYS (A64)

System instruction.

This instruction is used by the aliases:

- AT.
- DC.
- IC.
- TLBI.

## Syntax

SYS #*op1*, *Cn*, *Cm*, #*op2*, {*Xt*}

Where:

### *op1*

Is a 3-bit unsigned immediate, in the range 0 to 7.

### *Cn*

Is a name *Cn*, with *n* in the range 0 to 15.

### *Cm*

Is a name *Cm*, with *m* in the range 0 to 15.

### *op2*

Is a 3-bit unsigned immediate, in the range 0 to 7.

### *Xt*

Is the 64-bit name of the optional general-purpose source register, defaulting to 31.

## Usage

System instruction. For more information, see *Op0 equals 0b01, cache maintenance, TLB maintenance, and address translation instructions* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#) for the encodings of System instructions.

## Related information

[AT \(A64\)](#) on page 686

[DC \(A64\)](#) on page 725

[IC \(A64\)](#) on page 739

[TLBI \(A64\)](#) on page 803

[A64 instructions in alphabetical order](#) on page 664

## 17.157 SYSL (A64)

System instruction with result.

### Syntax

`SYSL Xt, #op1, Cn, Cm, #op2`

Where:

**Xt**

Is the 64-bit name of the general-purpose destination register.

**op1**

Is a 3-bit unsigned immediate, in the range 0 to 7.

**Cn**

Is a name `Cn`, with `n` in the range 0 to 15.

**Cm**

Is a name `Cm`, with `m` in the range 0 to 15.

**op2**

Is a 3-bit unsigned immediate, in the range 0 to 7.

### Usage

System instruction with result. For more information, see *Op0 equals 0b01, cache maintenance, TLB maintenance, and address translation instructions* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#) for the encodings of System instructions.

### Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.158 TBNZ (A64)

Test bit and Branch if Nonzero.

### Syntax

`TBNZ R<t>, #imm, label`

Where:

**R**

Is a width specifier, and can be either `w` or `x`.

In assembler source code an `X` specifier is always permitted, but a `W` specifier is only permitted when the bit number is less than 32.

**<t>**

Is the number [0-30] of the general-purpose register to be tested or the name ZR (31).

***imm***

Is the bit number to be tested, in the range 0 to 63.

***label***

Is the program label to be conditionally branched to. Its offset from the address of this instruction, in the range ±32KB.

**Usage**

Test bit and Branch if Nonzero compares the value of a bit in a general-purpose register with zero, and conditionally branches to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine call or return. This instruction does not affect condition flags.

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.159 TBZ (A64)

Test bit and Branch if Zero.

**Syntax**

```
TBZ R<t>, #imm, label
```

Where:

**R**

Is a width specifier, and can be either w or x.

In assembler source code an X specifier is always permitted, but a W specifier is only permitted when the bit number is less than 32.

**<t>**

Is the number [0-30] of the general-purpose register to be tested or the name ZR (31).

***imm***

Is the bit number to be tested, in the range 0 to 63.

***label***

Is the program label to be conditionally branched to. Its offset from the address of this instruction, in the range ±32KB.

**Usage**

Test bit and Branch if Zero compares the value of a test bit with zero, and conditionally branches to a label at a PC-relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction does not affect condition flags.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.160 TLBI (A64)

TLB Invalidate operation.

This instruction is an alias of `sys`.

The equivalent instruction is `sys #op1, c8, cm, #op2, {xt}`.

## Syntax

`TLBI <tlbi_op>, {xt}`

Where:

### `op1`

Is a 3-bit unsigned immediate, in the range 0 to 7.

### `cm`

Is a name `cm`, with `m` in the range 0 to 15.

### `op2`

Is a 3-bit unsigned immediate, in the range 0 to 7.

### `<tlbi_op>`

Is a TLBI instruction name, as listed for the TLBI system instruction group, and can be one of the values shown in Usage.

### `xt`

Is the 64-bit name of the optional general-purpose source register, defaulting to 31.

## Usage

TLB Invalidate operation. For more information, see *A64 system instructions for TLB maintenance* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The following table shows the valid specifier combinations:

**Table 17-11: SYS parameter values corresponding to TLBI operations**

| <code>&lt;tlbi_op&gt;</code> | <code>op1</code> | <code>cm</code> | <code>op2</code> |
|------------------------------|------------------|-----------------|------------------|
| ALLE1                        | 4                | 7               | 4                |
| ALLE1IS                      | 4                | 3               | 4                |
| ALLE2                        | 4                | 7               | 0                |
| ALLE2IS                      | 4                | 3               | 0                |
| ALLE3                        | 6                | 7               | 0                |
| ALLE3IS                      | 6                | 3               | 0                |
| ASIDE1                       | 0                | 7               | 2                |

| <b>&lt;tlbi_op&gt;</b> | <b>op1</b> | <b>Cm</b> | <b>op2</b> |
|------------------------|------------|-----------|------------|
| ASIDE1IS               | 0          | 3         | 2          |
| IPAS2E1                | 4          | 4         | 1          |
| IPAS2E1IS              | 4          | 0         | 1          |
| IPAS2LE1               | 4          | 4         | 5          |
| IPAS2LE1IS             | 4          | 0         | 5          |
| VAAE1                  | 0          | 7         | 3          |
| VAAE1IS                | 0          | 3         | 3          |
| VAALE1                 | 0          | 7         | 7          |
| VAALE1IS               | 0          | 3         | 7          |
| VAE1                   | 0          | 7         | 1          |
| VAE1IS                 | 0          | 3         | 1          |
| VAE2                   | 4          | 7         | 1          |
| VAE2IS                 | 4          | 3         | 1          |
| VAE3                   | 6          | 7         | 1          |
| VAE3IS                 | 6          | 3         | 1          |
| VALE1                  | 0          | 7         | 5          |
| VALE1IS                | 0          | 3         | 5          |
| VALE2                  | 4          | 7         | 5          |
| VALE2IS                | 4          | 3         | 5          |
| VALE3                  | 6          | 7         | 5          |
| VALE3IS                | 6          | 3         | 5          |
| VMALLE1                | 0          | 7         | 0          |
| VMALLE1IS              | 0          | 3         | 0          |
| VMALLS12E1             | 4          | 7         | 6          |
| VMALLS12E1IS           | 4          | 3         | 6          |

## Related information

[SYS \(A64\)](#) on page 800

[A64 instructions in alphabetical order](#) on page 664

## 17.161 TST (immediate) (A64)

Test (immediate) performs a logical AND operation on a register value and an immediate value. It updates the condition flags based on the result, and discards the result.

This instruction is an alias of `ANDS` (immediate).

The equivalent instruction is `ANDS WZR, Wn, #imm`.

### Syntax

`TST Wn, #imm ; 32-bit`

`TST Xn, #imm ; 64-bit`

Where:

***Wn***

Is the 32-bit name of the general-purpose source register.

***imm***

The bitmask immediate.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Operation

`Rn AND imm`, where `R` is either `w` or `x`.

## Related information

[ANDS \(immediate\) \(A64\)](#) on page 681

[A64 instructions in alphabetical order](#) on page 664

## 17.162 TST (shifted register) (A64)

Test (shifted register).

This instruction is an alias of `ANDS` (shifted register).

The equivalent instruction is `ANDS WZR, Wn, Wm, {shift #amount}`.

## Syntax

`TST Wn, Wm, {shift #amount} ; 32-bit`

`TST Xn, Xm, {shift #amount} ; 64-bit`

Where:

***Wn***

Is the 32-bit name of the first general-purpose source register.

***Wm***

Is the 32-bit name of the second general-purpose source register.

***amount***

Depends on the instruction variant:

**32-bit general registers** Is the shift amount, in the range 0 to 31, defaulting to 0.

**64-bit general registers** Is the shift amount, in the range 0 to 63, defaulting to 0.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

**shift**

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSI, LSR, ASR, or ROR.

**Operation**

Test (shifted register) performs a bitwise AND operation on a register value and an optionally-shifted register value. It updates the condition flags based on the result, and discards the result.

Rn AND *shift(Rm, amount)*, where R is either w or x.

**Related information**

[ANDS \(shifted register\) \(A64\)](#) on page 682

[A64 instructions in alphabetical order](#) on page 664

## 17.163 UBFIZ (A64)

Unsigned Bitfield Insert in Zero.

This instruction is an alias of [UBFM](#).

The equivalent instruction is `UBFM Wd, Wn, #(-lsb MOD 32), #(width-1)`.

**Syntax**

`UBFIZ Wd, Wn, #lsb, #width ; 32-bit`

`UBFIZ Xd, Xn, #lsb, #width ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the general-purpose source register.

**lsb**

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.

***width***

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*lsb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*lsb*.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

**Usage**

Unsigned Bitfield Insert in Zero zeros the destination register and copies any number of contiguous bits from a source register into any position in the destination register.

**Related information**

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

## 17.164 UBFM (A64)

Unsigned Bitfield Move.

This instruction is used by the aliases:

- `LSL` (immediate).
- `LSR` (immediate).
- `UBFIZ`.
- `UBFX`.
- `UXTB`.
- `UXTH`.

**Syntax**

`UBFM Wd, Wn, #<immr>, #<imms> ; 32-bit`

`UBFM Xd, Xn, #<immr>, #<imms> ; 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

**<immr>**

Depends on the instruction variant:

**32-bit general registers** Is the right rotate amount, in the range 0 to 31.

**64-bit general registers** Is the right rotate amount, in the range 0 to 63.

**<imms>**

Depends on the instruction variant:

**32-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 31.

**64-bit general registers** Is the leftmost bit number to be moved from the source, in the range 0 to 63.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Usage

Unsigned Bitfield Move copies any number of low-order bits from a source register into the same number of adjacent bits at any position in the destination register, with zeros in the upper and lower bits.

## Related information

[LSL \(immediate\) \(A64\)](#) on page 742

[LSR \(immediate\) \(A64\)](#) on page 745

[UBFIZ \(A64\)](#) on page 806

[UBFX \(A64\)](#) on page 808

[UXTB \(A64\)](#) on page 813

[UXTH \(A64\)](#) on page 814

[A64 instructions in alphabetical order](#) on page 664

## 17.165 UBFX (A64)

Unsigned Bitfield Extract.

This instruction is an alias of `UBFM`.

The equivalent instruction is `UBFM wd, Wn, #lsb, #(lsb+width-1)`.

## Syntax

`UBFX Wd, Wn, #lsb, #width ; 32-bit`

`UBFX Xd, Xn, #lsb, #width ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Wn**

Is the 32-bit name of the general-purpose source register.

**lsb**

Depends on the instruction variant:

**32-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 31.

**64-bit general registers** Is the bit number of the lsb of the source bitfield, in the range 0 to 63.

**width**

Depends on the instruction variant:

**32-bit general registers** Is the width of the bitfield, in the range 1 to 32-*lsb*.

**64-bit general registers** Is the width of the bitfield, in the range 1 to 64-*lsb*.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Usage

Unsigned Bitfield Extract extracts any number of adjacent bits at any position from a register, zero-extends them to the size of the register, and writes the result to the destination register.

## Related information

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

## 17.166 UDIV (A64)

Unsigned Divide.

## Syntax

`UDIV Wd, Wn, Wm ; 32-bit`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

`UDIV Rd, Rn, Rm ; 64-bit`

Where:

**Rd**

Is the 32-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register.

**Rm**

Is the 32-bit name of the second general-purpose source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Xn**

Is the 64-bit name of the first general-purpose source register.

**Xm**

Is the 64-bit name of the second general-purpose source register.

## Operation

Unsigned Divide divides an unsigned integer register value by another unsigned integer register value, and writes the result to the destination register. The condition flags are not affected.

$Rd = Rn / Rm$ , where  $R$  is either w or x.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.167 UMADDL (A64)

Unsigned Multiply-Add Long.

This instruction is used by the alias `UMULL`.

## Syntax

`UMADDL Xd, Rn, Rm, Ra`

Where:

**Rd**

Is the 64-bit name of the general-purpose destination register.

**Rn**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**Rm**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

***Xa***

Is the 64-bit name of the third general-purpose source register holding the addend.

**Operation**

Unsigned Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to the 64-bit destination register.

$$Xd = Xa + Wn * Wm.$$

**Related information**

[UMULL \(A64\)](#) on page 813

[A64 instructions in alphabetical order](#) on page 664

## 17.168 UMNEGL (A64)

Unsigned Multiply-Negate Long.

This instruction is an alias of `UMSUBL`.

The equivalent instruction is `UMSUBL Xd, Wn, Wm, XZR`.

**Syntax**

`UMNEGL Xd, Wn, Wm`

Where:

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

***Wm***

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**Operation**

Unsigned Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the 64-bit destination register.

$$Xd = - (Wn * Wm)$$

**Related information**

[UMSUBL \(A64\)](#) on page 811

[A64 instructions in alphabetical order](#) on page 664

## 17.169 UMSUBL (A64)

Unsigned Multiply-Subtract Long.

This instruction is used by the alias `UMNEGL`.

### Syntax

`UMSUBL Xd, Wn, Wm, Xa`

Where:

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

**`Wm`**

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**`Xa`**

Is the 64-bit name of the third general-purpose source register holding the minuend.

### Operation

Unsigned Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register value, and writes the result to the 64-bit destination register.

$$Xd = Xa - Wn * Wm$$

### Related information

[UMNEGL \(A64\)](#) on page 811

[A64 instructions in alphabetical order](#) on page 664

## 17.170 UMULH (A64)

Unsigned Multiply High.

### Syntax

`UMULH Xd, Xn, Xm`

Where:

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Xn`**

Is the 64-bit name of the first general-purpose source register holding the multiplicand.

***Xm***

Is the 64-bit name of the second general-purpose source register holding the multiplier.

**Operation**

Unsigned Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit destination register.

$Xd = \text{bits}[127:64] \text{ of } Xn * Xm$

**Related information**

[A64 instructions in alphabetical order](#) on page 664

## 17.171 UMULL (A64)

Unsigned Multiply Long.

This instruction is an alias of `UMADDL`.

The equivalent instruction is `UMADDL xd, Wn, Wm, XZR`.

**Syntax**

`UMULL xd, Wn, Wm`

Where:

***xd***

Is the 64-bit name of the general-purpose destination register.

***Wn***

Is the 32-bit name of the first general-purpose source register holding the multiplicand.

***Wm***

Is the 32-bit name of the second general-purpose source register holding the multiplier.

**Operation**

Unsigned Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.

$Xd = Wn * Wm$

**Related information**

[UMADDL \(A64\)](#) on page 810

[A64 instructions in alphabetical order](#) on page 664

## 17.172 UXTB (A64)

Unsigned Extend Byte.

This instruction is an alias of `UBFM`.

The equivalent instruction is `UBFM Wd, Wn, #0, #7`.

### Syntax

`UXTB Wd, Wn`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the general-purpose source register.

### Operation

Unsigned Extend Byte extracts an 8-bit value from a register, zero-extends it to the size of the register, and writes the result to the destination register.

```
Wd = ZeroExtend(Wn<7:0>)
```

### Related information

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

## 17.173 UXTH (A64)

Unsigned Extend Halfword.

This instruction is an alias of `UBFM`.

The equivalent instruction is `UBFM Wd, Wn, #0, #15`.

### Syntax

`UXTH Wd, Wn`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Wn`**

Is the 32-bit name of the general-purpose source register.

## Operation

Unsigned Extend Halfword extracts a 16-bit value from a register, zero-extends it to the size of the register, and writes the result to the destination register.

```
Wd = ZeroExtend(Wn<15:0>)
```

## Related information

[UBFM \(A64\)](#) on page 807

[A64 instructions in alphabetical order](#) on page 664

# 17.174 WFE (A64)

Wait For Event.

## Usage

Wait For Event is a hint instruction that permits the PE to enter a low-power state until one of several events occurs, including events signaled by executing the `SEV` instruction on any PE in the multiprocessor system. For more information, see *Wait For Event mechanism and Send event* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

As described in *Wait For Event mechanism and Send event* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), the execution of a `WFE` instruction that would otherwise cause entry to a low-power state can be trapped to a higher Exception level. See:

- Traps to EL1 of EL0 execution of `WFE` and `WFI` instructions.
- Traps to EL2 of Non-secure EL0 and EL1 execution of `WFE` and `WFI` instructions.
- Traps to EL3 of EL2, EL1, and EL0 execution of `WFE` and `WFI` instructions.

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 17.175 WFI (A64)

Wait For Interrupt.

## Usage

Wait For Interrupt is a hint instruction that permits the PE to enter a low-power state until one of several asynchronous event occurs. For more information, see *Wait For Interrupt* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

As described in *Wait For Interrupt* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), the execution of a `WFI` instruction that would otherwise cause entry to a low-power state can be trapped to a higher Exception level. See:

- Traps to EL1 of EL0 execution of WFE and WFI instructions.
- Traps to EL2 of Non-secure EL0 and EL1 execution of WFE and WFI instructions.
- Traps to EL3 of EL2, EL1, and EL0 execution of WFE and WFI instructions.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.176 XPACD, XPACI, XPACLRI (A64)

Strip Pointer Authentication Code.

### Syntax

XPACD *Xd* ; XPACD general registers

XPACI *Xd* ; XPACI general registers

XPACLRI

Where:

***Xd***

Is the 64-bit name of the general-purpose destination register.

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Strip Pointer Authentication Code. This instruction removes the pointer authentication code from an address. The address is in the specified general-purpose register for **XPACI** and **XPACD**, and is in LR for **XPACLRI**.

The **XPACD** instruction is used for data addresses, and **XPACI** and **XPACLRI** are used for instruction addresses.

## Related information

[A64 instructions in alphabetical order](#) on page 664

## 17.177 YIELD (A64)

YIELD.

### Usage

**YIELD** is a hint instruction. Software with a multithreading capability can use a **YIELD** instruction to indicate to the PE that it is performing a task, for example a spin-lock, that could be swapped out

to improve overall system performance. The PE can use this hint to suspend and resume multiple software threads if it supports the capability.

For more information about the recommended use of this instruction, see *The YIELD instruction* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 instructions in alphabetical order](#) on page 664

# 18. A64 Data Transfer Instructions

Describes the A64 data transfer instructions.

## 18.1 A64 data transfer instructions in alphabetical order

A summary of the A64 data transfer instructions and pseudo-instructions that are supported.

**Table 18-1: Summary of A64 data transfer instructions**

| Mnemonic                                               | Brief description                                       | See                                                          |
|--------------------------------------------------------|---------------------------------------------------------|--------------------------------------------------------------|
| CASA, CASAL, CAS, CASL, CASAL, CAS, CASL               | Compare and Swap word or doubleword in memory           | CASA, CASAL, CAS, CASL, CASAL, CAS, CASL (A64)               |
| CASAB, CASALB, CASB, CASLB                             | Compare and Swap byte in memory                         | CASAB, CASALB, CASB, CASLB (A64)                             |
| CASAH, CASALH, CASH, CASLH                             | Compare and Swap halfword in memory                     | CASAH, CASALH, CASH, CASLH (A64)                             |
| CASPA, CASPAL, CASP, CASPL, CASPAL, CASP, CASPL        | Compare and Swap Pair of words or doublewords in memory | CASPA, CASPAL, CASP, CASPL, CASPAL, CASP, CASPL (A64)        |
| LDADD, LDADDAL, LDADD, LDADDL, LDADDAL, LDADD, LDADDL  | Atomic add on word or doubleword in memory              | LDADD, LDADDAL, LDADD, LDADDL, LDADDAL, LDADD, LDADDL (A64)  |
| LDADDAB, LDADDALB, LDADDB, LDADDLB                     | Atomic add on byte in memory                            | LDADDAB, LDADDALB, LDADDB, LDADDLB (A64)                     |
| LDADDAH, LDADDALH, LDADDH, LDADDLH                     | Atomic add on halfword in memory                        | LDADDAH, LDADDALH, LDADDH, LDADDLH (A64)                     |
| LDAPR                                                  | Load-Acquire RCpc Register                              | LDAPR (A64)                                                  |
| LDAPRB                                                 | Load-Acquire RCpc Register Byte                         | LDAPRB (A64)                                                 |
| LDAPRH                                                 | Load-Acquire RCpc Register Halfword                     | LDAPRH (A64)                                                 |
| LDAR                                                   | Load-Acquire Register                                   | LDAR (A64)                                                   |
| LDARB                                                  | Load-Acquire Register Byte                              | LDARB (A64)                                                  |
| LDARH                                                  | Load-Acquire Register Halfword                          | LDARH (A64)                                                  |
| LDAXP                                                  | Load-Acquire Exclusive Pair of Registers                | LDAXP (A64)                                                  |
| LDAXR                                                  | Load-Acquire Exclusive Register                         | LDAXR (A64)                                                  |
| LDAXRB                                                 | Load-Acquire Exclusive Register Byte                    | LDAXRB (A64)                                                 |
| LDAXRH                                                 | Load-Acquire Exclusive Register Halfword                | LDAXRH (A64)                                                 |
| LDCLRA, LDCLRAL, LDCLR, LDCLRL, LDCLRAL, LDCLR, LDCLRL | Atomic bit clear on word or doubleword in memory        | LDCLRA, LDCLRAL, LDCLR, LDCLRL, LDCLRAL, LDCLR, LDCLRL (A64) |
| LDCLRAB, LDCLRALB, LDCLRB, LDCLRLB                     | Atomic bit clear on byte in memory                      | LDCLRAB, LDCLRALB, LDCLRB, LDCLRLB (A64)                     |
| LDCLRAH, LDCLRALH, LDCLRH, LDCLRLH                     | Atomic bit clear on halfword in memory                  | LDCLRAH, LDCLRALH, LDCLRH, LDCLRLH (A64)                     |
| LDEORA, LDEORAL, LDEOR, LDEORL, LDEORAL, LDEOR, LDEORL | Atomic exclusive OR on word or doubleword in memory     | LDEORA, LDEORAL, LDEOR, LDEORL, LDEORAL, LDEOR, LDEORL (A64) |
| LDEORAB, LDEORALB, LDEORB, LDEORLB                     | Atomic exclusive OR on byte in memory                   | LDEORAB, LDEORALB, LDEORB, LDEORLB (A64)                     |

| Mnemonic                                                         | Brief description                                                             | See                                                                                                                                                                                       |
|------------------------------------------------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| LDEORAH, LDEORALH, LDEORH, LDEORLH                               | Atomic exclusive OR on halfword in memory                                     | <a href="#">LDEORAH</a> , <a href="#">LDEORALH</a> , <a href="#">LDEORH</a> , <a href="#">LDEORLH</a> (A64)                                                                               |
| LDLAR                                                            | Load LOAcquire Register                                                       | <a href="#">LDLAR</a> (A64)                                                                                                                                                               |
| LDLARB                                                           | Load LOAcquire Register Byte                                                  | <a href="#">LDLARB</a> (A64)                                                                                                                                                              |
| LDLARH                                                           | Load LOAcquire Register Halfword                                              | <a href="#">LDLARH</a> (A64)                                                                                                                                                              |
| LDNP                                                             | Load Pair of Registers, with non-temporal hint                                | <a href="#">LDNP</a> (A64)                                                                                                                                                                |
| LDP                                                              | Load Pair of Registers                                                        | <a href="#">LDP</a> (A64)                                                                                                                                                                 |
| LDPSW                                                            | Load Pair of Registers Signed Word                                            | <a href="#">LDPSW</a> (A64)                                                                                                                                                               |
| LDR (immediate)                                                  | Load Register (immediate)                                                     | <a href="#">LDR</a> (immediate)                                                                                                                                                           |
| LDR (literal)                                                    | Load Register (literal)                                                       | <a href="#">LDR</a> (literal)                                                                                                                                                             |
| LDR pseudo-instruction                                           | Load a register with either a 32-bit or 64-bit immediate value or any address | <a href="#">LDR</a> pseudo-instruction                                                                                                                                                    |
| LDR (register)                                                   | Load Register (register)                                                      | <a href="#">LDR</a> (register)                                                                                                                                                            |
| LDRAA, LDRAB, LDRAB                                              | Load Register, with pointer authentication                                    | <a href="#">LDRAA</a> , <a href="#">LDRAB</a> , <a href="#">LDRAB</a> (A64)                                                                                                               |
| LDRB (immediate)                                                 | Load Register Byte (immediate)                                                | <a href="#">LDRB</a> (immediate)                                                                                                                                                          |
| LDRB (register)                                                  | Load Register Byte (register)                                                 | <a href="#">LDRB</a> (register)                                                                                                                                                           |
| LDRH (immediate)                                                 | Load Register Halfword (immediate)                                            | <a href="#">LDRH</a> (immediate)                                                                                                                                                          |
| LDRH (register)                                                  | Load Register Halfword (register)                                             | <a href="#">LDRH</a> (register)                                                                                                                                                           |
| LDRSB (immediate)                                                | Load Register Signed Byte (immediate)                                         | <a href="#">LDRSB</a> (immediate)                                                                                                                                                         |
| LDRSB (register)                                                 | Load Register Signed Byte (register)                                          | <a href="#">LDRSB</a> (register)                                                                                                                                                          |
| LDRSH (immediate)                                                | Load Register Signed Halfword (immediate)                                     | <a href="#">LDRSH</a> (immediate)                                                                                                                                                         |
| LDRSH (register)                                                 | Load Register Signed Halfword (register)                                      | <a href="#">LDRSH</a> (register)                                                                                                                                                          |
| LDRSW (immediate)                                                | Load Register Signed Word (immediate)                                         | <a href="#">LDRSW</a> (immediate)                                                                                                                                                         |
| LDRSW (literal)                                                  | Load Register Signed Word (literal)                                           | <a href="#">LDRSW</a> (literal)                                                                                                                                                           |
| LDRSW (register)                                                 | Load Register Signed Word (register)                                          | <a href="#">LDRSW</a> (register)                                                                                                                                                          |
| LDSETA, LDSETAL, LDSET, LDSETL,<br>LDSETAL, LDSET, LDSETL        | Atomic bit set on word or doubleword in memory                                | <a href="#">LDSETA</a> , <a href="#">LDSETAL</a> , <a href="#">LDSET</a> , <a href="#">LDSETL</a> , <a href="#">LDSETAL</a> , <a href="#">LDSET</a> , <a href="#">LDSETL</a> (A64)        |
| LDSETAB, LDSETALB, LDSETB, LDSETLB                               | Atomic bit set on byte in memory                                              | <a href="#">LDSETAB</a> , <a href="#">LDSETALB</a> , <a href="#">LDSETB</a> , <a href="#">LDSETLB</a> (A64)                                                                               |
| LDSETAH, LDSETALH, LDSETH, LDSETLH                               | Atomic bit set on halfword in memory                                          | <a href="#">LDSETAH</a> , <a href="#">LDSETALH</a> , <a href="#">LDSETH</a> , <a href="#">LDSETLH</a> (A64)                                                                               |
| LDSMAXA, LDSMAXAL, LDSMAX, LDSMAXL,<br>LDSMAXAL, LDSMAX, LDSMAXL | Atomic signed maximum on word or doubleword in memory                         | <a href="#">LDSMAXA</a> , <a href="#">LDSMAXAL</a> , <a href="#">LDSMAX</a> , <a href="#">LDSMAXL</a> , <a href="#">LDSMAXAL</a> , <a href="#">LDSMAX</a> , <a href="#">LDSMAXL</a> (A64) |
| LDSMAXAB, LDSMAXALB, LDSMAXB,<br>LDSMAXLB                        | Atomic signed maximum on byte in memory                                       | <a href="#">LDSMAXAB</a> , <a href="#">LDSMAXALB</a> , <a href="#">LDSMAXB</a> , <a href="#">LDSMAXLB</a> (A64)                                                                           |
| LDSMAXAH, LDSMAXALH, LDSMAXH,<br>LDSMAXLH                        | Atomic signed maximum on halfword in memory                                   | <a href="#">LDSMAXAH</a> , <a href="#">LDSMAXALH</a> , <a href="#">LDSMAXH</a> , <a href="#">LDSMAXLH</a> (A64)                                                                           |
| LDSMINA, LDSMINAL, LDSMIN, LDSMINL,<br>LDSMINAL, LDSMIN, LDSMINL | Atomic signed minimum on word or doubleword in memory                         | <a href="#">LDSMINA</a> , <a href="#">LDSMINAL</a> , <a href="#">LDSMIN</a> , <a href="#">LDSMINL</a> , <a href="#">LDSMINAL</a> , <a href="#">LDSMIN</a> , <a href="#">LDSMINL</a> (A64) |
| LDSMINAB, LDSMINALB, LDSMINB,<br>LDSMINLB                        | Atomic signed minimum on byte in memory                                       | <a href="#">LDSMINAB</a> , <a href="#">LDSMINALB</a> , <a href="#">LDSMINB</a> , <a href="#">LDSMINLB</a> (A64)                                                                           |

| Mnemonic                                                         | Brief description                                          | See                                                                       |
|------------------------------------------------------------------|------------------------------------------------------------|---------------------------------------------------------------------------|
| LDSMINAH, LDSMINALH, LDSMINH,<br>LDSMINLH                        | Atomic signed minimum on halfword in memory                | LDSMINAH, LDSMINALH, LDSMINH,<br>LDSMINLH (A64)                           |
| LDTR                                                             | Load Register (unprivileged)                               | LDTR (A64)                                                                |
| LDTRB                                                            | Load Register Byte (unprivileged)                          | LDTRB (A64)                                                               |
| LDTRH                                                            | Load Register Halfword (unprivileged)                      | LDTRH (A64)                                                               |
| LDTRSB                                                           | Load Register Signed Byte (unprivileged)                   | LDTRSB (A64)                                                              |
| LDTRSH                                                           | Load Register Signed Halfword (unprivileged)               | LDTRSH (A64)                                                              |
| LDTRSW                                                           | Load Register Signed Word (unprivileged)                   | LDTRSW (A64)                                                              |
| LDUMAXA, LDUMAXAL, LDUMAX, LDUMAXL,<br>LDUMAXAL, LDUMAX, LDUMAXL | Atomic unsigned maximum on word or doubleword in memory    | LDUMAXA, LDUMAXAL, LDUMAX,<br>LDUMAXL, LDUMAXAL, LDUMAX, LDUMAXL (A64)    |
| LDUMAXAB, LDUMAXALB, LDUMAXB,<br>LDUMAXLB                        | Atomic unsigned maximum on byte in memory                  | LDUMAXAB, LDUMAXALB, LDUMAXB,<br>LDUMAXLB (A64)                           |
| LDUMAXAH, LDUMAXALH, LDUMAXH,<br>LDUMAXLH                        | Atomic unsigned maximum on halfword in memory              | LDUMAXAH, LDUMAXALH, LDUMAXH,<br>LDUMAXLH (A64)                           |
| LDUMINA, LDUMINAL, LDUMIN, LDUMINL,<br>LDUMINAL, LDUMIN, LDUMINL | Atomic unsigned minimum on word or doubleword in memory    | LDUMINA, LDUMINAL, LDUMIN,<br>LDUMINL, LDUMINAL, LDUMIN,<br>LDUMINL (A64) |
| LDUMINAB, LDUMINALB, LDUMINB,<br>LDUMINLB                        | Atomic unsigned minimum on byte in memory                  | LDUMINAB, LDUMINALB, LDUMINB,<br>LDUMINLB (A64)                           |
| LDUMINAH, LDUMINALH, LDUMINH,<br>LDUMINLH                        | Atomic unsigned minimum on halfword in memory              | LDUMINAH, LDUMINALH, LDUMINH,<br>LDUMINLH (A64)                           |
| LDUR                                                             | Load Register (unscaled)                                   | LDUR (A64)                                                                |
| LDURB                                                            | Load Register Byte (unscaled)                              | LDURB (A64)                                                               |
| LDURH                                                            | Load Register Halfword (unscaled)                          | LDURH (A64)                                                               |
| LDURSB                                                           | Load Register Signed Byte (unscaled)                       | LDURSB (A64)                                                              |
| LDURSH                                                           | Load Register Signed Halfword (unscaled)                   | LDURSH (A64)                                                              |
| LDURSW                                                           | Load Register Signed Word (unscaled)                       | LDURSW (A64)                                                              |
| LDXP                                                             | Load Exclusive Pair of Registers                           | LDXP (A64)                                                                |
| LDXR                                                             | Load Exclusive Register                                    | LDXR (A64)                                                                |
| LDXRB                                                            | Load Exclusive Register Byte                               | LDXRB (A64)                                                               |
| LDXRH                                                            | Load Exclusive Register Halfword                           | LDXRH (A64)                                                               |
| PRFM (immediate)                                                 | Prefetch Memory (immediate)                                | PRFM (immediate) (A64)                                                    |
| PRFM (literal)                                                   | Prefetch Memory (literal)                                  | PRFM (literal) (A64)                                                      |
| PRFM (register)                                                  | Prefetch Memory (register)                                 | PRFM (register) (A64)                                                     |
| PRFUM (unscaled offset)                                          | Prefetch Memory (unscaled offset)                          | PRFUM (unscaled offset) (A64)                                             |
| STADD, STADDL, STADDL                                            | Atomic add on word or doubleword in memory, without return | STADD, STADDL, STADDL (A64)                                               |
| STADDB, STADDLB                                                  | Atomic add on byte in memory, without return               | STADDB, STADDLB (A64)                                                     |
| STADDH, STADDLH                                                  | Atomic add on halfword in memory, without return           | STADDH, STADDLH (A64)                                                     |

| Mnemonic                    | Brief description                                                     | See                               |
|-----------------------------|-----------------------------------------------------------------------|-----------------------------------|
| STCLR, STCLRL, STCIRL       | Atomic bit clear on word or doubleword in memory, without return      | STCLR, STCLRL, STCLRL (A64)       |
| STCLRB, STCLRLB             | Atomic bit clear on byte in memory, without return                    | STCLRB, STCLRLB (A64)             |
| STCLRH, STCLRLH             | Atomic bit clear on halfword in memory, without return                | STCLRH, STCLRLH (A64)             |
| STEOR, STEORL, STEORL       | Atomic exclusive OR on word or doubleword in memory, without return   | STEOR, STEORL, STEORL (A64)       |
| STEORB, STEORLB             | Atomic exclusive OR on byte in memory, without return                 | STEORB, STEORLB (A64)             |
| STEORH, STEORLH             | Atomic exclusive OR on halfword in memory, without return             | STEORH, STEORLH (A64)             |
| STLLR                       | Store Lorelease Register                                              | STLLR (A64)                       |
| STLLRB                      | Store Lorelease Register Byte                                         | STLLRB (A64)                      |
| STLLRH                      | Store Lorelease Register Halfword                                     | STLLRH (A64)                      |
| STLR                        | Store-Release Register                                                | STLR (A64)                        |
| STLRB                       | Store-Release Register Byte                                           | STLRB (A64)                       |
| STLRH                       | Store-Release Register Halfword                                       | STLRH (A64)                       |
| STLXP                       | Store-Release Exclusive Pair of registers                             | STLXP (A64)                       |
| STLXR                       | Store-Release Exclusive Register                                      | STLXR (A64)                       |
| STLXRB                      | Store-Release Exclusive Register Byte                                 | STLXRB (A64)                      |
| STLXRH                      | Store-Release Exclusive Register Halfword                             | STLXRH (A64)                      |
| STNP                        | Store Pair of Registers, with non-temporal hint                       | STNP (A64)                        |
| STP                         | Store Pair of Registers                                               | STP (A64)                         |
| STR (immediate)             | Store Register (immediate)                                            | STR (immediate) (A64)             |
| STR (register)              | Store Register (register)                                             | STR (register) (A64)              |
| STRB (immediate)            | Store Register Byte (immediate)                                       | STRB (immediate) (A64)            |
| STRB (register)             | Store Register Byte (register)                                        | STRB (register) (A64)             |
| STRH (immediate)            | Store Register Halfword (immediate)                                   | STRH (immediate) (A64)            |
| STRH (register)             | Store Register Halfword (register)                                    | STRH (register) (A64)             |
| STSET, STSETL, STSETL       | Atomic bit set on word or doubleword in memory, without return        | STSET, STSETL, STSETL (A64)       |
| STSETB, STSETLB             | Atomic bit set on byte in memory, without return                      | STSETB, STSETLB (A64)             |
| STSETH, STSETLH             | Atomic bit set on halfword in memory, without return                  | STSETH, STSETLH (A64)             |
| STS MAX, STS MAXL, STS MAXL | Atomic signed maximum on word or doubleword in memory, without return | STS MAX, STS MAXL, STS MAXL (A64) |
| STS MAXB, STS MAXLB         | Atomic signed maximum on byte in memory, without return               | STS MAXB, STS MAXLB (A64)         |
| STS MAXH, STS MAXLH         | Atomic signed maximum on halfword in memory, without return           | STS MAXH, STS MAXLH (A64)         |

| Mnemonic                                    | Brief description                                                       | See                                               |
|---------------------------------------------|-------------------------------------------------------------------------|---------------------------------------------------|
| STSMIN, STSMINL, STSMINL                    | Atomic signed minimum on word or doubleword in memory, without return   | STSMIN, STSMINL, STSMINL (A64)                    |
| STSMINB, STSMINLB                           | Atomic signed minimum on byte in memory, without return                 | STSMINB, STSMINLB (A64)                           |
| STSMINH, STSMINLH                           | Atomic signed minimum on halfword in memory, without return             | STSMINH, STSMINLH (A64)                           |
| STTR                                        | Store Register (unprivileged)                                           | STTR (A64)                                        |
| STTRB                                       | Store Register Byte (unprivileged)                                      | STTRB (A64)                                       |
| STTRH                                       | Store Register Halfword (unprivileged)                                  | STTRH (A64)                                       |
| STUMAX, STUMAXL, STUMAXL                    | Atomic unsigned maximum on word or doubleword in memory, without return | STUMAX, STUMAXL, STUMAXL (A64)                    |
| STUMAXB, STUMAXLB                           | Atomic unsigned maximum on byte in memory, without return               | STUMAXB, STUMAXLB (A64)                           |
| STUMAXH, STUMAXLH                           | Atomic unsigned maximum on halfword in memory, without return           | STUMAXH, STUMAXLH (A64)                           |
| STUMIN, STUMINL, STUMINL                    | Atomic unsigned minimum on word or doubleword in memory, without return | STUMIN, STUMINL, STUMINL (A64)                    |
| STUMINB, STUMINLB                           | Atomic unsigned minimum on byte in memory, without return               | STUMINB, STUMINLB (A64)                           |
| STUMINH, STUMINLH                           | Atomic unsigned minimum on halfword in memory, without return           | STUMINH, STUMINLH (A64)                           |
| STUR                                        | Store Register (unscaled)                                               | STUR (A64)                                        |
| STURB                                       | Store Register Byte (unscaled)                                          | STURB (A64)                                       |
| STURH                                       | Store Register Halfword (unscaled)                                      | STURH (A64)                                       |
| STXP                                        | Store Exclusive Pair of registers                                       | STXP (A64)                                        |
| STXR                                        | Store Exclusive Register                                                | STXR (A64)                                        |
| STXRB                                       | Store Exclusive Register Byte                                           | STXRB (A64)                                       |
| STXRH                                       | Store Exclusive Register Halfword                                       | STXRH (A64)                                       |
| SWPA, SWPAL, SWP, SWPL, SWPAL, SWP,<br>SWPL | Swap word or doubleword in memory                                       | SWPA, SWPAL, SWP, SWPL, SWPAL, SWP,<br>SWPL (A64) |
| SWPAB, SWPALB, SWPB, SWPLB                  | Swap byte in memory                                                     | SWPAB, SWPALB, SWPB, SWPLB (A64)                  |
| SWPAH, SWPALH, SWPH, SWPLH                  | Swap halfword in memory                                                 | SWPAH, SWPALH, SWPH, SWPLH (A64)                  |

## 18.2 CASA, CASAL, CAS, CASL, CASAL, CAS, CASL (A64)

Compare and Swap word or doubleword in memory.

### Syntax

```
CASA Ws, Wt, [Xn|SP,#0] ; 32-bit, acquire general registers
```

```
CASAL Ws, Wt, [Xn|SP,#0] ; 32-bit, acquire and release general registers
```

```
CAS Ws, Wt, [Xn|SP,#0] ; 32-bit, no memory ordering general registers
```

```
CASL Ws, Wt, [Xn|SP,#0] ; 32-bit, release general registers

CASA Xs, Xt, [Xn|SP,#0] ; 64-bit, acquire general registers

CASAL Xs, Xt, [Xn|SP,#0] ; 64-bit, acquire and release general registers

CAS Xs, Xt, [Xn|SP,#0] ; 64-bit, no memory ordering general registers

CASL Xs, Xt, [Xn|SP,#0] ; 64-bit, release general registers
```

Where:

#### ***Ws***

Is the 32-bit name of the general-purpose register to be compared and loaded.

#### ***Wt***

Is the 32-bit name of the general-purpose register to be conditionally stored.

#### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

#### ***Xs***

Is the 64-bit name of the general-purpose register to be compared and loaded.

#### ***Xt***

Is the 64-bit name of the general-purpose register to be conditionally stored.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Compare and Swap word or doubleword in memory reads a 32-bit word or 64-bit doubleword from memory, and compares it against the value held in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the read and write occur atomically such that no other modification of the memory location can take place between the read and write.

- CASA and CASAL load from memory with acquire semantics.
- CASL and CASAL store to memory with release semantics.
- CAS has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The architecture permits that the data read clears any exclusive monitors associated with that location, even if the compare then fails.

If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is *ws*, or *xs*, is restored to the value held in the register before the instruction was executed.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.3 CASAB, CASALB, CASB, CASLB (A64)

Compare and Swap byte in memory.

## Syntax

CASAB *ws*, *wt*, [Xn|SP,#0] ; Acquire general registers

CASALB *ws*, *wt*, [Xn|SP,#0] ; Acquire and release general registers

CASB *ws*, *wt*, [Xn|SP,#0] ; No memory ordering general registers

CASLB *ws*, *wt*, [Xn|SP,#0] ; Release general registers

Where:

### **ws**

Is the 32-bit name of the general-purpose register to be compared and loaded.

### **wt**

Is the 32-bit name of the general-purpose register to be conditionally stored.

### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Compare and Swap byte in memory reads an 8-bit byte from memory, and compares it against the value held in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the read and write occur atomically such that no other modification of the memory location can take place between the read and write.

- CASAB and CASALB load from memory with acquire semantics.
- CASLB and CASALB store to memory with release semantics.
- CASB has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The architecture permits that the data read clears any exclusive monitors associated with that location, even if the compare then fails.

If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is *ws*, is restored to the values held in the register before the instruction was executed.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.4 CASAH, CASALH, CASH, CASLH (A64)

Compare and Swap halfword in memory.

### Syntax

CASAH *Ws*, *Wt*, [Xn|SP,#0] ; Acquire general registers

CASALH *Ws*, *Wt*, [Xn|SP,#0] ; Acquire and release general registers

CASH *Ws*, *Wt*, [Xn|SP,#0] ; No memory ordering general registers

CASLH *Ws*, *Wt*, [Xn|SP,#0] ; Release general registers

Where:

**Ws**

Is the 32-bit name of the general-purpose register to be compared and loaded.

**Wt**

Is the 32-bit name of the general-purpose register to be conditionally stored.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Compare and Swap halfword in memory reads a 16-bit halfword from memory, and compares it against the value held in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the read and write occur atomically such that no other modification of the memory location can take place between the read and write.

- CASAH and CASALH load from memory with acquire semantics.
- CASLH and CASALH store to memory with release semantics.

- CAS has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The architecture permits that the data read clears any exclusive monitors associated with that location, even if the compare then fails.

If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is *ws*, is restored to the values held in the register before the instruction was executed.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.5 CASPA, CASPAL, CASP, CASPL, CASPAL, CASP, CASPL (A64)

Compare and Swap Pair of words or doublewords in memory.

### Syntax

CASPA *Ws*, <W(s+1)>, *Wt*, <W(t+1)>, [Xn|SP,#0] ; 32-bit, acquire general registers

CASPAL *Ws*, <W(s+1)>, *Wt*, <W(t+1)>, [Xn|SP,#0] ; 32-bit, acquire and release general registers

CASP *Ws*, <W(s+1)>, *Wt*, <W(t+1)>, [Xn|SP,#0] ; 32-bit, no memory ordering general registers

CASPL *Ws*, <W(s+1)>, *Wt*, <W(t+1)>, [Xn|SP,#0] ; 32-bit, release general registers

CASPA *Xs*, <X(s+1)>, *Xt*, <X(t+1)>, [Xn|SP,#0] ; 64-bit, acquire general registers

CASPAL *Xs*, <X(s+1)>, *Xt*, <X(t+1)>, [Xn|SP,#0] ; 64-bit, acquire and release general registers

CASP *Xs*, <X(s+1)>, *Xt*, <X(t+1)>, [Xn|SP,#0] ; 64-bit, no memory ordering general registers

CASPL *Xs*, <X(s+1)>, *Xt*, <X(t+1)>, [Xn|SP,#0] ; 64-bit, release general registers

Where:

### Ws

Is the 32-bit name of the first general-purpose register to be compared and loaded.

**<W(s+1)>**

Is the 32-bit name of the second general-purpose register to be compared and loaded.

**wt**

Is the 32-bit name of the first general-purpose register to be conditionally stored.

**<W(t+1)>**

Is the 32-bit name of the second general-purpose register to be conditionally stored.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**xs**

Is the 64-bit name of the first general-purpose register to be compared and loaded.

**<X(s+1)>**

Is the 64-bit name of the second general-purpose register to be compared and loaded.

**xt**

Is the 64-bit name of the first general-purpose register to be conditionally stored.

**<X(t+1)>**

Is the 64-bit name of the second general-purpose register to be conditionally stored.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Compare and Swap Pair of words or doublewords in memory reads a pair of 32-bit words or 64-bit doublewords from memory, and compares them against the values held in the first pair of registers. If the comparison is equal, the values in the second pair of registers are written to memory. If the writes are performed, the reads and writes occur atomically such that no other modification of the memory location can take place between the reads and writes.

- `CASPA` and `CASPAL` load from memory with acquire semantics.
- `CASPL` and `CASPAL` store to memory with release semantics.
- `CAS` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The architecture permits that the data read clears any exclusive monitors associated with that location, even if the compare then fails.

If the instruction generates a synchronous Data Abort, the registers which are compared and loaded, that is `ws` and `<W(s+1)>`, or `xs` and `<X(s+1)>`, are restored to the values held in the registers before the instruction was executed.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.6 LDADDA, LDADDAL, LDADD, LDADDL, LDADDAL, LDADD, LDADDL (A64)

Atomic add on word or doubleword in memory.

## Syntax

```
LDADDA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDADDAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDADD Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDADDL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDADDA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDADDAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDADD Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDADDL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

### Ws

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### Wt

Is the 32-bit name of the general-purpose register to be loaded.

### Xn | SP

Is the 64-bit name of the general-purpose base register or stack pointer.

### Xs

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### Xt

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic add on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, adds the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDADDA` and `LDADDAL` load from memory with acquire semantics.
- `LDADDL` and `LDADDAL` store to memory with release semantics.
- `LDADD` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.7 LDADDAB, LDADDALB, LDADDB, LDADDLB (A64)

Atomic add on byte in memory.

### Syntax

`LDADDAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDADDALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDADDB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDADDLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic add on byte in memory atomically loads an 8-bit byte from memory, adds the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDADDAB` and `LDADDALB` load from memory with acquire semantics.
- `LDADDLB` and `LDADDALB` store to memory with release semantics.
- `LDADDB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.8 LDADDAH, LDADDALH, LDADDH, LDADDLH (A64)

Atomic add on halfword in memory.

### Syntax

`LDADDAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDADDALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDADDH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDADDLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic add on halfword in memory atomically loads a 16-bit halfword from memory, adds the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDADDH` and `LDADDALH` load from memory with acquire semantics.
- `LDADDLH` and `LDADDALH` store to memory with release semantics.
- `LDADDH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.9 LDAPR (A64)

Load-Acquire RCpc Register.

### Syntax

`LDAPR Wt, [Xn|SP ,#0] ; 32-bit`

`LDAPR Xt, [Xn|SP ,#0] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xt**

Is the 64-bit name of the general-purpose register to be loaded.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

This instruction is supported in the Arm®v8.3-A architecture and later. It is optionally supported in the Armv8.2-A architecture with the RCpc extension.

## Usage

Load-Acquire RCpc Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from the derived address in memory, and writes it to a register.

The instruction has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), except that:

- There is no ordering requirement, separate from the requirements of a Load-Acquirepc or a Store-Release, created by having a Store-Release followed by a Load-Acquirepc instruction.
- The reading of a value written by a Store-Release by a Load-Acquirepc instruction by the same observer does not make the write of the Store-Release globally observed.

This difference in memory ordering is not described in the pseudocode.

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.10 LDAPRB (A64)

Load-Acquire RCpc Register Byte.

### Syntax

`LDAPRB Wt, [Xn|SP ,#0]`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

This instruction is supported in the Arm®v8.3-A architecture and later. It is optionally supported in the Armv8.2-A architecture with the RCpc extension.

### Usage

Load-Acquire RCpc Register Byte derives an address from a base register value, loads a byte from the derived address in memory, zero-extends it and writes it to a register.

The instruction has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), except that:

- There is no ordering requirement, separate from the requirements of a Load-Acquirepc or a Store-Release, created by having a Store-Release followed by a Load-Acquirepc instruction.
- The reading of a value written by a Store-Release by a Load-Acquirepc instruction by the same observer does not make the write of the Store-Release globally observed.

This difference in memory ordering is not described in the pseudocode.

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.11 LDAPRH (A64)

Load-Acquire RCpc Register Halfword.

### Syntax

`LDAPRH Wt, [Xn|SP ,#0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be loaded.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

This instruction is supported in the Arm®v8.3-A architecture and later. It is optionally supported in Armv8.2-A architecture with the RCpc extension.

### Usage

Load-Acquire RCpc Register Halfword derives an address from a base register value, loads a halfword from the derived address in memory, zero-extends it and writes it to a register.

The instruction has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), except that:

- There is no ordering requirement, separate from the requirements of a Load-Acquirepc or a Store-Release, created by having a Store-Release followed by a Load-Acquirepc instruction.
- The reading of a value written by a Store-Release by a Load-Acquirepc instruction by the same observer does not make the write of the Store-Release globally observed.

This difference in memory ordering is not described in the pseudocode.

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.12 LDAR (A64)

Load-Acquire Register.

### Syntax

`LDAR Wt, [Xn|SP,#0] ; 32-bit`

`LDAR Xt, [Xn|SP,#0] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load-Acquire Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.13 LDARB (A64)

Load-Acquire Register Byte.

### Syntax

`LDARB Wt, [Xn|SP,#0]`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load-Acquire Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it and writes it to a register. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.14 LDARH (A64)

Load-Acquire Register Halfword.

### Syntax

`LDARH Wt, [Xn|SP,#0]`

Where:

**`Wt`**

Is the 32-bit name of the general-purpose register to be transferred.

**`Xn|SP`**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-extends it, and writes it to a register. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.15 LDAXP (A64)

Load-Acquire Exclusive Pair of Registers.

### Syntax

`LDAXP Wt1, Wt2, [Xn|SP,#0] ; 32-bit`

`LDAXP Xt1, Xt2, [Xn|SP,#0] ; 64-bit`

Where:

**Wt1**

Is the 32-bit name of the first general-purpose register to be transferred.

**Wt2**

Is the 32-bit name of the second general-purpose register to be transferred.

**Xt1**

Is the 64-bit name of the first general-purpose register to be transferred.

**Xt2**

Is the 64-bit name of the second general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load-Acquire Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers. A 32-bit pair requires the address to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be quadword aligned and is single-copy atomic for each doubleword at doubleword granularity. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `LDAXP`.

---

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.16 LDAXR (A64)

Load-Acquire Exclusive Register.

### Syntax

`LDAZR Wt, [Xn|SP,#0] ; 32-bit`

`LDAZR Xt, [Xn|SP,#0] ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load-Acquire Exclusive Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.17 LDAXRB (A64)

Load-Acquire Exclusive Register Byte.

**Syntax**

`LDAXRB Wt, [Xn|SP,#0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load-Acquire Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory

accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.18 LDAXRH (A64)

Load-Acquire Exclusive Register Halfword.

### Syntax

```
LDAXRH Wt, [Xn|SP,#0]
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load-Acquire Exclusive Register Halfword derives an address from a base register value, loads a halfword from memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.19 LDCLRA, LDCLRAL, LDCLR, LDCLRL, LDCLRAL, LDCLR, LDCLRL (A64)

Atomic bit clear on word or doubleword in memory.

### Syntax

```
LDCLRA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
```

```
LDCLRAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
```

```
LDCLR Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers

LDCLRL Ws, Wt, [Xn|SP] ; 32-bit, release general registers

LDCLRA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers

LDCLRAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers

LDCLR Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers

LDCLRL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xt**

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit clear on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDCLRA` and `LDCLRAL` load from memory with acquire semantics.
- `LDCLRL` and `LDCLRAL` store to memory with release semantics.
- `LDCLR` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.20 LDCLRAB, LDCLRALB, LDCLRAB, LDCLRLB (A64)

Atomic bit clear on byte in memory.

## Syntax

`LDCLRAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDCLRALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDCLRAB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDCLRLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit clear on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDCLRAB` and `LDCLRALB` load from memory with acquire semantics.
- `LDCLRLB` and `LDCLRALB` store to memory with release semantics.
- `LDCLRAB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.21 LDCLRAH, LDCLRALH, LDCLRH, LDCLRLH (A64)

Atomic bit clear on halfword in memory.

## Syntax

`LDCLRAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDCLRALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDCLRH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDCLRLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit clear on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDCLRAH` and `LDCLRALH` load from memory with acquire semantics.
- `LDCLRLH` and `LDCLRALH` store to memory with release semantics.
- `LDCLRH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.22 LDEORA, LDEORAL, LDEOR, LDEORL, LDEORAL, LDEOR, LDEORL (A64)

Atomic exclusive OR on word or doubleword in memory.

### Syntax

```
LDEORA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDEORAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDEOR Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDEORL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDEORA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDEORAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDEOR Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDEORL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

#### Ws

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### Wt

Is the 32-bit name of the general-purpose register to be loaded.

#### Xn | SP

Is the 64-bit name of the general-purpose base register or stack pointer.

#### Xs

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### Xt

Is the 64-bit name of the general-purpose register to be loaded.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic exclusive OR on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDEORA` and `LDEORAL` load from memory with acquire semantics.
- `LDEORL` and `LDEORAL` store to memory with release semantics.
- `LDEOR` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.23 LDEORAB, LDEORALB, LDEORB, LDEORLB (A64)

Atomic exclusive OR on byte in memory.

### Syntax

```
LDEORAB Ws, Wt, [Xn|SP] ; Acquire general registers
LDEORALB Ws, Wt, [Xn|SP] ; Acquire and release general registers
LDEORB Ws, Wt, [Xn|SP] ; No memory ordering general registers
LDEORLB Ws, Wt, [Xn|SP] ; Release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic exclusive OR on byte in memory atomically loads an 8-bit byte from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDEORAB` and `LDEORALB` load from memory with acquire semantics.
- `LDEORLB` and `LDEORALB` store to memory with release semantics.
- `LDEORB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.24 LDEORAH, LDEORALH, LDEORH, LDEORLH (A64)

Atomic exclusive OR on halfword in memory.

### Syntax

`LDEORAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDEORALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDEORH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDEORLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic exclusive OR on halfword in memory atomically loads a 16-bit halfword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDEORAH` and `LDEORALH` load from memory with acquire semantics.
- `LDEORLH` and `LDEORALH` store to memory with release semantics.
- `LDEORH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.25 LDLAR (A64)

Load LOAcquire Register.

### Syntax

`LDLAR Wt, [Xn|SP,#0] ; 32-bit`

`LDLAR Xt, [Xn|SP,#0] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Load LOAcquire Register loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.26 LDLARB (A64)

Load LOAcquire Register Byte.

### Syntax

`LDLARB Wt, [Xn | SP, #0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Load LOAcquire Register Byte loads a byte from memory, zero-extends it and writes it to a register. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.27 LDLARH (A64)

Load LOAcquire Register Halfword.

### Syntax

`LDLARH Wt, [Xn | SP, #0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Load LOAcquire Register Halfword loads a halfword from memory, zero-extends it, and writes it to a register. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.28 LDNP (A64)

Load Pair of Registers, with non-temporal hint.

**Syntax**

```
LDNP Wt1, Wt2, [Xn|SP, #{imm}] ; 32-bit, Signed offset
```

```
LDNP Xt1, Xt2, [Xn|SP, #{imm}] ; 64-bit, Signed offset
```

Where:

**Wt1**

Is the 32-bit name of the first general-purpose register to be transferred.

**Wt2**

Is the 32-bit name of the second general-purpose register to be transferred.

**imm**

Depends on the instruction variant:

**32-bit general registers** Is the optional signed immediate byte offset, a multiple of 4 in the range -256 to 252, defaulting to 0.

**64-bit general registers** Is the optional signed immediate byte offset, a multiple of 8 in the range -512 to 504, defaulting to 0.

**Xt1**

Is the 64-bit name of the first general-purpose register to be transferred.

**Xt2**

Is the 64-bit name of the second general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate offset, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers.

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about Non-temporal pair instructions, see *Load/Store Non-temporal pair* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `LDNP`.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.29 LDP (A64)

Load Pair of Registers.

**Syntax**

```
LDP Wt1, Wt2, [Xn|SP], #imm ; 32-bit, Post-index
LDP Xt1, Xt2, [Xn|SP], #imm ; 64-bit, Post-index
LDP Wt1, Wt2, [Xn|SP, #imm]! ; 32-bit, Pre-index
LDP Xt1, Xt2, [Xn|SP, #imm]! ; 64-bit, Pre-index
LDP Wt1, Wt2, [Xn|SP, #{imm}] ; 32-bit, Signed offset
LDP Xt1, Xt2, [Xn|SP, #{imm}] ; 64-bit, Signed offset
```

Where:

**Wt1**

Is the 32-bit name of the first general-purpose register to be transferred.

**Wt2**

Is the 32-bit name of the second general-purpose register to be transferred.

***imm***

Depends on the instruction variant:

**32-bit general registers** Is the signed immediate byte offset, a multiple of 4 in the range -256 to 252.

**64-bit general registers** Is the signed immediate byte offset, a multiple of 8 in the range -512 to 504.

***xt1***

Is the 64-bit name of the first general-purpose register to be transferred.

***xt2***

Is the 64-bit name of the second general-purpose register to be transferred.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load Pair of Registers calculates an address from a base register value and an immediate offset, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `LDP`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.30 LDPSW (A64)

Load Pair of Registers Signed Word.

### Syntax

```
LDPSW Xt1, Xt2, [Xn|SP], #imm ; Post-index general registers
```

```
LDPSW Xt1, Xt2, [Xn|SP, #imm]! ; Pre-index general registers
```

```
LDPSW Xt1, Xt2, [Xn|SP, #{imm}] ; Signed offset general registers
```

Where:

***imm***

Depends on the instruction variant:

**Post-index and Pre-index general registers** Is the signed immediate byte offset, a multiple of 4 in the range -256 to 252.

**Signed offset general registers** Is the optional signed immediate byte offset, a multiple of 4 in the range -256 to 252, defaulting to 0.

***Xt1***

Is the 64-bit name of the first general-purpose register to be transferred.

***Xt2***

Is the 64-bit name of the second general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Pair of Registers Signed Word calculates an address from a base register value and an immediate offset, loads two 32-bit words from memory, sign-extends them, and writes them to two registers. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `LDPSW`.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.31 LDR (immediate)

Load Register (immediate).

**Syntax**

```
LDR Wt, [Xn|SP], #simm ; 32-bit, Post-index
LDR Xt, [Xn|SP], #simm ; 64-bit, Post-index
LDR Wt, [Xn|SP, #simm]! ; 32-bit, Pre-index
LDR Xt, [Xn|SP, #simm]! ; 64-bit, Pre-index
LDR Wt, [Xn|SP, #{pimm}] ; 32-bit
LDR Xt, [Xn|SP, #{pimm}] ; 64-bit
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***simm***

Is the signed immediate byte offset, in the range -256 to 255.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***pimm***

Depends on the instruction variant:

**32-bit general registers** Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0.

**64-bit general registers** Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Register (immediate) loads a word or doubleword from memory and writes it to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The Unsigned offset variant scales the immediate offset value by the size of the value accessed before adding it to the base register value.



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDR (immediate)*.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.32 LDR (literal)

Load Register (literal).

**Syntax**

`LDR Wt, label ; 32-bit`

`LDR Xt, label ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be loaded.

***Xt***

Is the 64-bit name of the general-purpose register to be loaded.

***label***

Is the program label from which the data is to be loaded. Its offset from the address of this instruction, in the range  $\pm 1\text{MB}$ .

**Usage**

Load Register (literal) calculates an address from the PC value and an immediate offset, loads a word from memory, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.33 LDR pseudo-instruction

Load a register with either a 32-bit or 64-bit immediate value or an address.



This description is for the `LDR` pseudo-instruction only, and not for the `lDR` instruction.

**Syntax**

`LDR Wd, =expr`

`LDR Xd, =expr`

`LDR Wd, =label_expr`

`LDR Xd, =label_expr`

where:

***Wd***

Is the register to load with a 32-bit value.

***Xd***

Is the register to load with a 64-bit value.

***expr***

Evaluates to a numeric value.

***label\_expr***

Is a PC-relative or external expression of an address in the form of a label plus or minus a numeric value.

**Usage**

When using the `LDR` pseudo-instruction, the assembler places the value of `expr` or `label_expr` in a literal pool and generates a PC-relative `LDR` instruction that reads the constant from the literal pool.



- An address loaded in this way is fixed at link time, so the code is not position-independent.
- The address holding the constant remains valid regardless of where the linker places the ELF section containing the `LDR` instruction.

If `label_expr` is an external expression, or is not contained in the current section, the assembler places a linker relocation directive in the object file. The linker generates the address at link time.

If `label_expr` is a local label, the assembler places a linker relocation directive in the object file and generates a symbol for that local label. The address is generated at link time.

The offset from the PC to the value in the literal pool must be less than ±1MB . You are responsible for ensuring that there is a literal pool within range.

**Examples**

```

LDR      w1,=0xffff    ; loads 0xffff into W1
                ; => LDR w1,[pc,offset_to_litpool]
                ; ...
                ;     litpool DCD 4095

LDR      x2,=place     ; loads the address of
                ; place into X2
                ; => LDR x2,[pc,offset_to_litpool]
                ; ...
                ;     litpool DCQ place

```

**Related information**

[Numeric constants](#) on page 220

[Register-relative and PC-relative expressions](#) on page 222

[Numeric local labels](#) on page 225

[Load immediate values using LDR Rd, =const](#) on page 109

[A-Profile Architectures](#)

## 18.34 LDR (register)

Load Register (register).

### Syntax

```
LDR Wt, [Xn|SP, (Wm|Xm) {, extend {amount} }] ; 32-bit
```

```
LDR Xt, [Xn|SP, (Wm|Xm) {, extend {amount} }] ; 64-bit
```

Where:

#### ***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

#### ***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is:

#### **32-bit general registers**

Can be one of #0 or #2.

#### **64-bit general registers**

Can be one of #0 or #3.

#### ***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

#### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

#### ***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

#### ***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

#### ***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of the values shown in Usage.

### Usage

Load Register (register) calculates an address from a base register value and an offset register value, loads a word from memory, and writes it to a register. The offset register value can optionally be shifted and extended. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.35 LDRAA, LDRAB, LDRAB (A64)

Load Register, with pointer authentication.

### Syntax

```
LDRAA Xt, [Xn|SP, #{simm}] ; LDRAA
LDRAA Xt, [Xn|SP, #{simm}]! ; LDRAA
LDRAB Xt, [Xn|SP, #{simm}] ; LDRAB
LDRAB Xt, [Xn|SP, #{simm}]! ; LDRAB
```

Where:

#### **Xt**

Is the 64-bit name of the general-purpose register to be transferred.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **simm**

Is the optional signed immediate byte offset, in the range -512 to 511, defaulting to 0.

### Architectures supported

Supported in the Arm®v8.2-A architecture and later.

### Usage

Load Register, with pointer authentication. This instruction authenticates an address from a base register using a modifier of zero and the specified key, adds an immediate offset to the authenticated address, and loads a 64-bit doubleword from memory at this resulting address into a register.

Key A is used for `LDRAA`, and key B is used for `LDRAB`.

If the authentication passes, the PE behaves the same as for an `LDR` instruction. If the authentication fails, a Translation fault is generated.

The authenticated address is not written back to the base register, unless the pre-indexed variant of the instruction is used. In this case, the address that is written back to the base register does not include the pointer authentication code.

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8](#), for Armv8-A architecture profile.

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.36 LDRB (immediate)

Load Register Byte (immediate).

### Syntax

```
LDRB Wt, [Xn|SP], #simm ; Post-index general registers
LDRB Wt, [Xn|SP, #simm]! ; Pre-index general registers
LDRB Wt, [Xn|SP, #{pimm}] ; Unsigned offset general registers
```

Where:

***simm***

Is the signed immediate byte offset, in the range -256 to 255.

***pimm***

Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0.

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load Register Byte (immediate) loads a byte from memory, zero-extends it, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRB (immediate)*.

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.37 LDRB (register)

Load Register Byte (register).

### Syntax

```
LDRB Wt, [Xn|SP, (Wm|Xm), extend {amount}] ; Extended register general registers
```

```
LDRB Wt, [Xn|SP, Xm, LSL {amount}] ; Shifted register general registers
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

**extend**

Is the index extend specifier, and can be one of the values shown in Usage.

**amount**

Is the index shift amount, it must be.

## Usage

Load Register Byte (register) calculates an address from a base register value and an offset register value, loads a byte from memory, zero-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.38 LDRH (immediate)

Load Register Halfword (immediate).

### Syntax

```
LDRH Wt, [Xn|SP], #pimm ; Post-index general registers
```

```
LDRH Wt, [Xn|SP, #pimm]! ; Pre-index general registers
```

```
LDRH {Wt}, [{Xn|SP}{, #{pimm}}] ; Unsigned offset general registers
```

Where:

**simm**

Is the signed immediate byte offset, in the range -256 to 255.

**pimm**

Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0.

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Register Halfword (immediate) loads a halfword from memory, zero-extends it, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRH (immediate)*.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.39 LDRH (register)

Load Register Halfword (register).

**Syntax**

```
LDRH Wt, [Xn|SP, (Wm|Xm){, extend {amount}}]
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of UXTW, LSL, SXTW or SXTX.

***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is, and can be either #0 or #1.

## Usage

Load Register Halfword (register) calculates an address from a base register value and an offset register value, loads a halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRH* (register).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.40 LDRSB (immediate)

Load Register Signed Byte (immediate).

### Syntax

`LDRSB Wt, [Xn|SP], #simm ; 32-bit, Post-index`

`LDRSB Xt, [Xn|SP], #simm ; 64-bit, Post-index`

`LDRSB Wt, [Xn|SP, #simm]! ; 32-bit, Pre-index`

`LDRSB Xt, [Xn|SP, #simm]! ; 64-bit, Pre-index`

`LDRSB Wt, [Xn|SP, #{pimm}] ; 32-bit`

`LDRSB Xt, [Xn|SP, #{pimm}] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**simm**

Is the signed immediate byte offset, in the range -256 to 255.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**pimm**

Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load Register Signed Byte (immediate) loads a byte from memory, sign-extends it to either 32 bits or 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRSB (immediate)*.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.41 LDRSB (register)

Load Register Signed Byte (register).

### Syntax

```
LDRSB Wt, [Xn|SP, (Wm|Xm), extend {amount}] ; 32-bit with extended register offset  
general registers
```

```
LDRSB Wt, [Xn|SP, Xm{, LSL amount}] ; 32-bit with shifted register offset general  
registers
```

```
LDRSB Xt, [Xn|SP, (Wm|Xm), extend {amount}] ; 64-bit with extended register offset  
general registers
```

```
LDRSB Xt, [Xn|SP, Xm{, LSL amount}] ; 64-bit with shifted register offset general  
registers
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend specifier:

Can be one of `UXTW`, `SXTW` or `SXTX`.

***amount***

Is the index shift amount, it must be.

***xt***

Is the 64-bit name of the general-purpose register to be transferred.

**Usage**

Load Register Signed Byte (register) calculates an address from a base register value and an offset register value, loads a byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.42 LDRSH (immediate)

Load Register Signed Halfword (immediate).

**Syntax**

```
LDRSH Wt, [Xn|SP], #simm ; 32-bit, Post-index
LDRSH Xt, [Xn|SP], #simm ; 64-bit, Post-index
LDRSH Wt, [Xn|SP, #simm]! ; 32-bit, Pre-index
LDRSH Xt, [Xn|SP, #simm]! ; 64-bit, Pre-index
LDRSH Wt, [Xn|SP, #{pimm}] ; 32-bit
LDRSH Xt, [Xn|SP, #{pimm}] ; 64-bit
```

Where:

***wt***

Is the 32-bit name of the general-purpose register to be transferred.

***simm***

Is the signed immediate byte offset, in the range -256 to 255.

***xt***

Is the 64-bit name of the general-purpose register to be transferred.

**pimm**

Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Register Signed Halfword (immediate) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRSH (immediate)*.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.43 LDRSH (register)

Load Register Signed Halfword (register).

**Syntax**

```
LDRSH Wt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 32-bit
```

```
LDRSH Xt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 64-bit
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of the values shown in Usage.

***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is, and can be either #0 or #1.

**Usage**

Load Register Signed Halfword (register) calculates an address from a base register value and an offset register value, loads a halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.44 LDRSW (immediate)

Load Register Signed Word (immediate).

**Syntax**

```
LDRSW Xt, [Xn|SP], #simm ; Post-index general registers
LDRSW Xt, [Xn|SP, #simm]! ; Pre-index general registers
LDRSW Xt, [Xn|SP, #{pimm}] ; Unsigned offset general registers
```

Where:

***simm***

Is the signed immediate byte offset, in the range -256 to 255.

***pimm***

Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Register Signed Word (immediate) loads a word from memory, sign-extends it to 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store*

addressing modes in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *LDRSW (immediate)*.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.45 LDRSW (literal)

Load Register Signed Word (literal).

### Syntax

`LDRSW Xt, label`

Where:

#### **Xt**

Is the 64-bit name of the general-purpose register to be loaded.

#### **label**

Is the program label from which the data is to be loaded. Its offset from the address of this instruction, in the range  $\pm 1\text{MB}$ .

### Usage

Load Register Signed Word (literal) calculates an address from the PC value and an immediate offset, loads a word from memory, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.46 LDRSW (register)

Load Register Signed Word (register).

### Syntax

`LDRSW Xt, [Xn|SP, (Wm|Xm){, extend {amount}}]`

Where:

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of `UXTW`, `LSL`, `SXTW` or `SXTX`.

***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is, and can be either #0 or #2.

**Usage**

Load Register Signed Word (register) calculates an address from a base register value and an offset register value, loads a word from memory, sign-extends it to form a 64-bit value, and writes it to a register. The offset register value can be shifted left by 0 or 2 bits. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.47 LDSETA, LDSETAL, LDSET, LDSETL, LDSETAL, LDSET, LDSETL (A64)

Atomic bit set on word or doubleword in memory.

**Syntax**

```
LDSETA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDSETAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDSET Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDSETL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDSETA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDSETAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
```

`LDSET Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers`

`LDSETL Xs, Xt, [Xn|SP] ; 64-bit, release general registers`

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xt**

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `WZR` or `XZR`, `LDSETA` and `LDSETAL` load from memory with acquire semantics.
- `LDSETL` and `LDSETAL` store to memory with release semantics.
- `LDSET` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.48 LDSETAB, LDSETALB, LDSETB, LDSETLB (A64)

Atomic bit set on byte in memory.

### Syntax

```
LDSETAB Ws, Wt, [Xn|SP] ; Acquire general registers
LDSETALB Ws, Wt, [Xn|SP] ; Acquire and release general registers
LDSETB Ws, Wt, [Xn|SP] ; No memory ordering general registers
LDSETLB Ws, Wt, [Xn|SP] ; Release general registers
```

Where:

#### ***Ws***

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### ***Wt***

Is the 32-bit name of the general-purpose register to be loaded.

#### ***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic bit set on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDSETAB` and `LDSETALB` load from memory with acquire semantics.
- `LDSETLB` and `LDSETALB` store to memory with release semantics.
- `LDSETB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.49 LDSETAH, LDSETALH, LDSETH, LDSETLH (A64)

Atomic bit set on halfword in memory.

### Syntax

```
LDSETAH Ws, Wt, [Xn|SP] ; Acquire general registers
LDSETALH Ws, Wt, [Xn|SP] ; Acquire and release general registers
LDSETH Ws, Wt, [Xn|SP] ; No memory ordering general registers
LDSETLH Ws, Wt, [Xn|SP] ; Release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic bit set on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

- If the destination register is not WZR, LDSETAH and LDSETALH load from memory with acquire semantics.
- LDSETLH and LDSETALH store to memory with release semantics.
- LDSETH has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.50 LDSMAXA, LDSMAXAL, LDSMAX, LDSMAXL, LDSMAXAL, LDSMAX, LDSMAXL (A64)

Atomic signed maximum on word or doubleword in memory.

### Syntax

```
LDSMAXA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDSMAXAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDSMAX Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDSMAXL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDSMAXA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDSMAXAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDSMAX Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDSMAXL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xt**

Is the 64-bit name of the general-purpose register to be loaded.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic signed maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDSMAXA` and `LDSMAXAL` load from memory with acquire semantics.
- `LDSMAXL` and `LDSMAXAL` store to memory with release semantics.
- `LDSMAX` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.51 LDSMAXAB, LDSMAXALB, LDSMAXB, LDSMAXLB (A64)

Atomic signed maximum on byte in memory.

### Syntax

`LDSMAXAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDSMAXALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDSMAXB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDSMAXLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic signed maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the

values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDSMAXAB` and `LDSMAXALB` load from memory with acquire semantics.
- `LDSMAXLB` and `LDSMAXALB` store to memory with release semantics.
- `LDSMAXB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.52 LDSMAXAH, LDSMAXALH, LDSMAXH, LDSMAXLH (A64)

Atomic signed maximum on halfword in memory.

### Syntax

`LDSMAXAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDSMAXALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDSMAXH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDSMAXLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic signed maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDSMAXAH` and `LDSMAXALH` load from memory with acquire semantics.
- `LDSMAXLH` and `LDSMAXALH` store to memory with release semantics.
- `LDSMAXH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.53 LDSMINA, LDSMINAL, LDSMIN, LDSMINL, LDSMINAL, LDSMIN, LDSMINL (A64)

Atomic signed minimum on word or doubleword in memory.

### Syntax

```
LDSMINA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDSMINAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDSMIN Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDSMINL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDSMINA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDSMINAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDSMIN Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDSMINL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xt**

Is the 64-bit name of the general-purpose register to be loaded.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic signed minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDSMINA` and `LDSMINAL` load from memory with acquire semantics.
- `LDSMINL` and `LDSMINAL` store to memory with release semantics.
- `LDSMIN` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.54 LDSMINAB, LDSMINALB, LDSMINB, LDSMINLB (A64)

Atomic signed minimum on byte in memory.

**Syntax**

`LDSMINAB Ws, Wt, [Xn|SP] ; Acquire general registers`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

`LDSMINALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDSMINB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDSMINLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic signed minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDSMINAB` and `LDSMINALB` load from memory with acquire semantics.
- `LDSMINLB` and `LDSMINALB` store to memory with release semantics.
- `LDSMINB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.55 LDSMINAH, LDSMINALH, LDSMINH, LDSMINLH (A64)

Atomic signed minimum on halfword in memory.

### Syntax

```
LDSMINAH Ws, Wt, [Xn|SP] ; Acquire general registers
```

```
LDSMINALH Ws, Wt, [Xn|SP] ; Acquire and release general registers
```

```
LDSMINH Ws, Wt, [Xn|SP] ; No memory ordering general registers
```

```
LDSMINLH Ws, Wt, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic signed minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDSMINAH` and `LDSMINALH` load from memory with acquire semantics.
- `LDSMINLH` and `LDSMINALH` store to memory with release semantics.
- `LDSMINH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.56 LDTR (A64)

Load Register (unprivileged).

### Syntax

`LDTR Wt, [Xn|SP, #{simm}] ; 32-bit`

`LDTR Xt, [Xn|SP, #{simm}] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

### Usage

Load Register (unprivileged) loads a word or doubleword from memory, and writes it to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.57 LDTRB (A64)

Load Register Byte (unprivileged).

### Syntax

`LDTRB Wt, [Xn|SP, #{simm}]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Load Register Byte (unprivileged) loads a byte from memory, zero-extends it, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.58 LDTRH (A64)

Load Register Halfword (unprivileged).

### Syntax

`LDTRH Wt, [Xn|SP, #{simm}]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Load Register Halfword (unprivileged) loads a halfword from memory, zero-extends it, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.59 LDTRSB (A64)

Load Register Signed Byte (unprivileged).

### Syntax

`LDTRSB Wt, [Xn|SP, #simm] ; 32-bit`

`LDTRSB Xt, [Xn|SP, #simm] ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Load Register Signed Byte (unprivileged) loads a byte from memory, sign-extends it to 32 bits or 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.

- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.60 LDTRSH (A64)

Load Register Signed Halfword (unprivileged).

### Syntax

LDTRSH **Wt**, [Xn|SP, #*simm*] ; 32-bit

LDTRSH **Xt**, [Xn|SP, #*simm*] ; 64-bit

Where:

#### **Wt**

Is the 32-bit name of the general-purpose register to be transferred.

#### **Xt**

Is the 64-bit name of the general-purpose register to be transferred.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

### Usage

Load Register Signed Halfword (unprivileged) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.61 LDTRSW (A64)

Load Register Signed Word (unprivileged).

### Syntax

```
LDTRSW Xt, [Xn|SP, #{simm}]
```

Where:

#### Xt

Is the 64-bit name of the general-purpose register to be transferred.

#### Xn|SP

Is the 64-bit name of the general-purpose base register or stack pointer.

#### simm

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

### Usage

Load Register Signed Word (unprivileged) loads a word from memory, sign-extends it to 64 bits, and writes the result to a register. The address that is used for the load is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.62 LDUMAXA, LDUMAXAL, LDUMAX, LDUMAXL, LDUMAXAL, LDUMAX, LDUMAXL (A64)

Atomic unsigned maximum on word or doubleword in memory.

### Syntax

```
LDUMAXA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
```

```
LDUMAXAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
```

```
LDUMAX Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDUMAXL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDUMAXA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDUMAXAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDUMAX Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDUMAXL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xt**

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDUMAXA` and `LDUMAXAL` load from memory with acquire semantics.
- `LDUMAXL` and `LDUMAXAL` store to memory with release semantics.
- `LDUMAX` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.63 LDUMAXAB, LDUMAXALB, LDUMAXB, LDUMAXLB (A64)

Atomic unsigned maximum on byte in memory.

## Syntax

`LDUMAXAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDUMAXALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDUMAXB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDUMAXLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDUMAXAB` and `LDUMAXALB` load from memory with acquire semantics.
- `LDUMAXLB` and `LDUMAXALB` store to memory with release semantics.
- `LDUMAXB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.64 LDUMAXAH, LDUMAXALH, LDUMAXH, LDUMAXLH (A64)

Atomic unsigned maximum on halfword in memory.

## Syntax

`LDUMAXAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDUMAXALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDUMAXH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDUMAXLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDUMAXAH` and `LDUMAXALH` load from memory with acquire semantics.
- `LDUMAXLH` and `LDUMAXALH` store to memory with release semantics.
- `LDUMAXH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.65 LDUMINA, LDUMINAL, LDUMIN, LDUMINL, LDUMINAL, LDUMIN, LDUMINL (A64)

Atomic unsigned minimum on word or doubleword in memory.

### Syntax

```
LDUMINA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers
LDUMINAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers
LDUMIN Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers
LDUMINL Ws, Wt, [Xn|SP] ; 32-bit, release general registers
LDUMINA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers
LDUMINAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers
LDUMIN Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers
LDUMINL Xs, Xt, [Xn|SP] ; 64-bit, release general registers
```

Where:

#### Ws

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### Wt

Is the 32-bit name of the general-purpose register to be loaded.

#### Xn | SP

Is the 64-bit name of the general-purpose base register or stack pointer.

#### Xs

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### Xt

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `wZR` or `xZR`, `LDUMINA` and `LDUMINAL` load from memory with acquire semantics.
- `LDUMINL` and `LDUMINAL` store to memory with release semantics.
- `LDUMIN` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.66 LDUMINAB, LDUMINALB, LDUMINB, LDUMINLB (A64)

Atomic unsigned minimum on byte in memory.

### Syntax

`LDUMINAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDUMINALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDUMINB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDUMINLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

#### `Ws`

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### `Wt`

Is the 32-bit name of the general-purpose register to be loaded.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic unsigned minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `LDUMINAB` and `LDUMINALB` load from memory with acquire semantics.
- `LDUMINLB` and `LDUMINALB` store to memory with release semantics.
- `LDUMINB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.67 LDUMINAH, LDUMINALH, LDUMINH, LDUMINLH (A64)

Atomic unsigned minimum on halfword in memory.

**Syntax**

`LDUMINAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`LDUMINALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`LDUMINH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`LDUMINLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

***Wt***

Is the 32-bit name of the general-purpose register to be loaded.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic unsigned minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.

- If the destination register is not WZR, LDUMINAH and LDUMINALH load from memory with acquire semantics.
- LDUMINLH and LDUMINALH store to memory with release semantics.
- LDUMINH has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.68 LDUR (A64)

Load Register (unscaled).

**Syntax**

LDUR *Wt*, [*Xn|SP*, #*{simm}*] ; 32-bit

LDUR *Xt*, [*Xn|SP*, #*{simm}*] ; 64-bit

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Load Register (unscaled) calculates an address from a base register and an immediate offset, loads a 32-bit word or 64-bit doubleword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.69 LDURB (A64)

Load Register Byte (unscaled).

**Syntax**

```
LDURB Wt, [Xn|SP, #{simm}]
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Load Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads a byte from memory, zero-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.70 LDURH (A64)

Load Register Halfword (unscaled).

### Syntax

```
LDURH Wt, [Xn | SP, #simm]
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

### Usage

Load Register Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.71 LDURSB (A64)

Load Register Signed Byte (unscaled).

### Syntax

```
LDURSB Wt, [Xn | SP, #simm] ; 32-bit
```

```
LDURSB Xt, [Xn | SP, #simm] ; 64-bit
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Load Register Signed Byte (unscaled) calculates an address from a base register and an immediate offset, loads a signed byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.72 LDURSH (A64)

Load Register Signed Halfword (unscaled).

### Syntax

`LDURSH Wt, [Xn|SP, #simm] ; 32-bit`

`LDURSH Xt, [Xn|SP, #simm] ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Load Register Signed Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a signed halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.73 LDURSW (A64)

Load Register Signed Word (unscaled).

### Syntax

```
LDURSW Xt, [Xn|SP, #{simm}]
```

Where:

#### **Xt**

Is the 64-bit name of the general-purpose register to be transferred.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

### Usage

Load Register Signed Word (unscaled) calculates an address from a base register and an immediate offset, loads a signed word from memory, sign-extends it, and writes it to a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.74 LDXP (A64)

Load Exclusive Pair of Registers.

### Syntax

```
LDXP Wt1, Wt2, [Xn|SP,#0] ; 32-bit
```

```
LDXP Xt1, Xt2, [Xn|SP,#0] ; 64-bit
```

Where:

#### **Wt1**

Is the 32-bit name of the first general-purpose register to be transferred.

#### **Wt2**

Is the 32-bit name of the second general-purpose register to be transferred.

#### **Xt1**

Is the 64-bit name of the first general-purpose register to be transferred.

#### **Xt2**

Is the 64-bit name of the second general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers. A 32-bit pair requires the address to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be quadword aligned and is single-copy atomic for each doubleword at doubleword granularity. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `LDXP`.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.75 LDXR (A64)

Load Exclusive Register.

**Syntax**

`LDXR Wt, [Xn|SP,#0] ; 32-bit`

`LDXR Xt, [Xn|SP,#0] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Exclusive Register derives an address from a base register value, loads a 32-bit word or a 64-bit doubleword from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access

mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.76 LDXRB (A64)

Load Exclusive Register Byte.

### Syntax

`LDXRB Wt, [Xn|SP,#0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.77 LDXRH (A64)

Load Exclusive Register Halfword.

### Syntax

`LDXRH Wt, [Xn|SP,#0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Exclusive Register Halfword derives an address from a base register value, loads a halfword from memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.78 PRFM (immediate) (A64)

Prefetch Memory (immediate).

**Syntax**

PRFM (*prfop*|#*imm5*), [Xn|SP, #*pimm*]

Where:

**prfop**

Is the prefetch operation, defined as *type*<target><policy>.

*type* is one of:

**PLD**

Prefetch for load.

**PLI**

Preload instructions.

**PST**

Prefetch for store.

<target> is one of:

**L1**

Level 1 cache.

**L2**

Level 2 cache.

**L3**

Level 3 cache.

`<policy>` is one of:

**KEEP**

Retained or temporal prefetch, allocated in the cache normally.

**STRM**

Streaming or non-temporal prefetch, for data that is used only once.

***imm5***

Is the prefetch operation encoding as an immediate, in the range 0 to 31.

This syntax is only for encodings that are not accessible using `prfop`.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***pimm***

Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0.

## Usage

Prefetch Memory (immediate) signals the memory system that data memory accesses from a specified address are likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory accesses when they do occur, such as preloading the cache line containing the specified address into one or more caches.

The effect of an `PRFM` instruction is **IMPLEMENTATION DEFINED**. For more information, see *Prefetch memory* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.79 PRFM (literal) (A64)

Prefetch Memory (literal).

### Syntax

`PRFM (prfop|#imm5), label`

Where:

***prfop***

Is the prefetch operation, defined as `type<target><policy>`.

`type` is one of:

**PLD**

Prefetch for load.

**PLI**

Preload instructions.

**PST**

Prefetch for store.

<target> is one of:

**L1**

Level 1 cache.

**L2**

Level 2 cache.

**L3**

Level 3 cache.

<policy> is one of:

**KEEP**

Retained or temporal prefetch, allocated in the cache normally.

**STRM**

Streaming or non-temporal prefetch, for data that is used only once.

**imm5**

Is the prefetch operation encoding as an immediate, in the range 0 to 31.

This syntax is only for encodings that are not accessible using *prefop*.

**label**

Is the program label from which the data is to be loaded. Its offset from the address of this instruction, in the range ±1MB.

## Usage

Prefetch Memory (literal) signals the memory system that data memory accesses from a specified address are likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory accesses when they do occur, such as preloading the cache line containing the specified address into one or more caches.

The effect of an `PRFM` instruction is **IMPLEMENTATION DEFINED**. For more information, see *Prefetch memory* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.80 PRFM (register) (A64)

Prefetch Memory (register).

### Syntax

```
PRFM {prfop|#imm5}, [Xn|SP, (Wm|Xm){, extend {amount}}]
```

Where:

#### **prfop**

Is the prefetch operation, defined as *type*<target><policy>.

*type* is one of:

#### **PLD**

Prefetch for load.

#### **PLI**

Preload instructions.

#### **PST**

Prefetch for store.

<target> is one of:

#### **L1**

Level 1 cache.

#### **L2**

Level 2 cache.

#### **L3**

Level 3 cache.

<policy> is one of:

#### **KEEP**

Retained or temporal prefetch, allocated in the cache normally.

#### **STRM**

Streaming or non-temporal prefetch, for data that is used only once.

#### **imm5**

Is the prefetch operation encoding as an immediate, in the range 0 to 31.

This syntax is only for encodings that are not accessible using *prfop*.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of **UXTW**, **LSL**, **SXTW** or **SXTX**.

***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is, and can be either #0 or #3.

## Usage

Prefetch Memory (register) signals the memory system that data memory accesses from a specified address are likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory accesses when they do occur, such as preloading the cache line containing the specified address into one or more caches.

The effect of an **PRFM** instruction is **IMPLEMENTATION DEFINED**. For more information, see *Prefetch memory* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.81 PRFUM (unscaled offset) (A64)

Prefetch Memory (unscaled offset).

### Syntax

```
PRFUM (prfop|#imm5), [Xn|SP, #{simm}]
```

Where:

***prfop***

Is the prefetch operation, defined as *type*<*target*><*policy*>.

*type* is one of:

***PLD***

Prefetch for load.

***PLI***

Preload instructions.

**PST**

Prefetch for store.

<target> is one of:

**L1**

Level 1 cache.

**L2**

Level 2 cache.

**L3**

Level 3 cache.

<policy> is one of:

**KEEP**

Retained or temporal prefetch, allocated in the cache normally.

**STRM**

Streaming or non-temporal prefetch, for data that is used only once.

**imm5**

Is the prefetch operation encoding as an immediate, in the range 0 to 31.

This syntax is only for encodings that are not accessible using *prfop*.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Prefetch Memory (unscaled offset) signals the memory system that data memory accesses from a specified address are likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory accesses when they do occur, such as preloading the cache line containing the specified address into one or more caches.

The effect of an `PRFUM` instruction is **IMPLEMENTATION DEFINED**. For more information, see *Prefetch memory* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.82 STADD, STADDL, STADDL (A64)

Atomic add on word or doubleword in memory, without return.

### Syntax

```
STADD Ws, [Xn|SP] ; 32-bit, no memory ordering general registers
```

```
STADDL Ws, [Xn|SP] ; 32-bit, release general registers
```

```
STADD Xs, [Xn|SP] ; 64-bit, no memory ordering general registers
```

```
STADDL Xs, [Xn|SP] ; 64-bit, release general registers
```

Where:

#### ***Ws***

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

#### ***Xs***

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic add on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, adds the value held in a register to it, and stores the result back to memory.

- STADD has no memory ordering semantics.
- STADDL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.83 STADDB, STADDLB (A64)

Atomic add on byte in memory, without return.

### Syntax

```
STADDB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STADDLB Ws, [Xn|SP] ; Release general registers
```

Where:

#### ***Ws***

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic add on byte in memory, without return, atomically loads an 8-bit byte from memory, adds the value held in a register to it, and stores the result back to memory.

- STADDB has no memory ordering semantics.
- STADDLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.84 STADDH, STADDLH (A64)

Atomic add on halfword in memory, without return.

### Syntax

```
STADDH Ws, [Xn|SP] ; No memory ordering general registers
```

```
STADDLH Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic add on halfword in memory, without return, atomically loads a 16-bit halfword from memory, adds the value held in a register to it, and stores the result back to memory.

- STADDH has no memory ordering semantics.
- STADDLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.85 STCLR, STCLRL, STCLRL (A64)

Atomic bit clear on word or doubleword in memory, without return.

**Syntax**

```
STCLR Ws, [Xn|SP] ; 32-bit, no memory ordering general registers
```

```
STCLRL Ws, [Xn|SP] ; 32-bit, release general registers
```

```
STCLR Xs, [Xn|SP] ; 64-bit, no memory ordering general registers
```

```
STCLRL Xs, [Xn|SP] ; 64-bit, release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic bit clear on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.

- STCLR has no memory ordering semantics.
- STCLRL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.86 STCLRB, STCLRLB (A64)

Atomic bit clear on byte in memory, without return.

**Syntax**

```
STCLRB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STCLRLB Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit clear on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.

- STCLRB has no memory ordering semantics.
- STCLRLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.87 STCLRH, STCLRLH (A64)

Atomic bit clear on halfword in memory, without return.

### Syntax

STCLRH *Ws*, [Xn|SP] ; No memory ordering general registers

STCLRLH *Ws*, [Xn|SP] ; Release general registers

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit clear on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.

- STCLRH has no memory ordering semantics.
- STCLRLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.88 STEOR, STEORL, STEORL (A64)

Atomic exclusive OR on word or doubleword in memory, without return.

### Syntax

STEOR *Ws*, [Xn|SP] ; 32-bit, no memory ordering general registers

STEORL *Ws*, [Xn|SP] ; 32-bit, release general registers

STEOR Xs, [Xn|SP] ; 64-bit, no memory ordering general registers

STEORL Xs, [Xn|SP] ; 64-bit, release general registers

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic exclusive OR on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory.

- STEOR has no memory ordering semantics.
- STEORL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.89 STEORB, STEORLB (A64)

Atomic exclusive OR on byte in memory, without return.

### Syntax

```
STEORB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STEORLB Ws, [Xn|SP] ; Release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic exclusive OR on byte in memory, without return, atomically loads an 8-bit byte from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory.

- STEORB has no memory ordering semantics.
- STEORLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.90 STEORH, STEORLH (A64)

Atomic exclusive OR on halfword in memory, without return.

### Syntax

```
STEORH Ws, [Xn|SP] ; No memory ordering general registers
```

```
STEORLH Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic exclusive OR on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory.

- STEORH has no memory ordering semantics.
- STEORLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.91 STLLR (A64)

Store LORelease Register.

**Syntax**

```
STLLR Wt, [Xn|SP,#0] ; 32-bit
```

```
STLLR Xt, [Xn|SP,#0] ; 64-bit
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

## Usage

Store LORelease Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.92 STLLRB (A64)

Store LORelease Register Byte.

### Syntax

`STLLRB wt, [Xn | SP, #0]`

Where:

***wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Store LORelease Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.93 STLLRH (A64)

Store LORelease Register Halfword.

### Syntax

`STLLRH wt, [Xn | SP, #0]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Store LORelease Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also has memory ordering semantics as described in *Load LOAcquire, Store LORelease* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.94 STLR (A64)

Store-Release Register.

### Syntax

`STLR Wt, [Xn|SP,#0] ; 32-bit`

`STLR Xt, [Xn|SP,#0] ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Store-Release Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.95 STLRB (A64)

Store-Release Register Byte.

**Syntax**

`STLRB Wt, [Xn|SP,#0]`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store-Release Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.96 STLRH (A64)

Store-Release Register Halfword.

**Syntax**

`STLRH Wt, [Xn|SP,#0]`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store-Release Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in

the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.97 STLXP (A64)

Store-Release Exclusive Pair of registers.

### Syntax

`STLXP ws, wt1, wt2, [Xn|SP,#0] ; 32-bit`

`STLXP ws, xt1, xt2, [Xn|SP,#0] ; 64-bit`

Where:

#### **wt1**

Is the 32-bit name of the first general-purpose register to be transferred.

#### **wt2**

Is the 32-bit name of the second general-purpose register to be transferred.

#### **xt1**

Is the 64-bit name of the first general-purpose register to be transferred.

#### **xt2**

Is the 64-bit name of the second general-purpose register to be transferred.

#### **ws**

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is:

#### **0**

If the operation updates memory.

#### **1**

If the operation fails to update memory.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *ws* is not updated.

Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

## Usage

Store-Release Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords to a memory location if the PE has exclusive access to the memory address, from two registers, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). A 32-bit pair requires the address to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be quadword aligned and, if the Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being updated. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STLXP`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.98 STLXR (A64)

Store-Release Exclusive Register.

### Syntax

`STLXR Ws, Wt, [Xn|SP,#0] ; 32-bit`

`STLXR Ws, Xt, [Xn|SP,#0] ; 64-bit`

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

***xt***

Is the 64-bit name of the general-purpose register to be transferred.

***ws***

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is.

**0**

If the operation updates memory.

**1**

If the operation fails to update memory.

***xn | sp***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Aborts and alignment**

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *ws* is not updated.

Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

**Usage**

Store-Release Exclusive Register stores a 32-bit word or a 64-bit doubleword to memory if the PE has exclusive access to the memory address, from two registers, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The memory access is atomic. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONSTRAINED UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STLXR`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.99 STLXRB (A64)

Store-Release Exclusive Register Byte.

### Syntax

`STLXRB Ws, Wt, [Xn | SP, #0]`

Where:

#### *Ws*

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is.

#### *0*

If the operation updates memory.

#### *1*

If the operation fails to update memory.

#### *Wt*

Is the 32-bit name of the general-purpose register to be transferred.

#### *Xn* | *SP*

Is the 64-bit name of the general-purpose base register or stack pointer.

### Aborts

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *ws* is not updated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

### Usage

Store-Release Exclusive Register Byte stores a byte from a 32-bit register to memory if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The memory access is atomic. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STLXR.B`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.100 STLXRH (A64)

Store-Release Exclusive Register Halfword.

### Syntax

`STLXRH ws, wt, [Xn|SP,#0]`

Where:

#### **ws**

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is.

**0**

If the operation updates memory.

**1**

If the operation fails to update memory.

#### **wt**

Is the 32-bit name of the general-purpose register to be transferred.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- `ws` is not updated.

A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

## Usage

Store-Release Exclusive Register Halfword stores a halfword from a 32-bit register to memory if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The memory access is atomic. The instruction also has memory ordering semantics as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



Note

For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STLXRH`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.101 STNP (A64)

Store Pair of Registers, with non-temporal hint.

### Syntax

```
STNP Wt1, Wt2, [Xn|SP, #{}imm] ; 32-bit, Signed offset
```

```
STNP Xt1, Xt2, [Xn|SP, #{}imm] ; 64-bit, Signed offset
```

Where:

***Wt1***

Is the 32-bit name of the first general-purpose register to be transferred.

***Wt2***

Is the 32-bit name of the second general-purpose register to be transferred.

***imm***

Depends on the instruction variant:

**32-bit general registers** Is the optional signed immediate byte offset, a multiple of 4 in the range -256 to 252, defaulting to 0.

**64-bit general registers** Is the optional signed immediate byte offset, a multiple of 8 in the range -512 to 504, defaulting to 0.

***Xt1***

Is the 64-bit name of the first general-purpose register to be transferred.

***Xt2***

Is the 64-bit name of the second general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate offset, and stores two 32-bit words or two 64-bit doublewords to the calculated address, from two registers. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about Non-temporal pair instructions, see *Load/Store Non-temporal pair* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.102 STP (A64)

Store Pair of Registers.

**Syntax**

```
STP Wt1, Wt2, [Xn|SP], #imm ; 32-bit, Post-index
STP Xt1, Xt2, [Xn|SP], #imm ; 64-bit, Post-index
STP Wt1, Wt2, [Xn|SP, #imm]! ; 32-bit, Pre-index
STP Xt1, Xt2, [Xn|SP, #imm]! ; 64-bit, Pre-index
STP Wt1, Wt2, [Xn|SP, #{imm}] ; 32-bit, Signed offset
STP Xt1, Xt2, [Xn|SP, #{imm}] ; 64-bit, Signed offset
```

Where:

***Wt1***

Is the 32-bit name of the first general-purpose register to be transferred.

***Wt2***

Is the 32-bit name of the second general-purpose register to be transferred.

***imm***

Depends on the instruction variant:

**32-bit general registers** Is the signed immediate byte offset, a multiple of 4 in the range -256 to 252.

**64-bit general registers** Is the signed immediate byte offset, a multiple of 8 in the range -512 to 504.

#### ***Xt1***

Is the 64-bit name of the first general-purpose register to be transferred.

#### ***Xt2***

Is the 64-bit name of the second general-purpose register to be transferred.

#### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Store Pair of Registers calculates an address from a base register value and an immediate offset, and stores two 32-bit words or two 64-bit doublewords to the calculated address, from two registers. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STR`.

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.103 STR (immediate) (A64)

Store Register (immediate).

### Syntax

```
STR Wt, [Xn|SP], #simm ; 32-bit, Post-index
STR Xt, [Xn|SP], #simm ; 64-bit, Post-index
STR Wt, [Xn|SP, #simm]! ; 32-bit, Pre-index
STR Xt, [Xn|SP, #simm]! ; 64-bit, Pre-index
STR Wt, [Xn|SP, #{pimm}] ; 32-bit
STR Xt, [Xn|SP, #{pimm}] ; 64-bit
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***simm***

Is the signed immediate byte offset, in the range -256 to 255.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***pimm***

Depends on the instruction variant:

**32-bit general registers** Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0.

**64-bit general registers** Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Store Register (immediate) stores a word or a doubleword from a register to memory. The address that is used for the store is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.104 STR (register) (A64)

Store Register (register).

### Syntax

```
STR Wt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 32-bit
```

```
STR Xt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 64-bit
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is:

**32-bit general registers**

Can be one of #0 or #2.

**64-bit general registers**

Can be one of #0 or #3.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of the values shown in Usage.

**Usage**

Store Register (register) calculates an address from a base register value and an offset register value, and stores a 32-bit word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base register value and an offset register value. The offset can be optionally shifted and extended.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.105 STRB (immediate) (A64)

Store Register Byte (immediate).

**Syntax**

```
STRB Wt, [Xn|SP], #simm ; Post-index general registers
```

```
STRB Wt, [Xn|SP, #simm]! ; Pre-index general registers
```

```
STRB Wt, [Xn|SP, #{pimm}] ; Unsigned offset general registers
```

Where:

**simm**

Is the signed immediate byte offset, in the range -256 to 255.

**pimm**

Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0.

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store Register Byte (immediate) stores the least significant byte of a 32-bit register to memory. The address that is used for the store is calculated from a base register and an immediate offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



Note

For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *STRB (immediate)*.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.106 STRB (register) (A64)

Store Register Byte (register).

**Syntax**

```
STRB Wt, [Xn|SP, (Wm|Xm), extend {amount}] ; Extended register general registers
```

```
STRB Wt, [Xn|SP, Xm, LSL {amount}] ; Shifted register general registers
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

**`extend`**

Is the index extend specifier, and can be one of the values shown in Usage.

**`amount`**

Is the index shift amount, it must be.

**Usage**

Store Register Byte (register) calculates an address from a base register value and an offset register value, and stores a byte from a 32-bit register to the calculated address. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base register value and an offset register value. The offset can be optionally shifted and extended.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.107 STRH (immediate) (A64)

Store Register Halfword (immediate).

**Syntax**

```
STRH Wt, [Xn|SP], #simm ; Post-index general registers
```

```
STRH Wt, [Xn|SP, #simm]! ; Pre-index general registers
```

```
STRH Wt, [Xn|SP, #{pimm}] ; Unsigned offset general registers
```

Where:

**`simm`**

Is the signed immediate byte offset, in the range -256 to 255.

**`pimm`**

Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0.

**`Wt`**

Is the 32-bit name of the general-purpose register to be transferred.

**`Xn|SP`**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store Register Halfword (immediate) stores the least significant halfword of a 32-bit register to memory. The address that is used for the store is calculated from a base register and an immediate

offset. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly *STRH (immediate)*.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.108 STRH (register) (A64)

Store Register Halfword (register).

### Syntax

`STRH Wt, [Xn|SP, (Wm|Xm) {, extend {amount} }]`

Where:

#### ***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

#### ***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

#### ***Wm***

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

#### ***Xm***

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

#### ***extend***

Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option when *amount* is omitted, and can be one of *uxtw*, *lsl*, *sxtw* or *sctx*.

#### ***amount***

Is the index shift amount, optional only when *extend* is not LSL. Where it is permitted to be optional, it defaults to #0. It is, and can be either #0 or #1.

### Usage

Store Register Halfword (register) calculates an address from a base register value and an offset register value, and stores a halfword from a 32-bit register to the calculated address. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base register value and an offset register value. The offset can be optionally shifted and extended.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.109 STSET, STSETL, STSETL (A64)

Atomic bit set on word or doubleword in memory, without return.

## Syntax

```
STSET Ws, [Xn|SP] ; 32-bit, no memory ordering general registers
```

```
STSETL Ws, [Xn|SP] ; 32-bit, release general registers
```

```
STSET Xs, [Xn|SP] ; 64-bit, no memory ordering general registers
```

```
STSETL Xs, [Xn|SP] ; 64-bit, release general registers
```

Where:

### Ws

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### Xn | SP

Is the 64-bit name of the general-purpose base register or stack pointer.

### Xs

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit set on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory.

- STSET has no memory ordering semantics.
- STSETL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.110 STSETB, STSETLB (A64)

Atomic bit set on byte in memory, without return.

## Syntax

`STSETB Ws, [Xn|SP] ; No memory ordering general registers`

`STSETLB Ws, [Xn|SP] ; Release general registers`

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic bit set on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory.

- STSETB has no memory ordering semantics.
- STSETLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.111 STSETH, STSETLH (A64)

Atomic bit set on halfword in memory, without return.

## Syntax

`STSETH Ws, [Xn|SP] ; No memory ordering general registers`

```
STSETLH Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic bit set on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory.

- STSETH has no memory ordering semantics.
- STSETLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.112 STSMAX, STSMAXL, STSMAXL (A64)

Atomic signed maximum on word or doubleword in memory, without return.

### Syntax

```
STSMAX Ws, [Xn|SP] ; 32-bit, no memory ordering general registers
```

```
STSMAXL Ws, [Xn|SP] ; 32-bit, release general registers
```

```
STSMAX Xs, [Xn|SP] ; 64-bit, no memory ordering general registers
```

```
STSMAXL Xs, [Xn|SP] ; 64-bit, release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic signed maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as signed numbers.

- STSMAX has no memory ordering semantics.
- STSMAXL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.113 STSMAXB, STSMAXLB (A64)

Atomic signed maximum on byte in memory, without return.

**Syntax**

```
STSMAXB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STSMAXLB Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic signed maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as signed numbers.

- STSMAXB has no memory ordering semantics.
- STSMAXLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.114 STSMAXH, STSMAXLH (A64)

Atomic signed maximum on halfword in memory, without return.

### Syntax

STSMAXH *Ws*, [Xn|SP] ; No memory ordering general registers

STSMAXLH *Ws*, [Xn|SP] ; Release general registers

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic signed maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as signed numbers.

- STSMAXH has no memory ordering semantics.
- STSMAXLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.115 STSMIN, STSMINL, STSMINL (A64)

Atomic signed minimum on word or doubleword in memory, without return.

## Syntax

STSMIN *Ws*, [Xn|SP] ; 32-bit, no memory ordering general registers

STSMINL *Ws*, [Xn|SP] ; 32-bit, release general registers

STSMIN *Xs*, [Xn|SP] ; 64-bit, no memory ordering general registers

STSMINL *Xs*, [Xn|SP] ; 64-bit, release general registers

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### **Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic signed minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers.

- STSMIN has no memory ordering semantics.
- STSMINL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.116 STSMINB, STSMINLB (A64)

Atomic signed minimum on byte in memory, without return.

### Syntax

```
STSMINB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STSMINLB Ws, [Xn|SP] ; Release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

### Usage

Atomic signed minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers.

- STSMINB has no memory ordering semantics.
- STSMINLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.117 STSMINH, STSMINLH (A64)

Atomic signed minimum on halfword in memory, without return.

### Syntax

```
STSMINH Ws, [Xn|SP] ; No memory ordering general registers
```

```
STSMINLH Ws, [Xn|SP] ; Release general registers
```

Where:

***Ws***

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic signed minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers.

- `STSMINH` has no memory ordering semantics.
- `STSMINLH` stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.118 STTR (A64)

Store Register (unprivileged).

**Syntax**

`STTR Wt, [Xn|SP, #simm] ; 32-bit`

`STTR Xt, [Xn|SP, #simm] ; 64-bit`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xt***

Is the 64-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Store Register (unprivileged) stores a word or doubleword from a register to memory. The address that is used for the store is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.119 STTRB (A64)

Store Register Byte (unprivileged).

### Syntax

`STTRB Wt, [Xn|SP, #{simm}]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Store Register Byte (unprivileged) stores a byte from a 32-bit register to memory. The address that is used for the store is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.120 STTRH (A64)

Store Register Halfword (unprivileged).

**Syntax**

`STTRH Wt, [Xn | SP, #{simm}]`

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Store Register Halfword (unprivileged) stores a halfword from a 32-bit register to memory. The address that is used for the store is calculated from a base register and an immediate offset.

The memory is restricted as if execution is at EL0 when:

- Executing at EL1.
- Executing at EL2, in Arm®v8.1, with HCR\_EL2.{E2H, TGE} set to {1, 1}.

Otherwise, the access permission is for the Exception level at which the instruction is executed. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.121 STUMAX, STUMAXL, STUMAXXL (A64)

Atomic unsigned maximum on word or doubleword in memory, without return.

**Syntax**

`STUMAX Ws, [Xn | SP] ; 32-bit, no memory ordering general registers`

`STUMAXL Ws, [Xn | SP] ; 32-bit, release general registers`

```
STUMAX Xs, [Xn|SP] ; 64-bit, no memory ordering general registers
```

```
STUMAXL Xs, [Xn|SP] ; 64-bit, release general registers
```

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers.

- STUMAX has no memory ordering semantics.
- STUMAXL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.122 STUMAXB, STUMAXLB (A64)

Atomic unsigned maximum on byte in memory, without return.

## Syntax

```
STUMAXB Ws, [Xn|SP] ; No memory ordering general registers
```

```
STUMAXLB Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Atomic unsigned maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers.

- STUMAXB has no memory ordering semantics.
- STUMAXLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.123 STUMAXH, STUMAXLH (A64)

Atomic unsigned maximum on halfword in memory, without return.

**Syntax**

```
STUMAXH Ws, [Xn|SP] ; No memory ordering general registers
```

```
STUMAXLH Ws, [Xn|SP] ; Release general registers
```

Where:

**Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Architectures supported**

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers.

- `STUMAXH` has no memory ordering semantics.
- `STUMAXLH` stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.124 STUMIN, STUMINL, STUMINL (A64)

Atomic unsigned minimum on word or doubleword in memory, without return.

### Syntax

`STUMIN Ws, [Xn|SP] ; 32-bit, no memory ordering general registers`

`STUMINL Ws, [Xn|SP] ; 32-bit, release general registers`

`STUMIN Xs, [Xn|SP] ; 64-bit, no memory ordering general registers`

`STUMINL Xs, [Xn|SP] ; 64-bit, release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xs**

Is the 64-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers.

- STUMIN has no memory ordering semantics.
- STUMINL stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.125 STUMINB, STUMINLB (A64)

Atomic unsigned minimum on byte in memory, without return.

### Syntax

STUMINB *Ws*, [Xn|SP] ; No memory ordering general registers

STUMINLB *Ws*, [Xn|SP] ; Release general registers

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers.

- STUMINB has no memory ordering semantics.
- STUMINLB stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.126 STUMINH, STUMINLH (A64)

Atomic unsigned minimum on halfword in memory, without return.

## Syntax

STUMINH *Ws*, [Xn|SP] ; No memory ordering general registers

STUMINLH *Ws*, [Xn|SP] ; Release general registers

Where:

### **Ws**

Is the 32-bit name of the general-purpose register holding the data value to be operated on with the contents of the memory location.

### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Atomic unsigned minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers.

- STUMINH has no memory ordering semantics.
- STUMINLH stores to memory with release semantics, as described in *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.127 STUR (A64)

Store Register (unscaled).

## Syntax

STUR *Wt*, [Xn|SP, #*simm*] ; 32-bit

```
STUR Xt, [Xn|SP, #{simm}] ; 64-bit
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xt**

Is the 64-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Store Register (unscaled) calculates an address from a base register value and an immediate offset, and stores a 32-bit word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.128 STURB (A64)

Store Register Byte (unscaled).

### Syntax

```
STURB Wt, [Xn|SP, #{simm}]
```

Where:

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**simm**

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

## Usage

Store Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and stores a byte to the calculated address, from a 32-bit register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.129 STURH (A64)

Store Register Halfword (unscaled).

**Syntax**

```
STURH Wt, [Xn | SP, #{simm}]
```

Where:

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn* | *SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Store Register Halfword (unscaled) calculates an address from a base register value and an immediate offset, and stores a halfword to the calculated address, from a 32-bit register. For information about memory accesses, see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Related information**[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.130 STXP (A64)

Store Exclusive Pair of registers.

**Syntax**

```
STXP Ws, Wt1, Wt2, [Xn | SP, #0] ; 32-bit
```

```
STXP Ws, Xt1, Xt2, [Xn | SP, #0] ; 64-bit
```

Where:

***Wt1***

Is the 32-bit name of the first general-purpose register to be transferred.

***Wt2***

Is the 32-bit name of the second general-purpose register to be transferred.

***xt1***

Is the 64-bit name of the first general-purpose register to be transferred.

***xt2***

Is the 64-bit name of the second general-purpose register to be transferred.

***ws***

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is:

**0**

If the operation updates memory.

**1**

If the operation fails to update memory.

***xn|sp***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *ws* is not updated.

Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

## Usage

Store Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords from two registers to a memory location if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). A 32-bit pair requires the address to be doubleword aligned and is single-copy atomic at doubleword granularity. A 64-bit pair requires the address to be quadword aligned and, if the Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being updated. For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STXP`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.131 STXR (A64)

Store Exclusive Register.

### Syntax

`STXR ws, wt, [Xn|SP,#0] ; 32-bit`

`STXR ws, xt, [Xn|SP,#0] ; 64-bit`

Where:

**wt**

Is the 32-bit name of the general-purpose register to be transferred.

**xt**

Is the 64-bit name of the general-purpose register to be transferred.

**ws**

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is:

**0**

If the operation updates memory.

**1**

If the operation fails to update memory.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- `ws` is not updated.

Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

## Usage

Store Exclusive Register stores a 32-bit word or a 64-bit doubleword from a register to memory if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



**Note** For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly `STXR`.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.132 STXRB (A64)

Store Exclusive Register Byte.

### Syntax

`STXRB Ws, Wt, [Xn|SP,#0]`

Where:

**Ws**

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is:

**0**

If the operation updates memory.

**1**

If the operation fails to update memory.

**Wt**

Is the 32-bit name of the general-purpose register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Aborts**

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *ws* is not updated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

**Usage**

Store Exclusive Register Byte stores a byte from a register to memory if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The memory access is atomic.

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).



Note

For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly **STXRB**.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.133 STXRH (A64)

Store Exclusive Register Halfword.

**Syntax**

STXRH *Ws*, *Wt*, [Xn|SP, #0]

Where:

**Ws**

Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written. The value returned is:

0

If the operation updates memory.

**1**

If the operation fails to update memory.

***Wt***

Is the 32-bit name of the general-purpose register to be transferred.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:

- Memory is not updated.
- *Ws* is not updated.

A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to the following rules:

- The exception is generated if the Exclusive Monitors for the current PE include all of the addresses associated with the virtual address region of size bytes starting at address. The immediately following memory write must be to the same addresses.
- Otherwise, it is **IMPLEMENTATION DEFINED** whether the exception is generated.

Whether the detection of memory aborts happens before or after the check on the local Exclusive Monitor depends on the implementation. As a result a failure of the local monitor can occur on some implementations even if the memory access would give a memory abort.

### Usage

Store Exclusive Register Halfword stores a halfword from a register to memory if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See *Synchronization and semaphores* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#). The memory access is atomic.

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

### Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 18.134 SWPA, SWPAL, SWP, SWPL, SWPAL, SWP, SWPL (A64)

Swap word or doubleword in memory.

### Syntax

`SWPA Ws, Wt, [Xn|SP] ; 32-bit, acquire general registers`

`SWPAL Ws, Wt, [Xn|SP] ; 32-bit, acquire and release general registers`

`SWP Ws, Wt, [Xn|SP] ; 32-bit, no memory ordering general registers`

`SWPL Ws, Wt, [Xn|SP] ; 32-bit, release general registers`

`SWPA Xs, Xt, [Xn|SP] ; 64-bit, acquire general registers`

`SWPAL Xs, Xt, [Xn|SP] ; 64-bit, acquire and release general registers`

`SWP Xs, Xt, [Xn|SP] ; 64-bit, no memory ordering general registers`

`SWPL Xs, Xt, [Xn|SP] ; 64-bit, release general registers`

Where:

#### **Ws**

Is the 32-bit name of the general-purpose register to be stored.

#### **Wt**

Is the 32-bit name of the general-purpose register to be loaded.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xs**

Is the 64-bit name of the general-purpose register to be stored.

#### **Xt**

Is the 64-bit name of the general-purpose register to be loaded.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Swap word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from a memory location, and stores the value held in a register back to the same memory location. The value initially loaded from memory is returned in the destination register.

- If the destination register is not one of `WZR` or `XZR`, `SWPA` and `SWPAL` load from memory with acquire semantics.
- `SWPL` and `SWPAL` store to memory with release semantics.
- `SWP` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.135 SWPAB, SWPALB, SWPB, SWPLB (A64)

Swap byte in memory.

## Syntax

`SWPAB Ws, Wt, [Xn|SP] ; Acquire general registers`

`SWPALB Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`SWPB Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`SWPLB Ws, Wt, [Xn|SP] ; Release general registers`

Where:

### ***Ws***

Is the 32-bit name of the general-purpose register to be stored.

### ***Wt***

Is the 32-bit name of the general-purpose register to be loaded.

### ***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Swap byte in memory atomically loads an 8-bit byte from a memory location, and stores the value held in a register back to the same memory location. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `SWPAB` and `SWPALB` load from memory with acquire semantics.
- `SWPLB` and `SWPALB` store to memory with release semantics.
- `SWPB` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 18.136 SWPAH, SWPALH, SWPH, SWPLH (A64)

Swap halfword in memory.

## Syntax

`SWPAH Ws, Wt, [Xn|SP] ; Acquire general registers`

`SWPALH Ws, Wt, [Xn|SP] ; Acquire and release general registers`

`SWPH Ws, Wt, [Xn|SP] ; No memory ordering general registers`

`SWPLH Ws, Wt, [Xn|SP] ; Release general registers`

Where:

**Ws**

Is the 32-bit name of the general-purpose register to be stored.

**Wt**

Is the 32-bit name of the general-purpose register to be loaded.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Architectures supported

Supported in the Arm®v8.1 architecture and later.

## Usage

Swap halfword in memory atomically loads a 16-bit halfword from a memory location, and stores the value held in a register back to the same memory location. The value initially loaded from memory is returned in the destination register.

- If the destination register is not `WZR`, `SWPAH` and `SWPALH` load from memory with acquire semantics.
- `SWPLH` and `SWPALH` store to memory with release semantics.
- `SWPH` has no memory ordering requirements.

For more information about memory ordering semantics see *Load-Acquire, Store-Release* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

For information about memory accesses see *Load/Store addressing modes* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

# 19. A64 Floating-point Instructions

Describes the A64 floating-point instructions.

## 19.1 A64 floating-point instructions in alphabetical order

A summary of the A64 floating-point instructions that are supported.

**Table 19-1: Summary of A64 floating-point instructions**

| Mnemonic                     | Brief description                                                                          | See                                                |
|------------------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------|
| FABS (scalar)                | Floating-point Absolute value (scalar)                                                     | <a href="#">FABS (scalar) (A64 FP)</a>             |
| FADD (scalar)                | Floating-point Add (scalar)                                                                | <a href="#">FADD (scalar) (A64)</a>                |
| FCCMP                        | Floating-point Conditional quiet Compare (scalar)                                          | <a href="#">FCCMP (A64)</a>                        |
| FCCMPE                       | Floating-point Conditional signaling Compare (scalar)                                      | <a href="#">FCCMPE (A64)</a>                       |
| FCMP                         | Floating-point quiet Compare (scalar)                                                      | <a href="#">FCMP (A64)</a>                         |
| FCMPE                        | Floating-point signaling Compare (scalar)                                                  | <a href="#">FCMPE (A64)</a>                        |
| FCSEL                        | Floating-point Conditional Select (scalar)                                                 | <a href="#">FCSEL (A64)</a>                        |
| FCVT                         | Floating-point Convert precision (scalar)                                                  | <a href="#">FCVT (A64)</a>                         |
| FCVTAS (scalar)              | Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar)   | <a href="#">FCVTAS (scalar) (A64)</a>              |
| FCVTAU (scalar)              | Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar) | <a href="#">FCVTAU (scalar) (A64)</a>              |
| FCVTMS (scalar)              | Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar)          | <a href="#">FCVTMS (scalar) (A64)</a>              |
| FCVTMU (scalar)              | Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar)        | <a href="#">FCVTMU (scalar) (A64)</a>              |
| FCVTNS (scalar)              | Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar)   | <a href="#">FCVTNS (scalar) (A64)</a>              |
| FCVTNU (scalar)              | Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar) | <a href="#">FCVTNU (scalar) (A64)</a>              |
| FCVTPS (scalar)              | Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar)           | <a href="#">FCVTPS (scalar) (A64)</a>              |
| FCVTPU (scalar)              | Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar)         | <a href="#">FCVTPU (scalar) (A64)</a>              |
| FCVTZS (scalar, fixed-point) | Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar)                | <a href="#">FCVTZS (scalar, fixed-point) (A64)</a> |
| FCVTZS (scalar, integer)     | Floating-point Convert to Signed integer, rounding toward Zero (scalar)                    | <a href="#">FCVTZS (scalar, integer) (A64)</a>     |
| FCVTZU (scalar, fixed-point) | Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar)              | <a href="#">FCVTZU (scalar, fixed-point) (A64)</a> |

| Mnemonic                     | Brief description                                                             | See                                |
|------------------------------|-------------------------------------------------------------------------------|------------------------------------|
| FCVTZU (scalar, integer)     | Floating-point Convert to Unsigned integer, rounding toward Zero (scalar)     | FCVTZU (scalar, integer) (A64)     |
| FDIV (scalar)                | Floating-point Divide (scalar)                                                | FDIV (scalar) (A64)                |
| FJCVTZS                      | Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero | FJCVTZS (A64)                      |
| FMADD                        | Floating-point fused Multiply-Add (scalar)                                    | FMADD (A64)                        |
| FMAX (scalar)                | Floating-point Maximum (scalar)                                               | FMAX (scalar) (A64)                |
| FMAXNM (scalar)              | Floating-point Maximum Number (scalar)                                        | FMAXNM (scalar) (A64)              |
| FMIN (scalar)                | Floating-point Minimum (scalar)                                               | FMIN (scalar) (A64)                |
| FMINNM (scalar)              | Floating-point Minimum Number (scalar)                                        | FMINNM (scalar) (A64)              |
| FMOV (register)              | Floating-point Move register without conversion                               | FMOV (register) (A64)              |
| FMOV (general)               | Floating-point Move to or from general-purpose register without conversion    | FMOV (general) (A64)               |
| FMOV (scalar, immediate)     | Floating-point move immediate (scalar)                                        | FMOV (scalar, immediate) (A64)     |
| FMSUB                        | Floating-point Fused Multiply-Subtract (scalar)                               | FMSUB (A64)                        |
| FMUL (scalar)                | Floating-point Multiply (scalar)                                              | FMUL (scalar) (A64)                |
| FNEG (scalar)                | Floating-point Negate (scalar)                                                | FNEG (scalar) (A64)                |
| FNMADD                       | Floating-point Negated fused Multiply-Add (scalar)                            | FNMADD (A64)                       |
| FNMSUB                       | Floating-point Negated fused Multiply-Subtract (scalar)                       | FNMSUB (A64)                       |
| FNMUL (scalar)               | Floating-point Multiply-Negate (scalar)                                       | FNMUL (scalar) (A64)               |
| FRINTA (scalar)              | Floating-point Round to Integral, to nearest with ties to Away (scalar)       | FRINTA (scalar) (A64)              |
| FRINTI (scalar)              | Floating-point Round to Integral, using current rounding mode (scalar)        | FRINTI (scalar) (A64)              |
| FRINTM (scalar)              | Floating-point Round to Integral, toward Minus infinity (scalar)              | FRINTM (scalar) (A64)              |
| FRINTN (scalar)              | Floating-point Round to Integral, to nearest with ties to even (scalar)       | FRINTN (scalar) (A64)              |
| FRINTP (scalar)              | Floating-point Round to Integral, toward Plus infinity (scalar)               | FRINTP (scalar) (A64)              |
| FRINTX (scalar)              | Floating-point Round to Integral exact, using current rounding mode (scalar)  | FRINTX (scalar) (A64)              |
| FRINTZ (scalar)              | Floating-point Round to Integral, toward Zero (scalar)                        | FRINTZ (scalar) (A64)              |
| FSQRT (scalar)               | Floating-point Square Root (scalar)                                           | FSQRT (scalar) (A64)               |
| FSUB (scalar)                | Floating-point Subtract (scalar)                                              | FSUB (scalar) (A64)                |
| LDNP (SIMD and FP)           | Load Pair of SIMD and FP registers, with Non-temporal hint                    | LDNP (SIMD and FP) (A64)           |
| LDP (SIMD and FP)            | Load Pair of SIMD and FP registers                                            | LDP (SIMD and FP) (A64)            |
| LDR (immediate, SIMD and FP) | Load SIMD and FP Register (immediate offset)                                  | LDR (immediate, SIMD and FP) (A64) |

| Mnemonic                     | Brief description                                           | See                                                |
|------------------------------|-------------------------------------------------------------|----------------------------------------------------|
| LDR (literal, SIMD and FP)   | Load SIMD and FP Register (PC-relative literal)             | <a href="#">LDR (literal, SIMD and FP) (A64)</a>   |
| LDR (register, SIMD and FP)  | Load SIMD and FP Register (register offset)                 | <a href="#">LDR (register, SIMD and FP) (A64)</a>  |
| LDUR (SIMD and FP)           | Load SIMD and FP Register (unscaled offset)                 | <a href="#">LDUR (SIMD and FP) (A64)</a>           |
| SCVTF (scalar, fixed-point)  | Signed fixed-point Convert to Floating-point (scalar)       | <a href="#">SCVTF (scalar, fixed-point) (A64)</a>  |
| SCVTF (scalar, integer)      | Signed integer Convert to Floating-point (scalar)           | <a href="#">SCVTF (scalar, integer) (A64)</a>      |
| STNP (SIMD and FP)           | Store Pair of SIMD and FP registers, with Non-temporal hint | <a href="#">STNP (SIMD and FP) (A64)</a>           |
| STP (SIMD and FP)            | Store Pair of SIMD and FP registers                         | <a href="#">STP (SIMD and FP) (A64)</a>            |
| STR (immediate, SIMD and FP) | Store SIMD and FP register (immediate offset)               | <a href="#">STR (immediate, SIMD and FP) (A64)</a> |
| STR (register, SIMD and FP)  | Store SIMD and FP register (register offset)                | <a href="#">STR (register, SIMD and FP) (A64)</a>  |
| STUR (SIMD and FP)           | Store SIMD and FP register (unscaled offset)                | <a href="#">STUR (SIMD and FP) (A64)</a>           |
| UCVTF (scalar, fixed-point)  | Unsigned fixed-point Convert to Floating-point (scalar)     | <a href="#">UCVTF (scalar, fixed-point) (A64)</a>  |
| UCVTF (scalar, integer)      | Unsigned integer Convert to Floating-point (scalar)         | <a href="#">UCVTF (scalar, integer) (A64)</a>      |

## 19.2 FABS (scalar) (A64 FP)

Floating-point Absolute value (scalar).

### Syntax

`FABS Hd, Hn ; Half-precision`

`FABS Sd, Sn ; Single-precision`

`FABS Dd, Dn ; Double-precision`

Where:

#### **Hd**

Is the 16-bit name of the SIMD and FP destination register.

#### **Hn**

Is the 16-bit name of the SIMD and FP source register.

#### **Sd**

Is the 32-bit name of the SIMD and FP destination register.

#### **Sn**

Is the 32-bit name of the SIMD and FP source register.

#### **Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Absolute value (scalar). This instruction calculates the absolute value in the SIMD and FP source register and writes the result to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$Vd = \text{abs}(Vn)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.3 FADD (scalar) (A64)

Floating-point Add (scalar).

### Syntax

`FADD Hd, Hn, Hm ; Half-precision`

`FADD Sd, Sn, Sm ; Single-precision`

`FADD Dd, Dn, Dm ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

**Operation**

Floating-point Add (scalar). This instruction adds the floating-point values of the two source SIMD and FP registers, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$Vd = Vn +Vm$ .

**Related information**

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.4 FCCMP (A64)

Floating-point Conditional quiet Compare (scalar).

**Syntax**

```
FCCMP Hn, Hm, #nzcv, cond ; Half-precision
FCCMP Sn, Sm, #nzcv, cond ; Single-precision
FCCMP Dn, Dm, #nzcv, cond ; Double-precision
```

Where:

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**D<sub>n</sub>**

Is the 64-bit name of the first SIMD and FP source register.

**D<sub>m</sub>**

Is the 64-bit name of the second SIMD and FP source register.

**#nzcv**

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

**cond**

Is one of the standard conditions.

**NaNs**

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false. This case results in the FPSCR flags being set to N=0, Z=0, C=1, and V=1.

**Operation**

Floating-point Conditional quiet Compare (scalar). This instruction compares the two SIMD and FP source register values and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C, V} flags are set to the flag bit specifier.

It raises an Invalid Operation exception only if either operand is a signaling NaN.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

```
flags = if cond then compareQuiet(vn, vm) else #nzcv.
```

**Related information**

[Condition code suffixes and related flags](#) on page 139

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.5 FCCMPE (A64)

Floating-point Conditional signaling Compare (scalar).

**Syntax**

```
FCCMPE Hn, Hm, #nzcv, cond ; Half-precision
```

`FCCMPE Sn, Sm, #nzcv, cond ; Single-precision`

`FCCMPE Dn, Dm, #nzcv, cond ; Double-precision`

Where:

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***Sn***

Is the 32-bit name of the first SIMD and FP source register.

***Sm***

Is the 32-bit name of the second SIMD and FP source register.

***Dn***

Is the 64-bit name of the first SIMD and FP source register.

***Dm***

Is the 64-bit name of the second SIMD and FP source register.

***nzcv***

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit NZCV condition flags.

***cond***

Is one of the standard conditions.

## NaNs

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false. This case results in the FPSCR flags being set to N=0, Z=0, C=1, and V=1.

`FCCMPE` raises an Invalid Operation exception if either operand is any type of NaN, and is suitable for testing for <, <=, >, >=, and other predicates that raise an exception when the operands are unordered.

## Operation

Floating-point Conditional signaling Compare (scalar). This instruction compares the two SIMD and FP source register values and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C, V} flags are set to the flag bit specifier.

If either operand is any type of NaN, or if either operand is a signaling NaN, the instruction raises an Invalid Operation exception.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated.

For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

```
flags = if cond then compareSignaling(Vn, Vm) else #ncv.
```

## Related information

[Condition code suffixes and related flags](#) on page 139

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.6 FCMP (A64)

Floating-point quiet Compare (scalar).

### Syntax

```
FCMP Hn, Hm ; Half-precision
```

```
FCMP Hn, #0.0 ; Half-precision, zero
```

```
FCMP Sn, Sm ; Single-precision
```

```
FCMP Sn, #0.0 ; Single-precision, zero
```

```
FCMP Dn, Dm ; Double-precision
```

```
FCMP Dn, #0.0 ; Double-precision, zero
```

Where:

#### Hn

Depends on the instruction variant:

**Half-precision** Is the 16-bit name of the first SIMD and FP source register

**Half-precision, zero** Is the 16-bit name of the SIMD and FP source register

#### Hm

Is the 16-bit name of the second SIMD and FP source register.

#### Sn

Depends on the instruction variant:

**Single-precision** Is the 32-bit name of the first SIMD and FP source register.

**Single-precision, zero** Is the 32-bit name of the SIMD and FP source register.

***S<sub>m</sub>***

Is the 32-bit name of the second SIMD and FP source register.

***D<sub>n</sub>***

Depends on the instruction variant:

**Double-precision** Is the 64-bit name of the first SIMD and FP source register.

**Double-precision, zero** Is the 64-bit name of the SIMD and FP source register.

***D<sub>m</sub>***

Is the 64-bit name of the second SIMD and FP source register.

**NaNs**

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false. This case results in the FPSCR flags being set to N=0, Z=0, C=1, and V=1.

**Usage**

Floating-point quiet Compare (scalar). This instruction compares the two SIMD and FP source register values, or the first SIMD and FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.

It raises an Invalid Operation exception only if either operand is a signaling NaN.

This instruction can generate a floating-point exception. Depending on the settings in FPCR in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), the exception results in either a flag being set in FPSR in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.7 FCMPE (A64)

Floating-point signaling Compare (scalar).

**Syntax**

`FCMPE Hn, Hm ; Half-precision`

`FCMPE Hn, #0.0 ; Half-precision, zero`

```

FCMPE Sn, Sm ; Single-precision

FCMPE Sn, #0.0 ; Single-precision, zero

FCMPE Dn, Dm ; Double-precision

FCMPE Dn, #0.0 ; Double-precision, zero

```

Where:

***Hn***

Depends on the instruction variant:

**Half-precision** Is the 16-bit name of the first SIMD and FP source register

**Half-precision, zero** Is the 16-bit name of the SIMD and FP source register

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***Sn***

Depends on the instruction variant:

**Single-precision** Is the 32-bit name of the first SIMD and FP source register.

**Single-precision, zero** Is the 32-bit name of the SIMD and FP source register.

***Sm***

Is the 32-bit name of the second SIMD and FP source register.

***Dn***

Depends on the instruction variant:

**Double-precision** Is the 64-bit name of the first SIMD and FP source register.

**Double-precision, zero** Is the 64-bit name of the SIMD and FP source register.

***Dm***

Is the 64-bit name of the second SIMD and FP source register.

**Nan**s

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false. This case results in the FPSCR flags being set to N=0, Z=0, C=1, and V=1.

**FCMPE** raises an Invalid Operation exception if either operand is any type of NaN, and is suitable for testing for <, <=, >, >=, and other predicates that raise an exception when the operands are unordered.

## Usage

Floating-point signaling Compare (scalar). This instruction compares the two SIMD and FP source register values, or the first SIMD and FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.

If either operand is any type of NaN, or if either operand is a signaling NaN, the instruction raises an Invalid Operation exception.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.8 FCSEL (A64)

Floating-point Conditional Select (scalar).

### Syntax

FCSEL *Hd*, *Hn*, *Hm*, *cond* ; Half-precision

FCSEL *Sd*, *Sn*, *Sm*, *cond* ; Single-precision

FCSEL *Dd*, *Dn*, *Dm*, *cond* ; Double-precision

Where:

#### *Hd*

Is the 16-bit name of the SIMD and FP destination register.

#### *Hn*

Is the 16-bit name of the first SIMD and FP source register.

#### *Hm*

Is the 16-bit name of the second SIMD and FP source register.

#### *Sd*

Is the 32-bit name of the SIMD and FP destination register.

#### *Sn*

Is the 32-bit name of the first SIMD and FP source register.

#### *Sm*

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

**cond**

Is one of the standard conditions.

## Operation

Floating-point Conditional Select (scalar). This instruction allows the SIMD and FP destination register to take the value from either one or the other of two SIMD and FP source registers. If the condition passes, the first SIMD and FP source register value is taken, otherwise the second SIMD and FP source register value is taken.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = if cond then Vn elseVm.`

## Related information

[Condition code suffixes and related flags](#) on page 139

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.9 FCVT (A64)

Floating-point Convert precision (scalar).

### Syntax

`FCVT Sd, Hn ; Half-precision to single-precision`

`FCVT Dd, Hn ; Half-precision to double-precision`

`FCVT Hd, Sn ; Single-precision to half-precision`

`FCVT Dd, Sn ; Single-precision to double-precision`

`FCVT Hd, Dn ; Double-precision to half-precision`

`FCVT Sd, Dn ; Double-precision to single-precision`

Where:

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert precision (scalar). This instruction converts the floating-point value in the SIMD and FP source register to the precision for the destination register data type using the rounding mode that is determined by the FPCR and writes the result to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

*Vd* = convertFormat(*Vn*), where *v* is D, H, or S.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.10 FCVTAS (scalar) (A64)

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).

### Syntax

```
FCVTAS Wd, Hn ; Half-precision to 32-bit
```

```
FCVTAS Xd, Hn ; Half-precision to 64-bit
```

```
FCVTAS Wd, Sn ; Single-precision to 32-bit
```

```
FCVTAS Xd, Sn ; Single-precision to 64-bit
```

```
FCVTAS Wd, Dn ; Double-precision to 32-bit
```

```
FCVTAS Xd, Dn ; Double-precision to 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest with Ties to Away rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

*Rd* = `signed_convertToIntegerExactTiesToAway(Vn)`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.11 FCVTAU (scalar) (A64)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).

### Syntax

```
FCVTAU Wd, Hn ; Half-precision to 32-bit
```

```
FCVTAU Xd, Hn ; Half-precision to 64-bit
```

```
FCVTAU Wd, Sn ; Single-precision to 32-bit
```

```
FCVTAU Xd, Sn ; Single-precision to 64-bit
```

```
FCVTAU Wd, Dn ; Double-precision to 32-bit
```

`FCVTAU Xd, Dn ; Double-precision to 64-bit`

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit unsigned integer using the Round to Nearest with Ties to Away rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = unsigned_convertToIntegerExactTiesToAway(Vn)`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.12 FCVTMS (scalar) (A64)

Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).

### Syntax

`FCVTMS Wd, Hn ; Half-precision to 32-bit`

`FCVTMS Xd, Hn ; Half-precision to 64-bit`

`FCVTMS Wd, Sn ; Single-precision to 32-bit`

`FCVTMS Rd, Sn ; Single-precision to 64-bit`

`FCVTMS Rd, Dn ; Double-precision to 32-bit`

`FCVTMS Rd, Dn ; Double-precision to 64-bit`

Where:

***Rd***

Is the 32-bit name of the general-purpose destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit signed integer using the Round towards Minus Infinity rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = signed_convertToIntegerExactTowardNegative(Vn)`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.13 FCVTMU (scalar) (A64)

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).

### Syntax

`FCVTMU Rd, Hn ; Half-precision to 32-bit`

```
FCVTMU Xd, Hn ; Half-precision to 64-bit
FCVTMU Wd, Sn ; Single-precision to 32-bit
FCVTMU Xd, Sn ; Single-precision to 64-bit
FCVTMU Wd, Dn ; Double-precision to 32-bit
FCVTMU Xd, Dn ; Double-precision to 64-bit
```

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Minus Infinity rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

Rd = `unsigned_convertToIntegerExactTowardNegative(Vn)`, where R is either w or x.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.14 FCVTNS (scalar) (A64)

Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).

### Syntax

```
FCVTNS Wd, Hn ; Half-precision to 32-bit
```

```
FCVTNS Xd, Hn ; Half-precision to 64-bit
```

```
FCVTNS Wd, Sn ; Single-precision to 32-bit
```

```
FCVTNS Xd, Sn ; Single-precision to 64-bit
```

```
FCVTNS Wd, Dn ; Double-precision to 32-bit
```

```
FCVTNS Xd, Dn ; Double-precision to 64-bit
```

Where:

#### ***Wd***

Is the 32-bit name of the general-purpose destination register.

#### ***Hn***

Is the 16-bit name of the SIMD and FP source register.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

#### ***Sn***

Is the 32-bit name of the SIMD and FP source register.

#### ***Dn***

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

*Rd* = `signed_convertToIntegerExactTiesToEven(Vn)`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.15 FCVTNU (scalar) (A64)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).

### Syntax

```
FCVTNU Wd, Hn ; Half-precision to 32-bit
FCVTNU Xd, Hn ; Half-precision to 64-bit
FCVTNU Wd, Sn ; Single-precision to 32-bit
FCVTNU Xd, Sn ; Single-precision to 64-bit
FCVTNU Wd, Dn ; Double-precision to 32-bit
FCVTNU Xd, Dn ; Double-precision to 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit unsigned integer using the Round to Nearest rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = unsigned_convertToIntegerExactTiesToEven (Vn)`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.16 FCVTPS (scalar) (A64)

Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).

### Syntax

`FCVTPS Wd, Hn ; Half-precision to 32-bit`

`FCVTPS Xd, Hn ; Half-precision to 64-bit`

`FCVTPS Wd, Sn ; Single-precision to 32-bit`

`FCVTPS Xd, Sn ; Single-precision to 64-bit`

`FCVTPS Wd, Dn ; Double-precision to 32-bit`

`FCVTPS Xd, Dn ; Double-precision to 64-bit`

Where:

**`Wd`**

Is the 32-bit name of the general-purpose destination register.

**`Hn`**

Is the 16-bit name of the SIMD and FP source register.

**`Xd`**

Is the 64-bit name of the general-purpose destination register.

**`Sn`**

Is the 32-bit name of the SIMD and FP source register.

**`Dn`**

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit signed integer using the Round towards Plus Infinity rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = signed_convertToIntegerExactTowardPositive(vn)`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.17 FCVTPU (scalar) (A64)

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).

### Syntax

`FCVTPU Wd, Hn ; Half-precision to 32-bit`

`FCVTPU Xd, Hn ; Half-precision to 64-bit`

`FCVTPU Wd, Sn ; Single-precision to 32-bit`

`FCVTPU Xd, Sn ; Single-precision to 64-bit`

`FCVTPU Wd, Dn ; Double-precision to 32-bit`

`FCVTPU Xd, Dn ; Double-precision to 64-bit`

Where:

#### ***Wd***

Is the 32-bit name of the general-purpose destination register.

#### ***Hn***

Is the 16-bit name of the SIMD and FP source register.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

#### ***Sn***

Is the 32-bit name of the SIMD and FP source register.

#### ***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Plus Infinity rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = unsigned_convertToIntegerExactTowardPositive(Vn)`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.18 FCVTZS (scalar, fixed-point) (A64)

Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).

### Syntax

```
FCVTZS Wd, Hn, #fbits ; Half-precision to 32-bit
FCVTZS Xd, Hn, #fbits ; Half-precision to 64-bit
FCVTZS Wd, Sn, #fbits ; Single-precision to 32-bit
FCVTZS Xd, Sn, #fbits ; Single-precision to 64-bit
FCVTZS Wd, Dn, #fbits ; Double-precision to 32-bit
FCVTZS Xd, Dn, #fbits ; Double-precision to 64-bit
```

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**fbits**

Depends on the instruction variant:

**32-bit** Is the number of bits after the binary point in the fixed-point destination, in the range 1 to 32.

**64-bit** Is the number of bits after the binary point in the fixed-point destination, in the range 1 to 64.

#### ***Xd***

Is the 64-bit name of the general-purpose destination register.

#### ***Sn***

Is the 32-bit name of the SIMD and FP source register.

#### ***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

*Rd* = `signed_convertToIntegerExactTowardZero(Vn * (2^fbits))`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.19 FCVTZS (scalar, integer) (A64)

Floating-point Convert to Signed integer, rounding toward Zero (scalar).

### Syntax

```
FCVTZS Wd, Hn ; Half-precision to 32-bit
```

```
FCVTZS Xd, Hn ; Half-precision to 64-bit
```

```
FCVTZS Wd, Sn ; Single-precision to 32-bit
```

```
FCVTZS Xd, Sn ; Single-precision to 64-bit
```

`FCVTZS Wd, Dn ; Double-precision to 32-bit`

`FCVTZS Xd, Dn ; Double-precision to 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Signed integer, rounding toward Zero (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit signed integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = signed_convertToIntegerExactTowardZero(Vn)`, where R is either w or x.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.20 FCVTZU (scalar, fixed-point) (A64)

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).

### Syntax

`FCVTZU Wd, Hn, #fbits ; Half-precision to 32-bit`

`FCVTZU Xd, Hn, #fbits ; Half-precision to 64-bit`

`FCVTZU Wd, Sn, #fbits ; Single-precision to 32-bit`

`FCVTZU Xd, Sn, #fbits ; Single-precision to 64-bit`

`FCVTZU Wd, Dn, #fbits ; Double-precision to 32-bit`

`FCVTZU Xd, Dn, #fbits ; Double-precision to 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Xn**

Is the 16-bit name of the SIMD and FP source register.

**fbits**

Depends on the instruction variant:

**32-bit** Is the number of bits after the binary point in the fixed-point destination, in the range 1 to 32.

**64-bit** Is the number of bits after the binary point in the fixed-point destination, in the range 1 to 64.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

`Rd = unsigned_convertToIntegerExactTowardZero(Vn*(2^fbits))`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.21 FCVTZU (scalar, integer) (A64)

Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).

### Syntax

```
FCVTZU Wd, Hn ; Half-precision to 32-bit
FCVTZU Xd, Hn ; Half-precision to 64-bit
FCVTZU Wd, Sn ; Single-precision to 32-bit
FCVTZU Xd, Sn ; Single-precision to 64-bit
FCVTZU Wd, Dn ; Double-precision to 32-bit
FCVTZU Xd, Dn ; Double-precision to 64-bit
```

Where:

***Wd***

Is the 32-bit name of the general-purpose destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Xd***

Is the 64-bit name of the general-purpose destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Convert to Unsigned integer, rounding toward Zero (scalar). This instruction converts the floating-point value in the SIMD and FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Rd = unsigned_convertToIntegerExactTowardZero(vn)`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.22 FDIV (scalar) (A64)

Floating-point Divide (scalar).

### Syntax

`FDIV Hd, Hn, Hm ; Half-precision`

`FDIV Sd, Sn, Sm ; Single-precision`

`FDIV Dd, Dn, Dm ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Divide (scalar). This instruction divides the floating-point value of the first source SIMD and FP register by the floating-point value of the second source SIMD and FP register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = V_n / V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.23 FJCVTZS (A64)

Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.

### Syntax

`FJCVTZS Wd, Dn`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

### Architectures supported

Supported in the Arm®v8.3-A architecture and later.

### Usage

Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero. This instruction converts the double-precision floating-point value in the SIMD and FP source register to a 32-bit signed integer using the Round towards Zero rounding mode, and write the result to the general-purpose destination register. If the result is too large to be held as a 32-bit signed integer, then the result is the integer modulo  $2^{32}$ , as held in a 32-bit signed integer.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.24 FMADD (A64)

Floating-point fused Multiply-Add (scalar).

### Syntax

FMADD *Hd*, *Hn*, *Hm*, *Ha* ; Half-precision

FMADD *Sd*, *Sn*, *Sm*, *Sa* ; Single-precision

FMADD *Dd*, *Dn*, *Dm*, *Da* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register holding the multiplicand.

***Hm***

Is the 16-bit name of the second SIMD and FP source register holding the multiplier.

***Ha***

Is the 16-bit name of the third SIMD and FP source register holding the addend.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the first SIMD and FP source register holding the multiplicand.

***Sm***

Is the 32-bit name of the second SIMD and FP source register holding the multiplier.

***Sa***

Is the 32-bit name of the third SIMD and FP source register holding the addend.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the first SIMD and FP source register holding the multiplicand.

***Dm***

Is the 64-bit name of the second SIMD and FP source register holding the multiplier.

***Da***

Is the 64-bit name of the third SIMD and FP source register holding the addend.

## Operation

Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD and FP source registers, adds the product to the value of the third SIMD and FP source register, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = V_a + V_n \cdot V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.25 FMAX (scalar) (A64)

Floating-point Maximum (scalar).

### Syntax

`FMAX Hd, Hn, Hm ; Half-precision`

`FMAX Sd, Sn, Sm ; Single-precision`

`FMAX Dd, Dn, Dm ; Double-precision`

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the first SIMD and FP source register.

***S<sub>m</sub>***

Is the 32-bit name of the second SIMD and FP source register.

***D<sub>d</sub>***

Is the 64-bit name of the SIMD and FP destination register.

***D<sub>n</sub>***

Is the 64-bit name of the first SIMD and FP source register.

***D<sub>m</sub>***

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Maximum (scalar). This instruction compares the two source SIMD and FP registers, and writes the larger of the two floating-point values to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$V_d = \max(V_n, V_m)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.26 FMAXNM (scalar) (A64)

Floating-point Maximum Number (scalar).

### Syntax

FMAXNM *Hd, Hn, Hm ; Half-precision*

FMAXNM *Sd, Sn, Sm ; Single-precision*

FMAXNM *Dd, Dn, Dm ; Double-precision*

Where:

***H<sub>d</sub>***

Is the 16-bit name of the SIMD and FP destination register.

***H<sub>n</sub>***

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Maximum Number (scalar). This instruction compares the first and second source SIMD and FP register values, and writes the larger of the two floating-point values to the destination SIMD and FP register.

NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to *FMAX* (scalar).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$V_d = \maxNum(V_n, V_m)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.27 FMIN (scalar) (A64)

Floating-point Minimum (scalar).

### Syntax

`FMIN Hd, Hn, Hm ; Half-precision`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

**FMIN Sd, Sn, Sm ; Single-precision**

**FMIN Dd, Dn, Dm ; Double-precision**

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Minimum (scalar). This instruction compares the first and second source SIMD and FP register values, and writes the smaller of the two floating-point values to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$V_d = \min(V_n, V_m)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.28 FMINNM (scalar) (A64)

Floating-point Minimum Number (scalar).

### Syntax

`FMINNM Hd, Hn, Hm ; Half-precision`

`FMINNM Sd, Sn, Sm ; Single-precision`

`FMINNM Dd, Dn, Dm ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

### Operation

Floating-point Minimum Number (scalar). This instruction compares the first and second source SIMD and FP register values, and writes the smaller of the two floating-point values to the destination SIMD and FP register.

Nans are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to `FMIN (scalar)`.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated.

For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$V_d = \min(\nu_n, \nu_m)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.29 FMOV (register) (A64)

Floating-point Move register without conversion.

### Syntax

FMOV *Hd*, *Hn* ; Half-precision

FMOV *Sd*, *Sn* ; Single-precision

FMOV *Dd*, *Dn* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Move register without conversion. This instruction copies the floating-point value in the SIMD and FP source register to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$Vd = Vn$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.30 FMOV (general) (A64)

Floating-point Move to or from general-purpose register without conversion.

### Syntax

```
FMOV Wd, Hn ; Half-precision to 32-bit
FMOV Xd, Hn ; Half-precision to 64-bit
FMOV Hd, Wn ; 32-bit to half-precision
FMOV Sd, Wn ; 32-bit to single-precision
FMOV Wd, Sn ; Single-precision to 32-bit
FMOV Hd, Xn ; 64-bit to half-precision
FMOV Dd, Xn ; 64-bit to double-precision
FMOV Vd.D[1], Xn ; 64-bit to top half of 128-bit
FMOV Xd, Dn ; Double-precision to 64-bit
FMOV Xd, Vn.D[1] ; Top half of 128-bit to 64-bit
```

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Wn**

Is the 32-bit name of the general-purpose source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Xn***

Is the 64-bit name of the general-purpose source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Vd***

Is the name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

***Vn***

Is the name of the SIMD and FP source register.

**Usage**

Floating-point Move to or from general-purpose register without conversion. This instruction transfers the contents of a SIMD and FP register to a general-purpose register, or the contents of a general-purpose register to a SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.31 FMOV (scalar, immediate) (A64)

Floating-point move immediate (scalar).

**Syntax**

`FMOV Hd, #imm ; Half-precision`

`FMOV Sd, #imm ; Single-precision`

`FMOV Dd, #imm ; Double-precision`

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***imm***

Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision. For details of the range of constants available and the encoding of *imm*, see Modified immediate constants in A64 floating-point instructions in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Usage**

Floating-point move immediate (scalar). This instruction copies a floating-point immediate constant into the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

Vd=# *imm*.

**Related information**

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.32 FMSUB (A64)

Floating-point Fused Multiply-Subtract (scalar).

**Syntax**

FMSUB *Hd*, *Hn*, *Hm*, *Ha* ; Half-precision

FMSUB *Sd*, *Sn*, *Sm*, *Sa* ; Single-precision

FMSUB *Dd*, *Dn*, *Dm*, *Da* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register holding the multiplicand.

***Hm***

Is the 16-bit name of the second SIMD and FP source register holding the multiplier.

***Ha***

Is the 16-bit name of the third SIMD and FP source register holding the minuend.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the first SIMD and FP source register holding the multiplicand.

***S<sub>m</sub>***

Is the 32-bit name of the second SIMD and FP source register holding the multiplier.

***S<sub>a</sub>***

Is the 32-bit name of the third SIMD and FP source register holding the minuend.

***D<sub>d</sub>***

Is the 64-bit name of the SIMD and FP destination register.

***D<sub>n</sub>***

Is the 64-bit name of the first SIMD and FP source register holding the multiplicand.

***D<sub>m</sub>***

Is the 64-bit name of the second SIMD and FP source register holding the multiplier.

***D<sub>a</sub>***

Is the 64-bit name of the third SIMD and FP source register holding the minuend.

## Operation

Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD and FP source registers, negates the product, adds that to the value of the third SIMD and FP source register, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = V_a + (-V_n) * V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.33 FMUL (scalar) (A64)

Floating-point Multiply (scalar).

### Syntax

`FMUL Hd, Hn, Hm ; Half-precision`

`FMUL Sd, Sn, Sm ; Single-precision`

`FMUL Dd, Dn, Dm ; Double-precision`

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the first SIMD and FP source register.

***Sm***

Is the 32-bit name of the second SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the first SIMD and FP source register.

***Dm***

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Multiply (scalar). This instruction multiplies the floating-point values of the two source SIMD and FP registers, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = V_n * V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.34 FNEG (scalar) (A64)

Floating-point Negate (scalar).

### Syntax

`FNEG Hd, Hn ; Half-precision`

`FNEG Sd, Sn ; Single-precision`

`FNEG Dd, Dn ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Negate (scalar). This instruction negates the value in the SIMD and FP source register and writes the result to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$Vd = -Vn.$

### Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.35 FNMADD (A64)

Floating-point Negated fused Multiply-Add (scalar).

### Syntax

`FNMADD Hd, Hn, Hm, Ha ; Half-precision`

`FNMADD Sd, Sn, Sm, Sa ; Single-precision`

`FNMADD Dd, Dn, Dm, Da ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register holding the multiplicand.

**Hm**

Is the 16-bit name of the second SIMD and FP source register holding the multiplier.

**Ha**

Is the 16-bit name of the third SIMD and FP source register holding the addend.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register holding the multiplicand.

**Sm**

Is the 32-bit name of the second SIMD and FP source register holding the multiplier.

**Sa**

Is the 32-bit name of the third SIMD and FP source register holding the addend.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register holding the multiplicand.

**Dm**

Is the 64-bit name of the second SIMD and FP source register holding the multiplier.

**Da**

Is the 64-bit name of the third SIMD and FP source register holding the addend.

### Operation

Floating-point Negated fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD and FP source registers, negates the product, subtracts the value of the third SIMD and FP source register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$Vd = (-Va) + (-Vn) * Vm.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.36 FNMSUB (A64)

Floating-point Negated fused Multiply-Subtract (scalar).

### Syntax

`FNMSUB Hd, Hn, Hm, Ha ; Half-precision`

`FNMSUB Sd, Sn, Sm, Sa ; Single-precision`

`FNMSUB Dd, Dn, Dm, Da ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register holding the multiplicand.

**Hm**

Is the 16-bit name of the second SIMD and FP source register holding the multiplier.

**Ha**

Is the 16-bit name of the third SIMD and FP source register holding the minuend.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register holding the multiplicand.

**Sm**

Is the 32-bit name of the second SIMD and FP source register holding the multiplier.

**Sa**

Is the 32-bit name of the third SIMD and FP source register holding the minuend.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register holding the multiplicand.

**Dm**

Is the 64-bit name of the second SIMD and FP source register holding the multiplier.

**Da**

Is the 64-bit name of the third SIMD and FP source register holding the minuend.

## Operation

Floating-point Negated fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD and FP source registers, subtracts the value of the third SIMD and FP source register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = (-V_a) + V_n * V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.37 FNMUL (scalar) (A64)

Floating-point Multiply-Negate (scalar).

### Syntax

`FNMUL Hd, Hn, Hm ; Half-precision`

`FNMUL Sd, Sn, Sm ; Single-precision`

`FNMUL Dd, Dn, Dm ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

**Sm**

Is the 32-bit name of the second SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the first SIMD and FP source register.

**Dm**

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Multiply-Negate (scalar). This instruction multiplies the floating-point values of the two source SIMD and FP registers, and writes the negation of the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = - (V_n * V_m).$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.38 FRINTA (scalar) (A64)

Floating-point Round to Integral, to nearest with ties to Away (scalar).

### Syntax

FRINTA *Hd*, *Hn* ; Half-precision

FRINTA *Sd*, *Sn* ; Single-precision

FRINTA *Dd*, *Dn* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Round to Integral, to nearest with ties to Away (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

```
Vd = roundToIntegralTiesToAway(Vn).
```

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.39 FRINTI (scalar) (A64)

Floating-point Round to Integral, using current rounding mode (scalar).

### Syntax

```
FRINTI Hd, Hn ; Half-precision
```

**FRINTI Sd, Sn ; Single-precision**

**FRINTI Dd, Dn ; Double-precision**

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Round to Integral, using current rounding mode (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

Vd = roundToIntegral(Vn).

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.40 FRINTM (scalar) (A64)

Floating-point Round to Integral, toward Minus infinity (scalar).

### Syntax

`FRINTM Hd, Hn ; Half-precision`

`FRINTM Sd, Sn ; Single-precision`

`FRINTM Dd, Dn ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Round to Integral, toward Minus infinity (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = roundToIntegralTowardNegative(Vn).`

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.41 FRINTN (scalar) (A64)

Floating-point Round to Integral, to nearest with ties to even (scalar).

### Syntax

FRINTN *Hd*, *Hn* ; Half-precision

FRINTN *Sd*, *Sn* ; Single-precision

FRINTN *Dd*, *Dn* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$Vd = \text{roundToIntegralTiesToEven}(Vn)$ .

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.42 FRINTP (scalar) (A64)

Floating-point Round to Integral, toward Plus infinity (scalar).

### Syntax

`FRINTP Hd, Hn ; Half-precision`

`FRINTP Sd, Sn ; Single-precision`

`FRINTP Dd, Dn ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Round to Integral, toward Plus infinity (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = roundToIntegralTowardPositive(Vn).`

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.43 FRINTX (scalar) (A64)

Floating-point Round to Integral exact, using current rounding mode (scalar).

### Syntax

`FRINTX Hd, Hn ; Half-precision`

`FRINTX Sd, Sn ; Single-precision`

`FRINTX Dd, Dn ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the SIMD and FP source register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

### Operation

Floating-point Round to Integral exact, using current rounding mode (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD and FP destination register.

An Inexact exception is raised when the result value is not numerically equal to the input value. A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

```
Vd = roundToIntegralExact(Vn).
```

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.44 FRINTZ (scalar) (A64)

Floating-point Round to Integral, toward Zero (scalar).

### Syntax

FRINTZ *Hd*, *Hn* ; Half-precision

FRINTZ *Sd*, *Sn* ; Single-precision

FRINTZ *Dd*, *Dn* ; Double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Sn***

Is the 32-bit name of the SIMD and FP source register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Dn***

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Round to Integral, toward Zero (scalar). This instruction rounds a floating-point value in the SIMD and FP source register to an integral floating-point value of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = roundToIntegralTowardZero(Vn).`

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.45 FSQRT (scalar) (A64)

Floating-point Square Root (scalar).

### Syntax

`FSQRT Hd, Hn ; Half-precision`

`FSQRT Sd, Sn ; Single-precision`

`FSQRT Dd, Dn ; Double-precision`

Where:

#### **Hd**

Is the 16-bit name of the SIMD and FP destination register.

#### **Hn**

Is the 16-bit name of the SIMD and FP source register.

#### **Sd**

Is the 32-bit name of the SIMD and FP destination register.

#### **Sn**

Is the 32-bit name of the SIMD and FP source register.

#### **Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Dn**

Is the 64-bit name of the SIMD and FP source register.

## Operation

Floating-point Square Root (scalar). This instruction calculates the square root of the value in the SIMD and FP source register and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = sqrt(Vn).`

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.46 FSUB (scalar) (A64)

Floating-point Subtract (scalar).

### Syntax

`FSUB Hd, Hn, Hm ; Half-precision`

`FSUB Sd, Sn, Sm ; Single-precision`

`FSUB Dd, Dn, Dm ; Double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Sn**

Is the 32-bit name of the first SIMD and FP source register.

***S<sub>m</sub>***

Is the 32-bit name of the second SIMD and FP source register.

***D<sub>d</sub>***

Is the 64-bit name of the SIMD and FP destination register.

***D<sub>n</sub>***

Is the 64-bit name of the first SIMD and FP source register.

***D<sub>m</sub>***

Is the 64-bit name of the second SIMD and FP source register.

## Operation

Floating-point Subtract (scalar). This instruction subtracts the floating-point value of the second source SIMD and FP register from the floating-point value of the first source SIMD and FP register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

$$V_d = V_n - V_m.$$

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.47 LDNP (SIMD and FP) (A64)

Load Pair of SIMD and FP registers, with Non-temporal hint.

### Syntax

```
LDNP {St1}, {St2}, [Xn|SP{}, #imm] ; 32-bit FP/SIMD registers, Signed offset
LDNP {Dt1}, {Dt2}, [Xn|SP{}, #imm] ; 64-bit FP/SIMD registers, Signed offset
LDNP {Qt1}, {Qt2}, [Xn|SP{}, #imm] ; 128-bit FP/SIMD registers, Signed offset
```

Where:

***st1***

Is the 32-bit name of the first SIMD and FP register to be transferred.

***st2***

Is the 32-bit name of the second SIMD and FP register to be transferred.

***imm***

Depends on the instruction variant:

**32-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 4 in the range -256 to 252, defaulting to 0.

**64-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 8 in the range -512 to 504, defaulting to 0.

**128-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 16 in the range -1024 to 1008, defaulting to 0.

***Dt1***

Is the 64-bit name of the first SIMD and FP register to be transferred.

***Dt2***

Is the 64-bit name of the second SIMD and FP register to be transferred.

***Qt1***

Is the 128-bit name of the first SIMD and FP register to be transferred.

***Qt2***

Is the 128-bit name of the second SIMD and FP register to be transferred.

***Xn|Sp***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load Pair of SIMD and FP registers, with Non-temporal hint. This instruction loads a pair of SIMD and FP registers from memory, issuing a hint to the memory system that the access is non-temporal. The address that is used for the load is calculated from a base register value and an optional immediate offset.

For information about non-temporal pair instructions, see *Load/Store SIMD and Floating-point Non-temporal pair* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#), and particularly LDNP (SIMD and FP).

---

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.48 LDP (SIMD and FP) (A64)

Load Pair of SIMD and FP registers.

### Syntax

```
LDP St1, St2, [Xn|SP], #imm ; 32-bit FP/SIMD registers, Post-index
LDP Dt1, Dt2, [Xn|SP], #imm ; 64-bit FP/SIMD registers, Post-index
LDP Qt1, Qt2, [Xn|SP], #imm ; 128-bit FP/SIMD registers, Post-index
LDP St1, St2, [Xn|SP, #imm]! ; 32-bit FP/SIMD registers, Pre-index
LDP Dt1, Dt2, [Xn|SP, #imm]! ; 64-bit FP/SIMD registers, Pre-index
LDP Qt1, Qt2, [Xn|SP, #imm]! ; 128-bit FP/SIMD registers, Pre-index
LDP {St1}, {St2}, [Xn|SP{, #imm}] ; 32-bit FP/SIMD registers, Signed offset
LDP {Dt1}, {Dt2}, [Xn|SP{, #imm}] ; 64-bit FP/SIMD registers, Signed offset
LDP {Qt1}, {Qt2}, [Xn|SP{, #imm}] ; 128-bit FP/SIMD registers, Signed offset
```

Where:

**St1**

Is the 32-bit name of the first SIMD and FP register to be transferred.

**St2**

Is the 32-bit name of the second SIMD and FP register to be transferred.

**imm**

Depends on the instruction variant:

**32-bit FP/SIMD registers** Is the signed immediate byte offset, a multiple of 4 in the range -256 to 252.

**64-bit FP/SIMD registers** Is the signed immediate byte offset, a multiple of 8 in the range -512 to 504.

**128-bit FP/SIMD registers** Is the signed immediate byte offset, a multiple of 16 in the range -1024 to 1008.

**Dt1**

Is the 64-bit name of the first SIMD and FP register to be transferred.

**Dt2**

Is the 64-bit name of the second SIMD and FP register to be transferred.

**Qt1**

Is the 128-bit name of the first SIMD and FP register to be transferred.

**Qt2**

Is the 128-bit name of the second SIMD and FP register to be transferred.

**<Xn>|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load Pair of SIMD and FP registers. This instruction loads a pair of SIMD and FP registers from memory. The address that is used for the load is calculated from a base register value and an optional immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.



For information about the **CONstrained UNPREDICTABLE** behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors* in the [Arm Architecture Reference Manual Armv8](#), for [Armv8-A architecture profile](#), and particularly LDP (SIMD and FP).

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.49 LDR (immediate, SIMD and FP) (A64)

Load SIMD and FP Register (immediate offset).

**Syntax**

```
LDR <Bt>, [Xn|SP], #simm ; 8-bit FP/SIMD registers, Post-index
```

```
LDR Ht, [Xn|SP], #simm ; 16-bit FP/SIMD registers, Post-index
```

```
LDR St, [Xn|SP], #simm ; 32-bit FP/SIMD registers, Post-index
```

```
LDR Dt, [Xn|SP], #simm ; 64-bit FP/SIMD registers, Post-index
```

```
LDR Qt, [Xn|SP], #simm ; 128-bit FP/SIMD registers, Post-index
```

```
LDR <Bt>, [Xn|SP, #simm]! ; 8-bit FP/SIMD registers, Pre-index
```

```
LDR Ht, [Xn|SP, #simm]! ; 16-bit FP/SIMD registers, Pre-index
```

```
LDR St, [Xn|SP, #simm]! ; 32-bit FP/SIMD registers, Pre-index
```

```
LDR Dt, [Xn|SP, #simm]! ; 64-bit FP/SIMD registers, Pre-index
```

```
LDR Qt, [Xn|SP, #simm]! ; 128-bit FP/SIMD registers, Pre-index
```

```
LDR <Bt>, [Xn|SP{, #pimm}] ; 8-bit FP/SIMD registers
LDR {Ht}, [Xn|SP{, #pimm}] ; 16-bit FP/SIMD registers
LDR {St}, [Xn|SP{, #pimm}] ; 32-bit FP/SIMD registers
LDR {Dt}, [Xn|SP{, #pimm}] ; 64-bit FP/SIMD registers
LDR {Qt}, [Xn|SP{, #pimm}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**simm**

Is the signed immediate byte offset, in the range -256 to 255.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

**St**

Is the 32-bit name of the SIMD and FP register to be transferred.

**Dt**

Is the 64-bit name of the SIMD and FP register to be transferred.

**Qt**

Is the 128-bit name of the SIMD and FP register to be transferred.

**pimm**

Depends on the instruction variant:

**8-bit FP/SIMD registers** Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0.

**16-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0.

**32-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0.

**64-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0.

**128-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 16 in the range 0 to 65520, defaulting to 0.

**<Xn>|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load SIMD and FP Register (immediate offset). This instruction loads an element from memory, and writes the result as a scalar to the SIMD and FP register. The address that is used for the load is calculated from a base register value, a signed immediate offset, and an optional offset that is a multiple of the element size.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.50 LDR (literal, SIMD and FP) (A64)

Load SIMD and FP Register (PC-relative literal).

### Syntax

`LDR St, label ; 32-bit FP/SIMD registers`

`LDR Dt, label ; 64-bit FP/SIMD registers`

`LDR Qt, label ; 128-bit FP/SIMD registers`

Where:

**St**

Is the 32-bit name of the SIMD and FP register to be loaded.

**Dt**

Is the 64-bit name of the SIMD and FP register to be loaded.

**Qt**

Is the 128-bit name of the SIMD and FP register to be loaded.

**label**

Is the program label from which the data is to be loaded. Its offset from the address of this instruction, in the range  $\pm 1\text{MB}$ .

## Usage

Load SIMD and FP Register (PC-relative literal). This instruction loads a SIMD and FP register from memory. The address that is used for the load is calculated from the PC value and an immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.51 LDR (register, SIMD and FP) (A64)

Load SIMD and FP Register (register offset).

### Syntax

```
LDR <Bt>, [Xn|SP, (Wm|Xm), extend {amount}] ; 8-bit FP/SIMD registers
LDR <Bt>, [Xn|SP, Xm{, LSL amount}] ; 8-bit FP/SIMD registers
LDR Ht, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 16-bit FP/SIMD registers
LDR St, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 32-bit FP/SIMD registers
LDR Dt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 64-bit FP/SIMD registers
LDR Qt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

**extend**

Is the index extend specifier:

**8-bit FP/SIMD registers**

Can be one of UXTW, SXTW or SXTX.

**16-bit FP/SIMD registers**

Can be one of UXTW, LSL, SXTW or SXTX.

**amount**

Is the index shift amount, it must be.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

**St**

Is the 32-bit name of the SIMD and FP register to be transferred.

**Dt**

Is the 64-bit name of the SIMD and FP register to be transferred.

***Qt***

Is the 128-bit name of the SIMD and FP register to be transferred.

**Usage**

Load SIMD and FP Register (register offset). This instruction loads a SIMD and FP register from memory. The address that is used for the load is calculated from a base register value and an offset register value. The offset can be optionally shifted and extended.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.52 LDUR (SIMD and FP) (A64)

Load SIMD and FP Register (unscaled offset).

**Syntax**

```
LDUR <Bt>, [Xn|SP{, #simm}] ; 8-bit FP/SIMD registers
```

```
LDUR Ht, [Xn|SP{, #simm}] ; 16-bit FP/SIMD registers
```

```
LDUR St, [Xn|SP{, #simm}] ; 32-bit FP/SIMD registers
```

```
LDUR Dt, [Xn|SP{, #simm}] ; 64-bit FP/SIMD registers
```

```
LDUR Qt, [Xn|SP{, #simm}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

**St**

Is the 32-bit name of the SIMD and FP register to be transferred.

**Dt**

Is the 64-bit name of the SIMD and FP register to be transferred.

**Qt**

Is the 128-bit name of the SIMD and FP register to be transferred.

**<Xn>|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Load SIMD and FP Register (unscaled offset). This instruction loads a SIMD and FP register from memory. The address that is used for the load is calculated from a base register value and an optional immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.53 SCVTF (scalar, fixed-point) (A64)

Signed fixed-point Convert to Floating-point (scalar).

**Syntax**

```
SCVTF Hd, Wn, #fbits ; 32-bit to half-precision
SCVTF Sd, Wn, #fbits ; 32-bit to single-precision
SCVTF Dd, Wn, #fbits ; 32-bit to double-precision
SCVTF Hd, Xn, #fbits ; 64-bit to half-precision
SCVTF Sd, Xn, #fbits ; 64-bit to single-precision
SCVTF Dd, Xn, #fbits ; 64-bit to double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***fbits***

Depends on the instruction variant:

**32-bit** Is the number of bits after the binary point in the fixed-point source, in the range 1 to 32.

**64-bit** Is the number of bits after the binary point in the fixed-point source, in the range 1 to 64.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Operation

Signed fixed-point Convert to Floating-point (scalar). This instruction converts the signed value in the 32-bit or 64-bit general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

`Vd = signed_convertFromInt(Rn / (2^fbits))`, where R is either w or x.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.54 SCVTF (scalar, integer) (A64)

Signed integer Convert to Floating-point (scalar).

### Syntax

`SCVTF Hd, Wn ; 32-bit to half-precision`

`SCVTF Sd, Wn ; 32-bit to single-precision`

`SCVTF Dd, Wn ; 32-bit to double-precision`

`SCVTF Hd, Xn ; 64-bit to half-precision`

`SCVTF Sd, Xn ; 64-bit to single-precision`

`SCVTF Dd, Xn ; 64-bit to double-precision`

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***Sd***

Is the 32-bit name of the SIMD and FP destination register.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***Xn***

Is the 64-bit name of the general-purpose source register.

## Operation

Signed integer Convert to Floating-point (scalar). This instruction converts the signed integer value in the general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = signed_convertFromInt(Rn)`, where `R` is either `w` or `x`.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.55 STNP (SIMD and FP) (A64)

Store Pair of SIMD and FP registers, with Non-temporal hint.

### Syntax

```
STNP {St1}, {St2}, [Xn|SP{, #imm}] ; 32-bit FP/SIMD registers, Signed offset
STNP {Dt1}, {Dt2}, [Xn|SP{, #imm}] ; 64-bit FP/SIMD registers, Signed offset
STNP {Qt1}, {Qt2}, [Xn|SP{, #imm}] ; 128-bit FP/SIMD registers, Signed offset
```

Where:

***st1***

Is the 32-bit name of the first SIMD and FP register to be transferred.

**st2**

Is the 32-bit name of the second SIMD and FP register to be transferred.

**imm**

Depends on the instruction variant:

**32-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 4 in the range -256 to 252, defaulting to 0.

**64-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 8 in the range -512 to 504, defaulting to 0.

**128-bit FP/SIMD registers** Is the optional signed immediate byte offset, a multiple of 16 in the range -1024 to 1008, defaulting to 0.

**Dt1**

Is the 64-bit name of the first SIMD and FP register to be transferred.

**Dt2**

Is the 64-bit name of the second SIMD and FP register to be transferred.

**Qt1**

Is the 128-bit name of the first SIMD and FP register to be transferred.

**Qt2**

Is the 128-bit name of the second SIMD and FP register to be transferred.

**<Xn>|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Store Pair of SIMD and FP registers, with Non-temporal hint. This instruction stores a pair of SIMD and FP registers to memory, issuing a hint to the memory system that the access is non-temporal. The address used for the store is calculated from an address from a base register value and an immediate offset. For information about non-temporal pair instructions, see *Load/Store SIMD and Floating-point Non-temporal pair* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.56 STP (SIMD and FP) (A64)

Store Pair of SIMD and FP registers.

### Syntax

```
STP St1, St2, [Xn|SP], #imm ; 32-bit FP/SIMD registers, Post-index
STP Dt1, Dt2, [Xn|SP], #imm ; 64-bit FP/SIMD registers, Post-index
STP Qt1, Qt2, [Xn|SP], #imm ; 128-bit FP/SIMD registers, Post-index
STP St1, St2, [Xn|SP, #imm]! ; 32-bit FP/SIMD registers, Pre-index
STP Dt1, Dt2, [Xn|SP, #imm]! ; 64-bit FP/SIMD registers, Pre-index
STP Qt1, Qt2, [Xn|SP, #imm]! ; 128-bit FP/SIMD registers, Pre-index
STP St1, St2, [Xn|SP{, #imm}] ; 32-bit FP/SIMD registers, Signed offset
STP Dt1, Dt2, [Xn|SP{, #imm}] ; 64-bit FP/SIMD registers, Signed offset
STP Qt1, Qt2, [Xn|SP{, #imm}] ; 128-bit FP/SIMD registers, Signed offset
```

Where:

#### **St1**

Is the 32-bit name of the first SIMD and FP register to be transferred.

#### **St2**

Is the 32-bit name of the second SIMD and FP register to be transferred.

#### **imm**

Depends on the instruction variant:

#### **32-bit FP/SIMD registers**

Is the signed immediate byte offset, a multiple of 4 in the range -256 to 252.

#### **64-bit FP/SIMD registers**

Is the signed immediate byte offset, a multiple of 8 in the range -512 to 504.

#### **128-bit FP/SIMD registers**

Is the signed immediate byte offset, a multiple of 16 in the range -1024 to 1008.

#### **Dt1**

Is the 64-bit name of the first SIMD and FP register to be transferred.

#### **Dt2**

Is the 64-bit name of the second SIMD and FP register to be transferred.

#### **Qt1**

Is the 128-bit name of the first SIMD and FP register to be transferred.

**Qt2**

Is the 128-bit name of the second SIMD and FP register to be transferred.

**<Xn>|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store Pair of SIMD and FP registers. This instruction stores a pair of SIMD and FP registers to memory. The address used for the store is calculated from a base register value and an immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.57 STR (immediate, SIMD and FP) (A64)

Store SIMD and FP register (immediate offset).

**Syntax**

```
STR <Bt>, [Xn|SP], #simm ; 8-bit FP/SIMD registers, Post-index
```

```
STR Ht, [Xn|SP], #simm ; 16-bit FP/SIMD registers, Post-index
```

```
STR St, [Xn|SP], #simm ; 32-bit FP/SIMD registers, Post-index
```

```
STR Dt, [Xn|SP], #simm ; 64-bit FP/SIMD registers, Post-index
```

```
STR Qt, [Xn|SP], #simm ; 128-bit FP/SIMD registers, Post-index
```

```
STR <Bt>, [Xn|SP, #simm]! ; 8-bit FP/SIMD registers, Pre-index
```

```
STR Ht, [Xn|SP, #simm]! ; 16-bit FP/SIMD registers, Pre-index
```

```
STR St, [Xn|SP, #simm]! ; 32-bit FP/SIMD registers, Pre-index
```

```
STR Dt, [Xn|SP, #simm]! ; 64-bit FP/SIMD registers, Pre-index
```

```
STR Qt, [Xn|SP, #simm]! ; 128-bit FP/SIMD registers, Pre-index
```

```
STR <Bt>, [Xn|SP{, #pimm}] ; 8-bit FP/SIMD registers
```

```
STR Ht, [Xn|SP{, #pimm}] ; 16-bit FP/SIMD registers
```

```
STR St, [Xn|SP{, #pimm}] ; 32-bit FP/SIMD registers
```

```
STR Dt, [Xn|SP{, #pimm}] ; 64-bit FP/SIMD registers
```

```
STR Qt, [Xn|SP{, #pimm}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**simm**

Is the signed immediate byte offset, in the range -256 to 255.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

**St**

Is the 32-bit name of the SIMD and FP register to be transferred.

**Dt**

Is the 64-bit name of the SIMD and FP register to be transferred.

**Qt**

Is the 128-bit name of the SIMD and FP register to be transferred.

**pimm**

Depends on the instruction variant:

**8-bit FP/SIMD registers** Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0.

**16-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0.

**32-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0.

**64-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0.

**128-bit FP/SIMD registers** Is the optional positive immediate byte offset, a multiple of 16 in the range 0 to 65520, defaulting to 0.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Store SIMD and FP register (immediate offset). This instruction stores a single SIMD and FP register to memory. The address that is used for the store is calculated from a base register value and an immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.58 STR (register, SIMD and FP) (A64)

Store SIMD and FP register (register offset).

### Syntax

```
STR <Bt>, [Xn|SP, (Wm|Xm), extend {amount}] ; 8-bit FP/SIMD registers
STR <Bt>, [Xn|SP, Xm{, LSL amount}] ; 8-bit FP/SIMD registers
STR Ht, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 16-bit FP/SIMD registers
STR St, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 32-bit FP/SIMD registers
STR Dt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 64-bit FP/SIMD registers
STR Qt, [Xn|SP, (Wm|Xm){, extend {amount}}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Wm**

When "option<0>" is set to 0, is the 32-bit name of the general-purpose index register.

**Xm**

When "option<0>" is set to 1, is the 64-bit name of the general-purpose index register.

**extend**

Is the index extend specifier:

**8-bit FP/SIMD registers**

Can be one of UXTW, SXTW or SXTX.

**16-bit FP/SIMD registers**

Can be one of UXTW, LSL, SXTW or SXTX.

**amount**

Is the index shift amount, it must be.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

***st***

Is the 32-bit name of the SIMD and FP register to be transferred.

***dt***

Is the 64-bit name of the SIMD and FP register to be transferred.

***qt***

Is the 128-bit name of the SIMD and FP register to be transferred.

**Usage**

Store SIMD and FP register (register offset). This instruction stores a single SIMD and FP register to memory. The address that is used for the store is calculated from a base register value and an offset register value. The offset can be optionally shifted and extended.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.59 STUR (SIMD and FP) (A64)

Store SIMD and FP register (unscaled offset).

**Syntax**

```
STUR <Bt>, [Xn|SP{, #simm}] ; 8-bit FP/SIMD registers
```

```
STUR Ht, [Xn|SP{, #simm}] ; 16-bit FP/SIMD registers
```

```
STUR St, [Xn|SP{, #simm}] ; 32-bit FP/SIMD registers
```

```
STUR Dt, [Xn|SP{, #simm}] ; 64-bit FP/SIMD registers
```

```
STUR Qt, [Xn|SP{, #simm}] ; 128-bit FP/SIMD registers
```

Where:

**<Bt>**

Is the 8-bit name of the SIMD and FP register to be transferred.

**Ht**

Is the 16-bit name of the SIMD and FP register to be transferred.

**St**

Is the 32-bit name of the SIMD and FP register to be transferred.

**Dt**

Is the 64-bit name of the SIMD and FP register to be transferred.

***Qt***

Is the 128-bit name of the SIMD and FP register to be transferred.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***simm***

Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

**Usage**

Store SIMD and FP register (unscaled offset). This instruction stores a single SIMD and FP register to memory. The address that is used for the store is calculated from a base register value and an optional immediate offset.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 data transfer instructions in alphabetical order](#) on page 818

## 19.60 UCVTF (scalar, fixed-point) (A64)

Unsigned fixed-point Convert to Floating-point (scalar).

**Syntax**

```
UCVTF Hd, Wn, #fbits ; 32-bit to half-precision
UCVTF Sd, Wn, #fbits ; 32-bit to single-precision
UCVTF Dd, Wn, #fbits ; 32-bit to double-precision
UCVTF Hd, Xn, #fbits ; 64-bit to half-precision
UCVTF Sd, Xn, #fbits ; 64-bit to single-precision
UCVTF Dd, Xn, #fbits ; 64-bit to double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Wn***

Is the 32-bit name of the general-purpose source register.

***fbits***

Depends on the instruction variant:

**32-bit** Is the number of bits after the binary point in the fixed-point source, in the range 1 to 32.

**64-bit** Is the number of bits after the binary point in the fixed-point source, in the range 1 to 64.

#### **Sd**

Is the 32-bit name of the SIMD and FP destination register.

#### **Dd**

Is the 64-bit name of the SIMD and FP destination register.

#### **Xn**

Is the 64-bit name of the general-purpose source register.

## Operation

Unsigned fixed-point Convert to Floating-point (scalar). This instruction converts the unsigned value in the 32-bit or 64-bit general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

`Vd = unsigned_convertFromInt (Rn / (2^fbits))`, where R is either w or x.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

## 19.61 UCVTF (scalar, integer) (A64)

Unsigned integer Convert to Floating-point (scalar).

### Syntax

`UCVTF Hd, Wn ; 32-bit to half-precision`

`UCVTF Sd, Wn ; 32-bit to single-precision`

`UCVTF Dd, Wn ; 32-bit to double-precision`

`UCVTF Hd, Xn ; 64-bit to half-precision`

`UCVTF Sd, Xn ; 64-bit to single-precision`

`UCVTF Dd, Xn ; 64-bit to double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Wn**

Is the 32-bit name of the general-purpose source register.

**Sd**

Is the 32-bit name of the SIMD and FP destination register.

**Dd**

Is the 64-bit name of the SIMD and FP destination register.

**Xn**

Is the 64-bit name of the general-purpose source register.

## Operation

Unsigned integer Convert to Floating-point (scalar). This instruction converts the unsigned integer value in the general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

`Vd = unsigned_convertFromInt(Rn)`, where *R* is either *w* or *x*.

## Related information

[A64 floating-point instructions in alphabetical order](#) on page 950

# 20. A64 SIMD Scalar Instructions

Describes the A64 SIMD scalar instructions.

## 20.1 A64 SIMD scalar instructions in alphabetical order

A summary of the A64 SIMD scalar instructions that are supported.

**Table 20-1: Summary of A64 SIMD scalar instructions**

| Mnemonic                 | Brief description                                              | See                                                 |
|--------------------------|----------------------------------------------------------------|-----------------------------------------------------|
| ABS (scalar)             | Absolute value (vector)                                        | <a href="#">ABS (scalar) (A64 SIMD)</a>             |
| ADD (scalar)             | Add (vector)                                                   | <a href="#">ADD (scalar) (A64 SIMD)</a>             |
| ADDP (scalar)            | Add Pair of elements (scalar)                                  | <a href="#">ADDP (scalar) (A64 SIMD)</a>            |
| CMEQ (scalar, register)  | Compare bitwise Equal (vector)                                 | <a href="#">CMEQ (scalar, register) (A64 SIMD)</a>  |
| CMEQ (scalar, zero)      | Compare bitwise Equal to zero (vector)                         | <a href="#">CMEQ (scalar, zero) (A64 SIMD)</a>      |
| CMGE (scalar, register)  | Compare signed Greater than or Equal (vector)                  | <a href="#">CMGE (scalar, register) (A64 SIMD)</a>  |
| CMGE (scalar, zero)      | Compare signed Greater than or Equal to zero (vector)          | <a href="#">CMGE (scalar, zero) (A64 SIMD)</a>      |
| CMGT (scalar, register)  | Compare signed Greater than (vector)                           | <a href="#">CMGT (scalar, register) (A64 SIMD)</a>  |
| CMGT (scalar, zero)      | Compare signed Greater than zero (vector)                      | <a href="#">CMGT (scalar, zero) (A64 SIMD)</a>      |
| CMHI (scalar, register)  | Compare unsigned Higher (vector)                               | <a href="#">CMHI (scalar, register) (A64 SIMD)</a>  |
| CMHS (scalar, register)  | Compare unsigned Higher or Same (vector)                       | <a href="#">CMHS (scalar, register) (A64 SIMD)</a>  |
| CMLE (scalar, zero)      | Compare signed Less than or Equal to zero (vector)             | <a href="#">CMLE (scalar, zero) (A64 SIMD)</a>      |
| CMLT (scalar, zero)      | Compare signed Less than zero (vector)                         | <a href="#">CMLT (scalar, zero) (A64 SIMD)</a>      |
| CMTST (scalar)           | Compare bitwise Test bits nonzero (vector)                     | <a href="#">CMTST (scalar) (A64 SIMD)</a>           |
| DUP (scalar, element)    | Duplicate vector element to scalar                             | <a href="#">DUP (scalar, element) (A64 SIMD)</a>    |
| FABD (scalar)            | Floating-point Absolute Difference (vector)                    | <a href="#">FABD (scalar) (A64 SIMD)</a>            |
| FACGE (scalar)           | Floating-point Absolute Compare Greater than or Equal (vector) | <a href="#">FACGE (scalar) (A64 SIMD)</a>           |
| FACGT (scalar)           | Floating-point Absolute Compare Greater than (vector)          | <a href="#">FACGT (scalar) (A64 SIMD)</a>           |
| FADDP (scalar)           | Floating-point Add Pair of elements (scalar)                   | <a href="#">FADDP (scalar) (A64 SIMD)</a>           |
| FCMEQ (scalar, register) | Floating-point Compare Equal (vector)                          | <a href="#">FCMEQ (scalar, register) (A64 SIMD)</a> |
| FCMEQ (scalar, zero)     | Floating-point Compare Equal to zero (vector)                  | <a href="#">FCMEQ (scalar, zero) (A64 SIMD)</a>     |
| FCMGE (scalar, register) | Floating-point Compare Greater than or Equal (vector)          | <a href="#">FCMGE (scalar, register) (A64 SIMD)</a> |
| FCMGE (scalar, zero)     | Floating-point Compare Greater than or Equal to zero (vector)  | <a href="#">FCMGE (scalar, zero) (A64 SIMD)</a>     |
| FCMGT (scalar, register) | Floating-point Compare Greater than (vector)                   | <a href="#">FCMGT (scalar, register) (A64 SIMD)</a> |

| Mnemonic                     | Brief description                                                                          | See                                                     |
|------------------------------|--------------------------------------------------------------------------------------------|---------------------------------------------------------|
| FCMGT (scalar, zero)         | Floating-point Compare Greater than zero (vector)                                          | <a href="#">FCMGT (scalar, zero) (A64 SIMD)</a>         |
| FCMLA (scalar, by element)   | Floating-point Complex Multiply Accumulate (by element)                                    | <a href="#">FCMLA (scalar, by element) (A64 SIMD)</a>   |
| FCMLE (scalar, zero)         | Floating-point Compare Less than or Equal to zero (vector)                                 | <a href="#">FCMLE (scalar, zero) (A64 SIMD)</a>         |
| FCMLT (scalar, zero)         | Floating-point Compare Less than zero (vector)                                             | <a href="#">FCMLT (scalar, zero) (A64 SIMD)</a>         |
| FCVTAS (scalar)              | Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector)   | <a href="#">FCVTAS (scalar) (A64 SIMD)</a>              |
| FCVTAU (scalar)              | Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector) | <a href="#">FCVTAU (scalar) (A64 SIMD)</a>              |
| FCVTMS (scalar)              | Floating-point Convert to Signed integer, rounding toward Minus infinity (vector)          | <a href="#">FCVTMS (scalar) (A64 SIMD)</a>              |
| FCVTMU (scalar)              | Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector)        | <a href="#">FCVTMU (scalar) (A64 SIMD)</a>              |
| FCVTNS (scalar)              | Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector)   | <a href="#">FCVTNS (scalar) (A64 SIMD)</a>              |
| FCVTNU (scalar)              | Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector) | <a href="#">FCVTNU (scalar) (A64 SIMD)</a>              |
| FCVTPS (scalar)              | Floating-point Convert to Signed integer, rounding toward Plus infinity (vector)           | <a href="#">FCVTPS (scalar) (A64 SIMD)</a>              |
| FCVTPU (scalar)              | Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector)         | <a href="#">FCVTPU (scalar) (A64 SIMD)</a>              |
| FCVTXN (scalar)              | Floating-point Convert to lower precision Narrow, rounding to odd (vector)                 | <a href="#">FCVTXN (scalar) (A64 SIMD)</a>              |
| FCVTZS (scalar, fixed-point) | Floating-point Convert to Signed fixed-point, rounding toward Zero (vector)                | <a href="#">FCVTZS (scalar, fixed-point) (A64 SIMD)</a> |
| FCVTZS (scalar, integer)     | Floating-point Convert to Signed integer, rounding toward Zero (vector)                    | <a href="#">FCVTZS (scalar, integer) (A64 SIMD)</a>     |
| FCVTZU (scalar, fixed-point) | Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector)              | <a href="#">FCVTZU (scalar, fixed-point) (A64 SIMD)</a> |
| FCVTZU (scalar, integer)     | Floating-point Convert to Unsigned integer, rounding toward Zero (vector)                  | <a href="#">FCVTZU (scalar, integer) (A64 SIMD)</a>     |
| FMAXNMP (scalar)             | Floating-point Maximum Number of Pair of elements (scalar)                                 | <a href="#">FMAXNMP (scalar) (A64 SIMD)</a>             |
| FMAXP (scalar)               | Floating-point Maximum of Pair of elements (scalar)                                        | <a href="#">FMAXP (scalar) (A64 SIMD)</a>               |
| FMINNMP (scalar)             | Floating-point Minimum Number of Pair of elements (scalar)                                 | <a href="#">FMINNMP (scalar) (A64 SIMD)</a>             |
| FMINP (scalar)               | Floating-point Minimum of Pair of elements (scalar)                                        | <a href="#">FMINP (scalar) (A64 SIMD)</a>               |
| FMLA (scalar, by element)    | Floating-point fused Multiply-Add to accumulator (by element)                              | <a href="#">FMLA (scalar, by element) (A64 SIMD)</a>    |

| Mnemonic                      | Brief description                                                                        | See                                      |
|-------------------------------|------------------------------------------------------------------------------------------|------------------------------------------|
| FMLS (scalar, by element)     | Floating-point fused Multiply-Subtract from accumulator (by element)                     | FMLS (scalar, by element) (A64 SIMD)     |
| FMUL (scalar, by element)     | Floating-point Multiply (by element)                                                     | FMUL (scalar, by element) (A64 SIMD)     |
| FMULX (scalar, by element)    | Floating-point Multiply extended (by element)                                            | FMULX (scalar, by element) (A64 SIMD)    |
| FMULX (scalar)                | Floating-point Multiply extended                                                         | FMULX (scalar) (A64 SIMD)                |
| FRECPE (scalar)               | Floating-point Reciprocal Estimate                                                       | FRECPE (scalar) (A64 SIMD)               |
| FRECPS (scalar)               | Floating-point Reciprocal Step                                                           | FRECPS (scalar) (A64 SIMD)               |
| FRSQRT (scalar)               | Floating-point Reciprocal Square Root Estimate                                           | FRSQRT (scalar) (A64 SIMD)               |
| FRSQRTS (scalar)              | Floating-point Reciprocal Square Root Step                                               | FRSQRTS (scalar) (A64 SIMD)              |
| MOV (scalar)                  | Move vector element to scalar                                                            | MOV (scalar) (A64 SIMD)                  |
| NEG (scalar)                  | Negate (vector)                                                                          | NEG (scalar) (A64 SIMD)                  |
| SCVT (scalar, fixed-point)    | Signed fixed-point Convert to Floating-point (vector)                                    | SCVT (scalar, fixed-point) (A64 SIMD)    |
| SCVT (scalar, integer)        | Signed integer Convert to Floating-point (vector)                                        | SCVT (scalar, integer) (A64 SIMD)        |
| SHL (scalar)                  | Shift Left (immediate)                                                                   | SHL (scalar) (A64 SIMD)                  |
| SLI (scalar)                  | Shift Left and Insert (immediate)                                                        | SLI (scalar) (A64 SIMD)                  |
| SQABS (scalar)                | Signed saturating Absolute value                                                         | SQABS (scalar) (A64 SIMD)                |
| SQADD (scalar)                | Signed saturating Add                                                                    | SQADD (scalar) (A64 SIMD)                |
| SQDMLAL (scalar, by element)  | Signed saturating Doubling Multiply-Add Long (by element)                                | SQDMLAL (scalar, by element) (A64 SIMD)  |
| SQDMLAL (scalar)              | Signed saturating Doubling Multiply-Add Long                                             | SQDMLAL (scalar) (A64 SIMD)              |
| SQDMLSL (scalar, by element)  | Signed saturating Doubling Multiply-Subtract Long (by element)                           | SQDMLSL (scalar, by element) (A64 SIMD)  |
| SQDMLSL (scalar)              | Signed saturating Doubling Multiply-Subtract Long                                        | SQDMLSL (scalar) (A64 SIMD)              |
| SQDMULH (scalar, by element)  | Signed saturating Doubling Multiply returning High half (by element)                     | SQDMULH (scalar, by element) (A64 SIMD)  |
| SQDMULH (scalar)              | Signed saturating Doubling Multiply returning High half                                  | SQDMULH (scalar) (A64 SIMD)              |
| SQDMULL (scalar, by element)  | Signed saturating Doubling Multiply Long (by element)                                    | SQDMULL (scalar, by element) (A64 SIMD)  |
| SQDMULL (scalar)              | Signed saturating Doubling Multiply Long                                                 | SQDMULL (scalar) (A64 SIMD)              |
| SQNEG (scalar)                | Signed saturating Negate                                                                 | SQNEG (scalar) (A64 SIMD)                |
| SQRDMLAH (scalar, by element) | Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element) | SQRDMLAH (scalar, by element) (A64 SIMD) |
| SQRDMLAH (scalar)             | Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector)     | SQRDMLAH (scalar) (A64 SIMD)             |
| SQRDMLSH (scalar, by element) | Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element)   | SQRDMLSH (scalar, by element) (A64 SIMD) |

| Mnemonic                      | Brief description                                                                  | See                                      |
|-------------------------------|------------------------------------------------------------------------------------|------------------------------------------|
| SQRDMLSH (scalar)             | Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector) | SQRDMLSH (scalar) (A64 SIMD)             |
| SQRDMULH (scalar, by element) | Signed saturating Rounding Doubling Multiply returning High half (by element)      | SQRDMULH (scalar, by element) (A64 SIMD) |
| SQRDMULH (scalar)             | Signed saturating Rounding Doubling Multiply returning High half                   | SQRDMULH (scalar) (A64 SIMD)             |
| SQRSHL (scalar)               | Signed saturating Rounding Shift Left (register)                                   | SQRSHL (scalar) (A64 SIMD)               |
| SQRSHRN (scalar)              | Signed saturating Rounded Shift Right Narrow (immediate)                           | SQRSHRN (scalar) (A64 SIMD)              |
| SQRSHRUN (scalar)             | Signed saturating Rounded Shift Right Unsigned Narrow (immediate)                  | SQRSHRUN (scalar) (A64 SIMD)             |
| SQSHL (scalar, immediate)     | Signed saturating Shift Left (immediate)                                           | SQSHL (scalar, immediate) (A64 SIMD)     |
| SQSHL (scalar, register)      | Signed saturating Shift Left (register)                                            | SQSHL (scalar, register) (A64 SIMD)      |
| SQSHLU (scalar)               | Signed saturating Shift Left Unsigned (immediate)                                  | SQSHLU (scalar) (A64 SIMD)               |
| SQSHRN (scalar)               | Signed saturating Shift Right Narrow (immediate)                                   | SQSHRN (scalar) (A64 SIMD)               |
| SQSHRUN (scalar)              | Signed saturating Shift Right Unsigned Narrow (immediate)                          | SQSHRUN (scalar) (A64 SIMD)              |
| SQSUB (scalar)                | Signed saturating Subtract                                                         | SQSUB (scalar) (A64 SIMD)                |
| SQXTN (scalar)                | Signed saturating extract Narrow                                                   | SQXTN (scalar) (A64 SIMD)                |
| SQXTUN (scalar)               | Signed saturating extract Unsigned Narrow                                          | SQXTUN (scalar) (A64 SIMD)               |
| SRI (scalar)                  | Shift Right and Insert (immediate)                                                 | SRI (scalar) (A64 SIMD)                  |
| SRSHL (scalar)                | Signed Rounding Shift Left (register)                                              | SRSHL (scalar) (A64 SIMD)                |
| SRSHR (scalar)                | Signed Rounding Shift Right (immediate)                                            | SRSHR (scalar) (A64 SIMD)                |
| SRSRA (scalar)                | Signed Rounding Shift Right and Accumulate (immediate)                             | SRSRA (scalar) (A64 SIMD)                |
| SSHLD (scalar)                | Signed Shift Left (register)                                                       | SSHLD (scalar) (A64 SIMD)                |
| SSHR (scalar)                 | Signed Shift Right (immediate)                                                     | SSHR (scalar) (A64 SIMD)                 |
| SSRA (scalar)                 | Signed Shift Right and Accumulate (immediate)                                      | SSRA (scalar) (A64 SIMD)                 |
| SUB (scalar)                  | Subtract (vector)                                                                  | SUB (scalar) (A64 SIMD)                  |
| SUQADD (scalar)               | Signed saturating Accumulate of Unsigned value                                     | SUQADD (scalar) (A64 SIMD)               |
| UCVTF (scalar, fixed-point)   | Unsigned fixed-point Convert to Floating-point (vector)                            | UCVTF (scalar, fixed-point) (A64 SIMD)   |
| UCVTI (scalar, integer)       | Unsigned integer Convert to Floating-point (vector)                                | UCVTI (scalar, integer) (A64 SIMD)       |
| UQADD (scalar)                | Unsigned saturating Add                                                            | UQADD (scalar) (A64 SIMD)                |
| UQRSHL (scalar)               | Unsigned saturating Rounding Shift Left (register)                                 | UQRSHL (scalar) (A64 SIMD)               |
| UQRSHRN (scalar)              | Unsigned saturating Rounded Shift Right Narrow (immediate)                         | UQRSHRN (scalar) (A64 SIMD)              |

| Mnemonic                  | Brief description                                        | See                                                  |
|---------------------------|----------------------------------------------------------|------------------------------------------------------|
| UQSHL (scalar, immediate) | Unsigned saturating Shift Left (immediate)               | <a href="#">UQSHL (scalar, immediate) (A64 SIMD)</a> |
| UQSHL (scalar, register)  | Unsigned saturating Shift Left (register)                | <a href="#">UQSHL (scalar, register) (A64 SIMD)</a>  |
| UQSHRN (scalar)           | Unsigned saturating Shift Right Narrow (immediate)       | <a href="#">UQSHRN (scalar) (A64 SIMD)</a>           |
| UQSUB (scalar)            | Unsigned saturating Subtract                             | <a href="#">UQSUB (scalar) (A64 SIMD)</a>            |
| UQXTN (scalar)            | Unsigned saturating extract Narrow                       | <a href="#">UQXTN (scalar) (A64 SIMD)</a>            |
| URSHL (scalar)            | Unsigned Rounding Shift Left (register)                  | <a href="#">URSHL (scalar) (A64 SIMD)</a>            |
| URSHR (scalar)            | Unsigned Rounding Shift Right (immediate)                | <a href="#">URSHR (scalar) (A64 SIMD)</a>            |
| URSRA (scalar)            | Unsigned Rounding Shift Right and Accumulate (immediate) | <a href="#">URSRA (scalar) (A64 SIMD)</a>            |
| USHL (scalar)             | Unsigned Shift Left (register)                           | <a href="#">USHL (scalar) (A64 SIMD)</a>             |
| USHR (scalar)             | Unsigned Shift Right (immediate)                         | <a href="#">USHR (scalar) (A64 SIMD)</a>             |
| USQADD (scalar)           | Unsigned saturating Accumulate of Signed value           | <a href="#">USQADD (scalar) (A64 SIMD)</a>           |
| USRA (scalar)             | Unsigned Shift Right and Accumulate (immediate)          | <a href="#">USRA (scalar) (A64 SIMD)</a>             |

## 20.2 ABS (scalar) (A64 SIMD)

Absolute value (vector).

### Syntax

`ABS Vd, Vn`

Where:

**v**

Is a width specifier, d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Usage

Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD and FP register, puts the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.3 ADD (scalar) (A64 SIMD)

Add (vector).

### Syntax

`ADD Vd, Vn, Vm`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Usage

Add (vector). This instruction adds corresponding elements in the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.4 ADDP (scalar) (A64 SIMD)

Add Pair of elements (scalar).

### Syntax

`ADDP Vd, Vn.T`

Where:

**v**

Is the destination width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

***Vn***

Is the name of the SIMD and FP source register.

***T***

Is the source arrangement specifier,  $2D$ .

## Usage

Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD and FP register and writes the scalar result into the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.5 CMEQ (scalar, register) (A64 SIMD)

Compare bitwise Equal (vector).

### Syntax

CMEQ *Vd*, *Vn*, *Vm*

Where:

***V***

Is a width specifier,  $D$ .

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Usage

Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD and FP register with the corresponding vector element from the second source SIMD and FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.6 CMEQ (scalar, zero) (A64 SIMD)

Compare bitwise Equal to zero (vector).

**Syntax**

CMEQ *Vd*, *Vn*, #0

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

**Usage**

Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.7 CMGE (scalar, register) (A64 SIMD)

Compare signed Greater than or Equal (vector).

**Syntax**

CMGE *Vd*, *Vn*, *Vm*

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.8 CMGE (scalar, zero) (A64 SIMD)

Compare signed Greater than or Equal to zero (vector).

### Syntax

```
CMGE vd, vn, #0
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.9 CMGT (scalar, register) (A64 SIMD)

Compare signed Greater than (vector).

### Syntax

CMGT *Vd*, *Vn*, *Vm*

Where:

**v**

Is a width specifier, *D*.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Usage

Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.10 CMGT (scalar, zero) (A64 SIMD)

Compare signed Greater than zero (vector).

### Syntax

```
CMGT Vd, Vn, #0
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Usage

Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.11 CMHI (scalar, register) (A64 SIMD)

Compare unsigned Higher (vector).

### Syntax

```
CMHI Vd, Vn, Vm
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.12 CMHS (scalar, register) (A64 SIMD)

Compare unsigned Higher or Same (vector).

### Syntax

CMHS *Vd*, *Vn*, *Vm*

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.13 CMLE (scalar, zero) (A64 SIMD)

Compare signed Less than or Equal to zero (vector).

**Syntax**

```
CMLE Vd, Vn, #0
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

**Usage**

Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.14 CMLT (scalar, zero) (A64 SIMD)

Compare signed Less than zero (vector).

**Syntax**

```
CMLT Vd, Vn, #0
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.15 CMTST (scalar) (A64 SIMD)

Compare bitwise Test bits nonzero (vector).

### Syntax

```
CMTST Vd, Vn,Vm
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD and FP register, performs an AND with the corresponding vector element in the second source SIMD and FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.16 DUP (scalar, element) (A64 SIMD)

Duplicate vector element to scalar.

This instruction is used by the alias `mov` (scalar).

### Syntax

`DUP Vd, Vn.T[index]`

Where:

**V**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**T**

Is the element width specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**index**

Is the element index, in the range shown in Usage.

### Usage

Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD and FP register into a scalar or each element in a vector, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-2: DUP (Scalar) specifier combinations**

| <b>V</b> | <b>T</b> | <b>index</b> |
|----------|----------|--------------|
| B        | B        | 0 to 15      |
| H        | H        | 0 to 7       |
| S        | S        | 0 to 3       |
| D        | D        | 0 or 1       |

## Related information

[MOV \(scalar\) \(A64 SIMD\)](#) on page 1080

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.17 FABD (scalar) (A64 SIMD)

Floating-point Absolute Difference (vector).

### Syntax

`FABD Hd, Hn, Hm ; Scalar half precision`

`FABD Vd, Vn, Vm ; Scalar single-precision and double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD and FP register, from the corresponding floating-point values in the elements of the first source SIMD and FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.18 FACGE (scalar) (A64 SIMD)

Floating-point Absolute Compare Greater than or Equal (vector).

### Syntax

`FACGE Hd, Hn, Hm ; Scalar half precision`

`FACGE Vd, Vn, Vm ; Scalar single-precision and double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD and FP register with the absolute value of the corresponding floating-point value in the second source SIMD and FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.19 FACGT (scalar) (A64 SIMD)

Floating-point Absolute Compare Greater than (vector).

### Syntax

FACGT *Hd*, *Hn*, *Hm* ; Scalar half precision

FACGT *Vd*, *Vn*, *Vm* ; Scalar single-precision and double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***V***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD and FP register with the absolute value

of the corresponding vector element in the second source SIMD and FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.20 FADDP (scalar) (A64 SIMD)

Floating-point Add Pair of elements (scalar).

### Syntax

`FADDP Vd, Vn.T ; Half-precision`

`FADDP Vd, Vn.T ; Single-precision and double-precision`

Where:

**v**

Is the destination width specifier:

#### Half-precision

Must be `H`.

#### Single-precision and double-precision

Can be one of `s` or `d`.

**t**

Is the source arrangement specifier:

#### Half-precision

Must be `2H`.

#### Single-precision and double-precision

Can be one of `2s` or `2D`.

**d**

Is the number of the SIMD and FP destination register.

***Vn***

Is the name of the SIMD and FP source register.

**Architectures supported (scalar)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD and FP register and writes the scalar result into the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.21 FCMEQ (scalar, register) (A64 SIMD)

Floating-point Compare Equal (vector).

**Syntax**

```
FCMEQ Hd, Hn, Hm ; Scalar half precision
```

```
FCMEQ Vd, Vn, Vm ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD and FP register, with the corresponding floating-point value from the second source SIMD and FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.22 FCMEQ (scalar, zero) (A64 SIMD)

Floating-point Compare Equal to zero (vector).

### Syntax

```
FCMEQ Hd, Hn, #0.0 ; Scalar half precision
```

```
FCMEQ Vd, Vn, #0.0 ; Scalar single-precision and double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.23 FCMGE (scalar, register) (A64 SIMD)

Floating-point Compare Greater than or Equal (vector).

### Syntax

FCMGE *Hd*, *Hn*, *Hm* ; Scalar half precision

FCMGE *Vd*, *Vn*, *Vm* ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

**Hm**

Is the 16-bit name of the second SIMD and FP source register.

**v**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD and FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD and FP register sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.24 FCMGE (scalar, zero) (A64 SIMD)

Floating-point Compare Greater than or Equal to zero (vector).

### Syntax

```
FCMGE Hd, Hn, #0.0 ; Scalar half precision
```

```
FCMGE Vd, Vn, #0.0 ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**v**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.25 FCMGT (scalar, register) (A64 SIMD)

Floating-point Compare Greater than (vector).

### Syntax

FCMGT Hd, Hn, Hm ; Scalar half precision

FCMGT Vd, Vn, Vm ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***V***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD and FP register and if the value is greater than the corresponding floating-point value in the second source SIMD and FP register sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.26 FCMGT (scalar, zero) (A64 SIMD)

Floating-point Compare Greater than zero (vector).

### Syntax

```
FCMGT Hd, Hn, #0.0 ; Scalar half precision
FCMGT Vd, Vn, #0.0 ; Scalar single-precision and double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***v***

Is a width specifier, and can be either s or d.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.27 FCMLA (scalar, by element) (A64 SIMD)

Floating-point Complex Multiply Accumulate (by element).

### Syntax

```
FCMLA Vd.T, Vn.T, Vm.Ts[index], #rotate ;
```

Where:

***vd***

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

**Ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

**rotate**

Is the rotation, and can be one of the values shown in Usage.

## Architectures supported (scalar)

Supported in the Arm®v8.3-A architecture and later.

## Usage

This instruction multiplies the two source complex numbers from the  $v_m$  and  $v_n$  the vector registers and adds the result to the corresponding complex number in the destination vector register. The number of complex numbers that can be stored in the  $v_m$ , the  $v_n$ , and the  $v_d$  registers is calculated as the vector register size divided by the length of each complex number. These lengths are 16 for half-precision, 32 for single-precision, and 64 for double-precision. Each complex number is represented in a SIMD and FP register as a pair of elements with the imaginary part of the number being placed in the more significant element, and the real part of the number being placed in the less significant element. Both real and imaginary parts of the source and the resulting complex number are represented as floating-point values.

None, one, or both of the two vector elements that are read from each of the numbers in the source SIMD and FP register can be negated based on the rotation value:

- If the rotation is 0, none of the vector elements are negated.
- If the rotation is 90, the odd-numbered vector elements are negated.
- If the rotation is 180, both vector elements are negated.
- If the rotation is 270, the even-numbered vector elements are negated.

The indexed element variant of this instruction is available for half-precision and single-precision number values. For this variant, the index value determines the position in the source vector register of the single source value that is used to multiply each of the complex numbers in the source vector register. The index value is encoded as H:L for half-precision values, or H for single-precision values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated.

For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-3: FCMLA (Scalar) specifier combinations**

| <b>T</b> | <b>Ts</b> |
|----------|-----------|
| 4H       | H         |
| 8H       | H         |
| 4S       | S         |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.28 FCMLE (scalar, zero) (A64 SIMD)

Floating-point Compare Less than or Equal to zero (vector).

### Syntax

FCMLE *Hd*, *Hn*, #0.0 ; Scalar half precision

FCMLE *Vd*, *Vn*, #0.0 ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.29 FCMLT (scalar, zero) (A64 SIMD)

Floating-point Compare Less than zero (vector).

### Syntax

```
FCMLT Hd, Hn, #0.0 ; Scalar half precision
FCMLT Vd, Vn, #0.0 ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.30 FCVTAS (scalar) (A64 SIMD)

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

### Syntax

FCVTAS *Hd*, *Hn* ; Scalar half precision

FCVTAS *Vd*, *Vn* ; Scalar single-precision and double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either s or d.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.31 FCVTAU (scalar) (A64 SIMD)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

### Syntax

FCVTAU *Hd*, *Hn* ; Scalar half precision

FCVTAU *Vd*, *Vn* ; Scalar single-precision and double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either s or d.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.32 FCVTMS (scalar) (A64 SIMD)

Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

### Syntax

```
FCVTMS Hd, Hn ; Scalar half precision
FCVTMS Vd, Vn ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.33 FCVTMU (scalar) (A64 SIMD)

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

### Syntax

```
FCVTMU Hd, Hn ; Scalar half precision
FCVTMU Vd, Vn ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.34 FCVTNS (scalar) (A64 SIMD)

Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

### Syntax

FCVTNS *Hd, Hn* ; Scalar half precision

FCVTNS *Vd, Vn* ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.35 FCVTNU (scalar) (A64 SIMD)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

### Syntax

FCVTNU *Hd, Hn* ; Scalar half precision

FCVTNU *Vd, Vn* ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.36 FCVTPS (scalar) (A64 SIMD)

Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

### Syntax

```
FCVTPS Hd, Hn ; Scalar half precision
FCVTPS Vd, Vn ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.37 FCVTPU (scalar) (A64 SIMD)

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

### Syntax

```
FCVTPU Hd, Hn ; Scalar half precision
FCVTPU Vd, Vn ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.38 FCVTXN (scalar) (A64 SIMD)

Floating-point Convert to lower precision Narrow, rounding to odd (vector).

### Syntax

FCVTXN *Vbd*, *Van*

Where:

**v<sub>b</sub>**

Is the destination width specifier, s.

**d**

Is the number of the SIMD and FP destination register.

**v<sub>a</sub>**

Is the source width specifier, d.

**n**

Is the number of the SIMD and FP source register.

## Usage

Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD and FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD and FP register.

**Note**

This instruction uses the Round to Odd rounding mode which is not defined by the IEEE 754-2008 standard. This rounding mode ensures that if the result of the conversion is inexact the least significant bit of the mantissa is forced to 1. This rounding mode enables a floating-point value to be converted to a lower precision format via an intermediate precision format while avoiding double rounding errors. For example, a 64-bit floating-point value can be converted to a correctly rounded 16-bit floating-point value by first using this instruction to produce a 32-bit value and then using another instruction with the wanted rounding mode to convert the 32-bit value to the final 16-bit floating-point value.

The `FCVTXN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `FCVTXN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.39 FCVTZS (scalar, fixed-point) (A64 SIMD)

Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

### Syntax

`FCVTZS vd, vn, #fbits`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the operand width.

## Usage

Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-4: FCVTZS (Scalar) specifier combinations**

| v | fbits   |
|---|---------|
| H | -       |
| S | 1 to 32 |
| D | 1 to 64 |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.40 FCVTZS (scalar, integer) (A64 SIMD)

Floating-point Convert to Signed integer, rounding toward Zero (vector).

### Syntax

FCVTZS *Hd, Hn* ; Scalar half precision

FCVTZS *Vd, Vn* ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.41 FCVTZU (scalar, fixed-point) (A64 SIMD)

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

### Syntax

`FCVTZU vd, vn, #fbits`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the operand width.

## Usage

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-5: FCVTZU (Scalar) specifier combinations**

| v | fbits   |
|---|---------|
| H | -       |
| S | 1 to 32 |
| D | 1 to 64 |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.42 FCVTZU (scalar, integer) (A64 SIMD)

Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

### Syntax

FCVTZU *Hd, Hn* ; Scalar half precision

FCVTZU *Vd, Vn* ; Scalar single-precision and double-precision

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.43 FMAXNMP (scalar) (A64 SIMD)

Floating-point Maximum Number of Pair of elements (scalar).

### Syntax

`FMAXNMP Vd, Vn.T ; Half-precision`

`FMAXNMP Vd, Vn.T ; Single-precision and double-precision`

Where:

**v**

Is the destination width specifier:

#### Half-precision

Must be `H`.

#### Single-precision and double-precision

Can be one of `s` or `d`.

**T**

Is the source arrangement specifier:

#### **Half-precision**

Must be `2H`.

#### **Single-precision and double-precision**

Can be one of `2S` or `2D`.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### **Architectures supported (scalar)**

Supported in the Arm®v8.2 architecture and later.

### **Usage**

Floating-point Maximum Number of Pair of elements (scalar). This instruction compares two vector elements in the source SIMD and FP register and writes the largest of the floating-point values as a scalar to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### **Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## **20.44 FMAXP (scalar) (A64 SIMD)**

Floating-point Maximum of Pair of elements (scalar).

### **Syntax**

`FMAXP Vd, Vn.T ; Half-precision`

`FMAXP Vd, Vn.T ; Single-precision and double-precision`

Where:

**V**

Is the destination width specifier:

**Half-precision**

Must be `H`.

**Single-precision and double-precision**

Can be one of `s` or `d`.

**`T`**

Is the source arrangement specifier:

**Half-precision**

Must be `2H`.

**Single-precision and double-precision**

Can be one of `2s` or `2d`.

**`d`**

Is the number of the SIMD and FP destination register.

**`Vn`**

Is the name of the SIMD and FP source register.

**Architectures supported (scalar)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Maximum of Pair of elements (scalar). This instruction compares two vector elements in the source SIMD and FP register and writes the largest of the floating-point values as a scalar to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.45 FMINNMP (scalar) (A64 SIMD)

Floating-point Minimum Number of Pair of elements (scalar).

**Syntax**

`FMINNMP Vd, Vn.T ; Half-precision`

**FMINNMP Vd, Vn.T ; Single-precision and double-precision**

Where:

**v**

Is the destination width specifier:

**Half-precision**

Must be **H**.

**Single-precision and double-precision**

Can be one of **s** or **d**.

**t**

Is the source arrangement specifier:

**Half-precision**

Must be **2H**.

**Single-precision and double-precision**

Can be one of **2s** or **2d**.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Minimum Number of Pair of elements (scalar). This instruction compares two vector elements in the source SIMD and FP register and writes the smallest of the floating-point values as a scalar to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.46 FMINP (scalar) (A64 SIMD)

Floating-point Minimum of Pair of elements (scalar).

### Syntax

`FMINP Vd, Vn.T ; Half-precision`

`FMINP Vd, Vn.T ; Single-precision and double-precision`

Where:

**v**

Is the destination width specifier:

#### Half-precision

Must be `H`.

#### Single-precision and double-precision

Can be one of `s` or `d`.

**T**

Is the source arrangement specifier:

#### Half-precision

Must be `2H`.

#### Single-precision and double-precision

Can be one of `2s` or `2d`.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Minimum of Pair of elements (scalar). This instruction compares two vector elements in the source SIMD and FP register and writes the smallest of the floating-point values as a scalar to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.47 FMLA (scalar, by element) (A64 SIMD)

Floating-point fused Multiply-Add to accumulator (by element).

### Syntax

`FMLA Vd, Vn, Vm.Ts[index]`

Where:

**v**

Is a width specifier, and can be H, S, or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**Ts**

Is an element size specifier, and can be H, S, or D.

**index**

Is the element index, H.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point fused Multiply-Add to accumulator (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and accumulates the results in the vector elements of the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-6: FMLA (Scalar, single-precision and double-precision) specifier combinations**

| <b>v</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| S        | S         | 0 to 3       |
| D        | D         | 0 or 1       |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.48 FMLS (scalar, by element) (A64 SIMD)

Floating-point fused Multiply-Subtract from accumulator (by element).

### Syntax

`FMLS Vd, Vn, Vm.Ts[index]`

Where:

**v**

Is a width specifier, and can be H, S, or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**Ts**

Is an element size specifier, and can be H, S, or D.

**index**

Is the element index, H.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point fused Multiply-Subtract from accumulator (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and subtracts the results from the vector elements of the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see [Floating-point exception traps](#) in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-7: FMLS (Scalar, single-precision and double-precision) specifier combinations**

| <b>v</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| S        | S         | 0 to 3       |
| D        | D         | 0 or 1       |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.49 FMUL (scalar, by element) (A64 SIMD)

Floating-point Multiply (by element).

### Syntax

`FMUL Vd, Vn, Vm.Ts[index]`

Where:

**v**

Is a width specifier, and can be H, S, or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**Ts**

Is an element size specifier, and can be H, S, or D.

**index**

Is the element index, H.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Multiply (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-8: FMUL (Scalar, single-precision and double-precision) specifier combinations**

| <b>v</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| S        | S         | 0 to 3       |
| D        | D         | 0 or 1       |

#### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.50 FMULX (scalar, by element) (A64 SIMD)

Floating-point Multiply extended (by element).

### Syntax

`FMULX Vd, Vn, Vm.Ts[index]`

Where:

**v**

Is a width specifier, and can be H, S, or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**Ts**

Is an element size specifier, and can be H, S, or D.

**index**

Is the element index, H.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Multiply extended (by element). This instruction multiplies the floating-point values in the vector elements in the first source SIMD and FP register by the specified floating-point value in the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

Before each multiplication, a check is performed for whether one value is infinite and the other is zero. In this case, if only one of the values is negative, the result is 2.0, otherwise the result is -2.0.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-9: FMULX (Scalar, single-precision and double-precision) specifier combinations**

| v | ts | index  |
|---|----|--------|
| S | S  | 0 to 3 |
| D | D  | 0 or 1 |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.51 FMULX (scalar) (A64 SIMD)

Floating-point Multiply extended.

### Syntax

FMULX *Hd*, *Hn*, *Hm* ; Scalar half precision

FMULX *Vd*, *Vn*, *Vm* ; Scalar single-precision and double-precision

Where:

#### *Hd*

Is the 16-bit name of the SIMD and FP destination register.

#### *Hn*

Is the 16-bit name of the first SIMD and FP source register.

#### *Hm*

Is the 16-bit name of the second SIMD and FP source register.

**v**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD and FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD and FP register.

If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the values is negative, otherwise the result is positive.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.52 FRECPE (scalar) (A64 SIMD)

Floating-point Reciprocal Estimate.

### Syntax

```
FRECPE Hd, Hn ; Scalar half precision
```

```
FRECPE Vd, Vn ; Scalar single-precision and double-precision
```

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.53 FRECPS (scalar) (A64 SIMD)

Floating-point Reciprocal Step.

### Syntax

```
FRECPS Hd, Hn, Hm ; Scalar half precision
```

```
FRECPS Vd, Vn, Vm ; Scalar single-precision and double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***V***

Is a width specifier, and can be either s or d.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD and FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.54 FRSQRTE (scalar) (A64 SIMD)

Floating-point Reciprocal Square Root Estimate.

### Syntax

```
FRSQRTE Hd, Hn ; Scalar half precision
```

```
FRSQRTE Vd, Vn ; Scalar single-precision and double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate square root for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.55 FRSQRTS (scalar) (A64 SIMD)

Floating-point Reciprocal Square Root Step.

### Syntax

```
FRSQRTS Hd, Hn, Hm ; Scalar half precision
```

```
FRSQRTS Vd, Vn, Vm ; Scalar single-precision and double-precision
```

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the first SIMD and FP source register.

***Hm***

Is the 16-bit name of the second SIMD and FP source register.

***v***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD and FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.56 MOV (scalar) (A64 SIMD)

Move vector element to scalar.

This instruction is an alias of **DUP** (element).

The equivalent instruction is **DUP Vd, Vn.T[index]**.

## Syntax

**MOV Vd, Vn.T[index]**

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**vn**

Is the name of the SIMD and FP source register.

**t**

Is the element width specifier, and can be one of the values shown in Usage.

**index**

Is the element index, in the range shown in Usage.

## Usage

Move vector element to scalar. This instruction duplicates the specified vector element in the SIMD and FP source register into a scalar, and writes the result to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-10: MOV (Scalar) specifier combinations**

| v | t | index   |
|---|---|---------|
| B | B | 0 to 15 |
| H | H | 0 to 7  |
| S | S | 0 to 3  |
| D | D | 0 or 1  |

## Related information

[DUP \(scalar, element\) \(A64 SIMD\)](#) on page 1038

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.57 NEG (scalar) (A64 SIMD)

Negate (vector).

### Syntax

NEG *vd*, *vn*

Where:

**v**

Is a width specifier, *d*.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Negate (vector). This instruction reads each vector element from the source SIMD and FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.58 SCVTF (scalar, fixed-point) (A64 SIMD)

Signed fixed-point Convert to Floating-point (vector).

### Syntax

`SCVTF Vd, vn, #fbits`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the operand width.

## Usage

Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-11: SCVTF (Scalar) specifier combinations**

| <b>v</b> | <b>fbits</b> |
|----------|--------------|
| H        | -            |
| S        | 1 to 32      |
| D        | 1 to 64      |

#### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.59 SCVTF (scalar, integer) (A64 SIMD)

Signed integer Convert to Floating-point (vector).

#### Syntax

SCVTF *Hd*, *Hn* ; Scalar half precision

SCVTF *Vd*, *Vn* ; Scalar single-precision and double-precision

Where:

***Hd***

Is the 16-bit name of the SIMD and FP destination register.

***Hn***

Is the 16-bit name of the SIMD and FP source register.

***V***

Is a width specifier, and can be either **s** or **d**.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the SIMD and FP source register.

#### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

## Usage

Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.60 SHL (scalar) (A64 SIMD)

Shift Left (immediate).

### Syntax

`SHL Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to 63.

## Usage

Shift Left (immediate). This instruction reads each value from a vector, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.61 SLI (scalar) (A64 SIMD)

Shift Left and Insert (immediate).

### Syntax

```
SLI Vd, Vn, #shift
```

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to 63.

### Usage

Shift Left and Insert (immediate). This instruction reads each vector element in the source SIMD and FP register, left shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the destination SIMD and FP register such that the new zero bits created by the shift are not inserted but retain their existing value. Bits shifted out of the left of each vector element in the source register are lost.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.62 SQABS (scalar) (A64 SIMD)

Signed saturating Absolute value.

### Syntax

```
SQABS Vd, Vn
```

Where:

**v**

Is a width specifier, and can be one of B, H, S or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Signed saturating Absolute value. This instruction reads each vector element from the source SIMD and FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.63 SQADD (scalar) (A64 SIMD)

Signed saturating Add.

### Syntax

SQADD *Vd*, *Vn*, *Vm*

Where:

**v**

Is a width specifier, and can be one of **B**, **H**, **S** or **D**.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.64 SQDMLAL (scalar, by element) (A64 SIMD)

Signed saturating Doubling Multiply-Add Long (by element).

### Syntax

`SQDMLAL Vad, Vbn, Vm.Ts[index]`

Where:

**Va**

Is the destination width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**Vb**

Is the source width specifier, and can be either h or s.

**n**

Is the number of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If *Ts* is h, then *Vm* must be in the range V0 to V15.
- If *Ts* is s, then *Vm* must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either h or s.

**index**

Is the element index, in the range shown in Usage.

### Usage

Signed saturating Doubling Multiply-Add Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, and accumulates the final results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMLAL` instruction extracts vector elements from the lower half of the first source register, while the `SQDMLAL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-12: SQDMLAL (Scalar) specifier combinations**

| <b><i>Va</i></b> | <b><i>Vb</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|------------------|------------------|------------------|---------------------|
| S                | H                | H                | 0 to 7              |
| D                | S                | S                | 0 to 3              |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.65 SQDMLAL (scalar) (A64 SIMD)

Signed saturating Doubling Multiply-Add Long.

### Syntax

`SQDMLAL Vad, Vbn, Vbm`

Where:

***Va***

Is the destination width specifier, and can be either `s` or `d`.

***d***

Is the number of the SIMD and FP destination register.

***Vb***

Is the source width specifier, and can be either `H` or `S`.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

### Usage

Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMLAL` instruction extracts each source vector from the lower half of each source register, while the `SQDMLAL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-13: SQDMLAL (Scalar) specifier combinations**

| <b><i>Va</i></b> | <b><i>Vb</i></b> |
|------------------|------------------|
| S                | H                |
| D                | S                |

#### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.66 SQDMLSL (scalar, by element) (A64 SIMD)

Signed saturating Doubling Multiply-Subtract Long (by element).

#### Syntax

`SQDMLSL Vad, Vbn, Vm.Ts[index]`

Where:

***Va***

Is the destination width specifier, and can be either `s` or `d`.

***d***

Is the number of the SIMD and FP destination register.

***Vb***

Is the source width specifier, and can be either `H` or `s`.

***n***

Is the number of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If `Ts` is `H`, then `Vm` must be in the range V0 to V15.
- If `Ts` is `s`, then `Vm` must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either `H` or `s`.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Signed saturating Doubling Multiply-Subtract Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, and subtracts the final results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMLSL` instruction extracts vector elements from the lower half of the first source register, while the `SQDMLSL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-14: SQDMLSL (Scalar) specifier combinations**

| <b><i>Va</i></b> | <b><i>Vb</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|------------------|------------------|------------------|---------------------|
| S                | H                | H                | 0 to 7              |
| D                | S                | S                | 0 to 3              |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.67 SQDMLSL (scalar) (A64 SIMD)

Signed saturating Doubling Multiply-Subtract Long.

**Syntax**

`SQDMLSL Vad, Vbn, Vbm`

Where:

***Va***

Is the destination width specifier, and can be either `s` or `d`.

***d***

Is the number of the SIMD and FP destination register.

***v<sub>b</sub>***

Is the source width specifier, and can be either `H` or `S`.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMILSL` instruction extracts each source vector from the lower half of each source register, while the `SQDMILSL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-15: SQDMILSL (Scalar) specifier combinations**

| <b><i>V<sub>a</sub></i></b> | <b><i>V<sub>b</sub></i></b> |
|-----------------------------|-----------------------------|
| S                           | H                           |
| D                           | S                           |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.68 SQDMULH (scalar, by element) (A64 SIMD)

Signed saturating Doubling Multiply returning High half (by element).

### Syntax

`SQDMULH Vd, Vn, Vm.Ts[index]`

Where:

***v***

Is a width specifier, and can be either `H` or `S`.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If  $\tau_s$  is H, then  $v_m$  must be in the range V0 to V15.
- If  $\tau_s$  is S, then  $v_m$  must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either H or S.

***index***

Is the element index, in the range shown in Usage.

## Usage

Signed saturating Doubling Multiply returning High half (by element). This instruction multiplies each vector element in the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [SQRDMULH \(scalar, by element\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-16: SQDMULH (Scalar) specifier combinations**

| <b><i>V</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|-----------------|------------------|---------------------|
| H               | H                | 0 to 7              |
| S               | S                | 0 to 3              |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.69 SQDMULH (scalar) (A64 SIMD)

Signed saturating Doubling Multiply returning High half.

### Syntax

SQDMULH  $Vd, Vn, Vm$

Where:

**v**

Is a width specifier, and can be either `H` or `S`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD and FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [SQRDMULH \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.70 SQDMULL (scalar, by element) (A64 SIMD)

Signed saturating Doubling Multiply Long (by element).

### Syntax

`SQDMULL Vad, Vbn, Vm.Ts[index]`

Where:

**Va**

Is the destination width specifier, and can be either `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**Vb**

Is the source width specifier, and can be either `H` or `S`.

**n**

Is the number of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If  $\tau_s$  is  $\mathbf{H}$ , then  $v_m$  must be in the range V0 to V15.
- If  $\tau_s$  is  $\mathbf{s}$ , then  $v_m$  must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either  $\mathbf{H}$  or  $\mathbf{s}$ .

***index***

Is the element index, in the range shown in Usage.

**Usage**

Signed saturating Doubling Multiply Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The SQDMULL instruction extracts the first source vector from the lower half of the first source register, while the SQDMULL2 instruction extracts the first source vector from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-17: SQDMULL (Scalar) specifier combinations**

| <b><i>Va</i></b> | <b><i>Vb</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|------------------|------------------|------------------|---------------------|
| S                | H                | H                | 0 to 7              |
| D                | S                | S                | 0 to 3              |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.71 SQDMULL (scalar) (A64 SIMD)

Signed saturating Doubling Multiply Long.

**Syntax**

SQDMULL  $Vad, Vbn, Vbm$

Where:

***v<sub>a</sub>***

Is the destination width specifier, and can be either s or d.

***d***

Is the number of the SIMD and FP destination register.

***v<sub>b</sub>***

Is the source width specifier, and can be either h or s.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD and FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The SQDMULL instruction extracts each source vector from the lower half of each source register, while the SQDMULL2 instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-18: SQDMULL (Scalar) specifier combinations**

| <b><i>v<sub>a</sub></i></b> | <b><i>v<sub>b</sub></i></b> |
|-----------------------------|-----------------------------|
| S                           | H                           |
| D                           | S                           |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.72 SQNEG (scalar) (A64 SIMD)

Signed saturating Negate.

### Syntax

SQNEG *Vd, Vn*

Where:

**v**

Is a width specifier, and can be one of **b**, **h**, **s** or **d**.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Signed saturating Negate. This instruction reads each vector element from the source SIMD and FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.73 SQRDMLAH (scalar, by element) (A64 SIMD)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

### Syntax

`SQRDMLAH vd, vn, vm.Ts[index]`

Where:

**v**

Is a width specifier, and can be either **h** or **s**.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**vm**

Is the name of the second SIMD and FP source register:

- If  $ts$  is **h**, then  $vm$  must be in the range V0 to V15.
- If  $ts$  is **s**, then  $vm$  must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either `H` or `S`.

***index***

Is the element index, in the range shown in Usage.

**Architectures supported (scalar)**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element). This instruction multiplies the vector elements of the first source SIMD and FP register with the value of a vector element of the second source SIMD and FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-19: SQRDMLAH (Scalar) specifier combinations**

| <b>V</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| H        | H         | 0 to 7       |
| S        | S         | 0 to 3       |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.74 SQRDMLAH (scalar) (A64 SIMD)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

**Syntax**

`SQRDMLAH Vd, Vn,Vm`

Where:

**v**

Is a width specifier, and can be either `H` or `S`.

**d**

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.1 architecture and later.

### Usage

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD and FP register with the corresponding vector elements of the second source SIMD and FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.75 SQRDMLSH (scalar, by element) (A64 SIMD)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

### Syntax

`SQRDMLSH Vd, Vn, Vm.Ts[index]`

Where:

***V***

Is a width specifier, and can be either `H` or `S`.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If `Ts` is `H`, then `Vm` must be in the range V0 to V15.
- If `Ts` is `S`, then `Vm` must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either `H` or `S`.

***index***

Is the element index, in the range shown in Usage.

**Architectures supported (scalar)**

Supported in the Arm®v8.1 architecture and later.

**Usage**

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element). This instruction multiplies the vector elements of the first source SIMD and FP register with the value of a vector element of the second source SIMD and FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-20: SQRDMLSH (Scalar) specifier combinations**

| <b>V</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| H        | H         | 0 to 7       |
| S        | S         | 0 to 3       |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.76 SQRDMLSH (scalar) (A64 SIMD)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).

**Syntax**

`SQRDMLSH Vd, Vn,Vm`

Where:

**v**

Is a width specifier, and can be either `H` or `S`.

**d**

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.1 architecture and later.

### Usage

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD and FP register with the corresponding vector elements of the second source SIMD and FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.77 SQRDMLH (scalar, by element) (A64 SIMD)

Signed saturating Rounding Doubling Multiply returning High half (by element).

### Syntax

`SQRDMLH Vd, Vn, Vm.Ts[index]`

Where:

***V***

Is a width specifier, and can be either `H` or `S`.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If `Ts` is `H`, then `Vm` must be in the range V0 to V15.
- If `Ts` is `S`, then `Vm` must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either `H` or `S`.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Signed saturating Rounding Doubling Multiply returning High half (by element). This instruction multiplies each vector element in the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [SQDMULH \(scalar, by element\) \(A64 SIMD\)](#).

If any of the results overflows, they are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-21: SQRDMULH (Scalar) specifier combinations**

| <b>v</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| H        | H         | 0 to 7       |
| S        | S         | 0 to 3       |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.78 SQRDMULH (scalar) (A64 SIMD)

Signed saturating Rounding Doubling Multiply returning High half.

**Syntax**

`SQRDMULH Vd, Vn,Vm`

Where:

**v**

Is a width specifier, and can be either `H` or `S`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD and FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [SQDMULH \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.79 SQRSHL (scalar) (A64 SIMD)

Signed saturating Rounding Shift Left (register).

### Syntax

`SQRSHL vd, vn, vm`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD and FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For truncated results, see [SQSHL \(scalar, register\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.80 SQRSHRN (scalar) (A64 SIMD)

Signed saturating Rounded Shift Right Narrow (immediate).

### Syntax

`SQRSHRN Vbd, Van, #shift`

Where:

***v<sub>b</sub>***

Is the destination width specifier, and can be one of the values shown in Usage.

***d***

Is the number of the SIMD and FP destination register.

***v<sub>a</sub>***

Is the source width specifier, and can be one of the values shown in Usage.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

### Usage

Signed saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are signed integer values. The destination vector elements are half as long as the source vector elements. The results are rounded. For truncated results, see [SQSHRN \(scalar\) \(A64 SIMD\)](#).

The `SQRSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQRSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-22: SQRSHRN (Scalar) specifier combinations**

| <b><i>Vb</i></b> | <b><i>Va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.81 SQRSHRUN (scalar) (A64 SIMD)

Signed saturating Rounded Shift Right Unsigned Narrow (immediate).

### Syntax

`SQRSHRUN Vbd, Van, #shift`

Where:

***Vb***

Is the destination width specifier, and can be one of the values shown in Usage.

***d***

Is the number of the SIMD and FP destination register.

***Va***

Is the source width specifier, and can be one of the values shown in Usage.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

### Usage

Signed saturating Rounded Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, right shifts each value by an

immediate value, saturates the result to an unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the destination SIMD and FP register. The results are rounded. For truncated results, see [SQSHRUN \(scalar\) \(A64 SIMD\)](#).

The `SQRSHRUN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQRSHRUN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-23: SQRSHRUN (Scalar) specifier combinations**

| <b><i>vb</i></b> | <b><i>va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

#### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.82 SQSHL (scalar, immediate) (A64 SIMD)

Signed saturating Shift Left (immediate).

#### Syntax

`SQSHL Vd, Vn, #shift`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to the operand width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Signed saturating Shift Left (immediate). This instruction reads each vector element in the source SIMD and FP register, shifts each result by an immediate value, places the final result in a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-24: SQSHL (Scalar) specifier combinations**

| v | shift   |
|---|---------|
| B | 0 to 7  |
| H | 0 to 15 |
| S | 0 to 31 |
| D | 0 to 63 |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.83 SQSHL (scalar, register) (A64 SIMD)

Signed saturating Shift Left (register).

### Syntax

`SQSHL Vd, Vn,Vm`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For rounded results, see [SQRSHL \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.84 SQSHLU (scalar) (A64 SIMD)

Signed saturating Shift Left Unsigned (immediate).

### Syntax

`SQSHLU vd, vn, #shift`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to the operand width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Signed saturating Shift Left Unsigned (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, shifts each value by an immediate value, saturates the shifted result to an unsigned integer value, places the result in a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(scalar\) \(A64 SIMD\)](#).

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-25: SQSHLU (Scalar) specifier combinations**

| <b>v</b> | <b>shift</b> |
|----------|--------------|
| B        | 0 to 7       |
| H        | 0 to 15      |
| S        | 0 to 31      |
| D        | 0 to 63      |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.85 SQSHRN (scalar) (A64 SIMD)

Signed saturating Shift Right Narrow (immediate).

### Syntax

`SQSHRN Vbd, Van, #shift`

Where:

**v<sub>b</sub>**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**v<sub>a</sub>**

Is the source width specifier, and can be one of the values shown in Usage.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

### Usage

Signed saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts and truncates each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are signed integer values. The destination vector elements are half as long as the source vector elements. For rounded results, see [SQRSHRN \(scalar\) \(A64 SIMD\)](#).

The `SQSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-26: SQSHRN (Scalar) specifier combinations**

| <b><i>vb</i></b> | <b><i>va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.86 SQSHRUN (scalar) (A64 SIMD)

Signed saturating Shift Right Unsigned Narrow (immediate).

### Syntax

`SQSHRUN Vbd, Van, #shift`

Where:

***vb***

Is the destination width specifier, and can be one of the values shown in Usage.

***d***

Is the number of the SIMD and FP destination register.

***va***

Is the source width specifier, and can be one of the values shown in Usage.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

### Usage

Signed saturating Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, right shifts each value by an

immediate value, saturates the result to an unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [SQRSHRUN \(scalar\) \(A64 SIMD\)](#).

The `SQSHRUN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQSHRUN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-27: SQSHRUN (Scalar) specifier combinations**

| <b><i>vb</i></b> | <b><i>va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

#### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.87 SQSUB (scalar) (A64 SIMD)

Signed saturating Subtract.

#### Syntax

`SQSUB Vd, Vn, Vm`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD and FP register from the corresponding element values of the first source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.88 SQXTN (scalar) (A64 SIMD)

Signed saturating extract Narrow.

### Syntax

`SQXTN Vbd, Van`

Where:

**Vb**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Va**

Is the source width specifier, and can be one of the values shown in Usage.

**n**

Is the number of the SIMD and FP source register.

## Usage

Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD and FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQXTN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQXTN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-28: SQXTN (Scalar) specifier combinations**

| <b><i>Vb</i></b> | <b><i>Va</i></b> |
|------------------|------------------|
| B                | H                |
| H                | S                |
| S                | D                |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.89 SQXTUN (scalar) (A64 SIMD)

Signed saturating extract Unsigned Narrow.

### Syntax

`SQXTUN Vbd, Van`

Where:

***Vb***

Is the destination width specifier, and can be one of the values shown in Usage.

***d***

Is the number of the SIMD and FP destination register.

***Va***

Is the source width specifier, and can be one of the values shown in Usage.

***n***

Is the number of the SIMD and FP source register.

### Usage

Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD and FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQXTUN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQXTUN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-29: SQXTUN (Scalar) specifier combinations**

| <b><i>Vb</i></b> | <b><i>Va</i></b> |
|------------------|------------------|
| B                | H                |
| H                | S                |
| S                | D                |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.90 SRI (scalar) (A64 SIMD)

Shift Right and Insert (immediate).

### Syntax

`SRI Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to 64.

### Usage

Shift Right and Insert (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the destination SIMD and FP register such that the new zero bits created by the shift are not inserted but retain their existing value. Bits shifted out of the right of each vector element of the source register are lost.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.91 SRSHL (scalar) (A64 SIMD)

Signed Rounding Shift Left (register).

### Syntax

`SRSHL Vd, Vn, Vm`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Usage

Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD and FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. For a truncating shift, see [SSHLL \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.92 SRSHR (scalar) (A64 SIMD)

Signed Rounding Shift Right (immediate).

### Syntax

`SRSHR Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

### Usage

Signed Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, places the final result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are rounded. For truncated results, see [SSHR \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.93 SRSRA (scalar) (A64 SIMD)

Signed Rounding Shift Right and Accumulate (immediate).

### Syntax

`SRSRA Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to 64.

## Usage

Signed Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are rounded. For truncated results, see [SSRA \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.94 SSHL (scalar) (A64 SIMD)

Signed Shift Left (register).

### Syntax

`SSHL Vd, Vn, Vm`

Where:

***v***

Is a width specifier, D.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Usage

Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD and FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a rounding shift, see [SRSHL \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.95 SSHR (scalar) (A64 SIMD)

Signed Shift Right (immediate).

### Syntax

`SSHR Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

### Usage

Signed Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, places the final result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded results, see [SRSHR \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.96 SSRA (scalar) (A64 SIMD)

Signed Shift Right and Accumulate (immediate).

### Syntax

`SSRA Vd, Vn, #shift`

Where:

**v**

Is a width specifier, **d**.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

## Usage

Signed Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded results, see [SRSRA \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.97 SUB (scalar) (A64 SIMD)

Subtract (vector).

### Syntax

SUB **vd**, **vn**, **vm**

Where:

**v**

Is a width specifier, **d**.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Subtract (vector). This instruction subtracts each vector element in the second source SIMD and FP register from the corresponding vector element in the first source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.98 SUQADD (scalar) (A64 SIMD)

Signed saturating Accumulate of Unsigned value.

### Syntax

`SUQADD vd, vn`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Usage

Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD and FP register to corresponding signed integer values of the vector elements in the destination SIMD and FP register, and writes the resulting signed integer values to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.99 UCVTF (scalar, fixed-point) (A64 SIMD)

Unsigned fixed-point Convert to Floating-point (vector).

### Syntax

```
UCVTF Vd, Vn, #fbits
```

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the operand width.

### Usage

Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-30: UCVTF (Scalar) specifier combinations**

| <b>v</b> | <b>fbits</b> |
|----------|--------------|
| H        | -            |
| S        | 1 to 32      |
| D        | 1 to 64      |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.100 UCVTF (scalar, integer) (A64 SIMD)

Unsigned integer Convert to Floating-point (vector).

### Syntax

`UCVTF Hd, Hn ; Scalar half precision`

`UCVTF Vd, Vn ; Scalar single-precision and double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**V**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Architectures supported (scalar)

Supported in the Arm®v8.2 architecture and later.

### Usage

Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.101 UQADD (scalar) (A64 SIMD)

Unsigned saturating Add.

### Syntax

`UQADD Vd, Vn, Vm`

Where:

**v**

Is a width specifier, and can be one of B, H, S or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Usage

Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.102 UQRSHL (scalar) (A64 SIMD)

Unsigned saturating Rounding Shift Left (register).

### Syntax

`UQRSHL Vd, Vn, Vm`

Where:

**v**

Is a width specifier, and can be one of B, H, S or D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD and FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For truncated results, see [UQSHL \(scalar, immediate\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.103 UQRSHRN (scalar) (A64 SIMD)

Unsigned saturating Rounded Shift Right Narrow (immediate).

### Syntax

`UQRSHRN Vbd, Van, #shift`

Where:

**Vb**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Va**

Is the source width specifier, and can be one of the values shown in Usage.

**n**

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

**Usage**

Unsigned saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [UQSHRN \(scalar\) \(A64 SIMD\)](#).

The `UQRSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQRSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-31: UQRSHRN (Scalar) specifier combinations**

| <b><i>vb</i></b> | <b><i>va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.104 UQSHL (scalar, immediate) (A64 SIMD)

Unsigned saturating Shift Left (immediate).

**Syntax**

`UQSHL Vd, Vn, #shift`

Where:

**v**

Is a width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the left shift amount, in the range 0 to the operand width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Unsigned saturating Shift Left (immediate). This instruction takes each vector element in the source SIMD and FP register, shifts it by an immediate value, places the results in a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-32: UQSHL (Scalar) specifier combinations**

| <b>v</b> | <b>shift</b> |
|----------|--------------|
| B        | 0 to 7       |
| H        | 0 to 15      |
| S        | 0 to 31      |
| D        | 0 to 63      |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.105 UQSHL (scalar, register) (A64 SIMD)

Unsigned saturating Shift Left (register).

### Syntax

`UQSHL Vd, Vn, Vm`

Where:

***v***

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***m***

Is the number of the second SIMD and FP source register.

## Usage

Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For rounded results, see [UQRSHL \(scalar\) \(A64 SIMD\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.106 UQSHRN (scalar) (A64 SIMD)

Unsigned saturating Shift Right Narrow (immediate).

### Syntax

`UQSHRN Vbd, Van, #shift`

Where:

***Vb***

Is the destination width specifier, and can be one of the values shown in Usage.

***d***

Is the number of the SIMD and FP destination register.

***Va***

Is the source width specifier, and can be one of the values shown in Usage.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the destination operand width in bits, and can be one of the values shown in Usage.

## Usage

Unsigned saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [UQRSHRN \(scalar\) \(A64 SIMD\)](#).

The `UQSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-33: UQSHRN (Scalar) specifier combinations**

| <b><i>Vb</i></b> | <b><i>Va</i></b> | <b><i>shift</i></b> |
|------------------|------------------|---------------------|
| B                | H                | 1 to 8              |
| H                | S                | 1 to 16             |
| S                | D                | 1 to 32             |

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.107 UQSUB (scalar) (A64 SIMD)

Unsigned saturating Subtract.

### Syntax

`UQSUB Vd, Vn,Vm`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD and FP register from the corresponding element values of the first source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.108 UQXTN (scalar) (A64 SIMD)

Unsigned saturating extract Narrow.

### Syntax

`UQXTN Vbd, Van`

Where:

**v<sub>b</sub>**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**v<sub>a</sub>**

Is the source width specifier, and can be one of the values shown in Usage.

**n**

Is the number of the SIMD and FP source register.

## Usage

Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD and FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `UQXTN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQXTN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 20-34: UQXTN (Scalar) specifier combinations**

| <b><i>Vb</i></b> | <b><i>Va</i></b> |
|------------------|------------------|
| B                | H                |
| H                | S                |
| S                | D                |

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.109 URSHL (scalar) (A64 SIMD)

Unsigned Rounding Shift Left (register).

### Syntax

`URSHL Vd, Vn,Vm`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

### Usage

Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.110 URSHR (scalar) (A64 SIMD)

Unsigned Rounding Shift Right (immediate).

### Syntax

`URSHR Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

### Usage

Unsigned Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [USHR \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.111 URSRA (scalar) (A64 SIMD)

Unsigned Rounding Shift Right and Accumulate (immediate).

### Syntax

`URSRA Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

## Usage

Unsigned Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [USRA \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.112 USHL (scalar) (A64 SIMD)

Unsigned Shift Left (register).

### Syntax

`USHL vd, vn, vm`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**m**

Is the number of the second SIMD and FP source register.

## Usage

Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a rounding shift, see [URSHL \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.113 USHR (scalar) (A64 SIMD)

Unsigned Shift Right (immediate).

### Syntax

`USHR Vd, Vn, #shift`

Where:

**v**

Is a width specifier, D.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the first SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to 64.

## Usage

Unsigned Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [URSHR \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.114 USQADD (scalar) (A64 SIMD)

Unsigned saturating Accumulate of Signed value.

### Syntax

`USQADD Vd, Vn`

Where:

**v**

Is a width specifier, and can be one of `B`, `H`, `S` or `D`.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

### Usage

Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD and FP register to corresponding unsigned integer values of the vector elements in the destination SIMD and FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 20.115 USRA (scalar) (A64 SIMD)

Unsigned Shift Right and Accumulate (immediate).

### Syntax

`USRA Vd, Vn, #shift`

Where:

**v**

Is a width specifier, `D`.

***d***

Is the number of the SIMD and FP destination register.

***n***

Is the number of the first SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to 64.

## Usage

Unsigned Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [URSRA \(scalar\) \(A64 SIMD\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

# 21. A64 SIMD Vector Instructions

Describes the A64 SIMD vector instructions.

## 21.1 A64 SIMD Vector instructions in alphabetical order

A summary of the A64 SIMD Vector instructions that are supported.

**Table 21-1: Summary of A64 SIMD Vector instructions**

| Mnemonic                | Brief description                                     | See                                           |
|-------------------------|-------------------------------------------------------|-----------------------------------------------|
| ABS (vector)            | Absolute value (vector)                               | <a href="#">ABS (vector) (A64)</a>            |
| ADD (vector)            | Add (vector)                                          | <a href="#">ADD (vector) (A64)</a>            |
| ADDHN, ADDHN2 (vector)  | Add returning High Narrow                             | <a href="#">ADDHN, ADDHN2 (vector) (A64)</a>  |
| ADDP (vector)           | Add Pairwise (vector)                                 | <a href="#">ADDP (vector) (A64)</a>           |
| ADDV (vector)           | Add across Vector                                     | <a href="#">ADDV (vector) (A64)</a>           |
| AND (vector)            | Bitwise AND (vector)                                  | <a href="#">AND (vector) (A64)</a>            |
| BIC (vector, immediate) | Bitwise bit Clear (vector, immediate)                 | <a href="#">BIC (vector, immediate) (A64)</a> |
| BIC (vector, register)  | Bitwise bit Clear (vector, register)                  | <a href="#">BIC (vector, register) (A64)</a>  |
| BIF (vector)            | Bitwise Insert if False                               | <a href="#">BIF (vector) (A64)</a>            |
| BIT (vector)            | Bitwise Insert if True                                | <a href="#">BIT (vector) (A64)</a>            |
| BSL (vector)            | Bitwise Select                                        | <a href="#">BSL (vector) (A64)</a>            |
| CLS (vector)            | Count Leading Sign bits (vector)                      | <a href="#">CLS (vector) (A64)</a>            |
| CLZ (vector)            | Count Leading Zero bits (vector)                      | <a href="#">CLZ (vector) (A64)</a>            |
| CMEQ (vector, register) | Compare bitwise Equal (vector)                        | <a href="#">CMEQ (vector, register) (A64)</a> |
| CMEQ (vector, zero)     | Compare bitwise Equal to zero (vector)                | <a href="#">CMEQ (vector, zero) (A64)</a>     |
| CMGE (vector, register) | Compare signed Greater than or Equal (vector)         | <a href="#">CMGE (vector, register) (A64)</a> |
| CMGE (vector, zero)     | Compare signed Greater than or Equal to zero (vector) | <a href="#">CMGE (vector, zero) (A64)</a>     |
| CMGT (vector, register) | Compare signed Greater than (vector)                  | <a href="#">CMGT (vector, register) (A64)</a> |
| CMGT (vector, zero)     | Compare signed Greater than zero (vector)             | <a href="#">CMGT (vector, zero) (A64)</a>     |
| CMHI (vector, register) | Compare unsigned Higher (vector)                      | <a href="#">CMHI (vector, register) (A64)</a> |
| CMHS (vector, register) | Compare unsigned Higher or Same (vector)              | <a href="#">CMHS (vector, register) (A64)</a> |
| CMLE (vector, zero)     | Compare signed Less than or Equal to zero (vector)    | <a href="#">CMLE (vector, zero) (A64)</a>     |
| CMLT (vector, zero)     | Compare signed Less than zero (vector)                | <a href="#">CMLT (vector, zero) (A64)</a>     |
| CMTST (vector)          | Compare bitwise Test bits nonzero (vector)            | <a href="#">CMTST (vector) (A64)</a>          |
| CNT (vector)            | Population Count per byte                             | <a href="#">CNT (vector) (A64)</a>            |
| DUP (vector, element)   | vector                                                | <a href="#">DUP (vector, element) (A64)</a>   |
| DUP (vector, general)   | Duplicate general-purpose register to vector          | <a href="#">DUP (vector, general) (A64)</a>   |
| EOR (vector)            | Bitwise Exclusive OR (vector)                         | <a href="#">EOR (vector) (A64)</a>            |

| Mnemonic                 | Brief description                                                                          | See                                            |
|--------------------------|--------------------------------------------------------------------------------------------|------------------------------------------------|
| EXT (vector)             | Extract vector from pair of vectors                                                        | <a href="#">EXT (vector) (A64)</a>             |
| FABD (vector)            | Floating-point Absolute Difference (vector)                                                | <a href="#">FABD (vector) (A64)</a>            |
| FABS (vector)            | Floating-point Absolute value (vector)                                                     | <a href="#">FABS (vector) (A64)</a>            |
| FACGE (vector)           | Floating-point Absolute Compare Greater than or Equal (vector)                             | <a href="#">FACGE (vector) (A64)</a>           |
| FACGT (vector)           | Floating-point Absolute Compare Greater than (vector)                                      | <a href="#">FACGT (vector) (A64)</a>           |
| FADD (vector)            | Floating-point Add (vector)                                                                | <a href="#">FADD (vector) (A64)</a>            |
| FADDP (vector)           | Floating-point Add Pairwise (vector)                                                       | <a href="#">FADDP (vector) (A64)</a>           |
| FCADD (vector)           | Floating-point Complex Add                                                                 | <a href="#">FCADD (vector) (A64)</a>           |
| FCMEQ (vector, register) | Floating-point Compare Equal (vector)                                                      | <a href="#">FCMEQ (vector, register) (A64)</a> |
| FCMQ (vector, zero)      | Floating-point Compare Equal to zero (vector)                                              | <a href="#">FCMQ (vector, zero) (A64)</a>      |
| FCMGE (vector, register) | Floating-point Compare Greater than or Equal (vector)                                      | <a href="#">FCMGE (vector, register) (A64)</a> |
| FCMGE (vector, zero)     | Floating-point Compare Greater than or Equal to zero (vector)                              | <a href="#">FCMGE (vector, zero) (A64)</a>     |
| FCMGT (vector, register) | Floating-point Compare Greater than (vector)                                               | <a href="#">FCMGT (vector, register) (A64)</a> |
| FCMGT (vector, zero)     | Floating-point Compare Greater than zero (vector)                                          | <a href="#">FCMGT (vector, zero) (A64)</a>     |
| FCMLA (vector)           | Floating-point Complex Multiply Accumulate                                                 | <a href="#">FCMLA (vector) (A64)</a>           |
| FCMLE (vector, zero)     | Floating-point Compare Less than or Equal to zero (vector)                                 | <a href="#">FCMLE (vector, zero) (A64)</a>     |
| FCMLT (vector, zero)     | Floating-point Compare Less than zero (vector)                                             | <a href="#">FCMLT (vector, zero) (A64)</a>     |
| FCVTAS (vector)          | Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector)   | <a href="#">FCVTAS (vector) (A64)</a>          |
| FCVTAU (vector)          | Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector) | <a href="#">FCVTAU (vector) (A64)</a>          |
| FCVTL, FCVTL2 (vector)   | Floating-point Convert to higher precision Long (vector)                                   | <a href="#">FCVTL, FCVTL2 (vector) (A64)</a>   |
| FCVTMS (vector)          | Floating-point Convert to Signed integer, rounding toward Minus infinity (vector)          | <a href="#">FCVTMS (vector) (A64)</a>          |
| FCVTMU (vector)          | Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector)        | <a href="#">FCVTMU (vector) (A64)</a>          |
| FCVTN, FCVTN2 (vector)   | Floating-point Convert to lower precision Narrow (vector)                                  | <a href="#">FCVTN, FCVTN2 (vector) (A64)</a>   |
| FCVTNS (vector)          | Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector)   | <a href="#">FCVTNS (vector) (A64)</a>          |
| FCVTNU (vector)          | Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector) | <a href="#">FCVTNU (vector) (A64)</a>          |

| Mnemonic                     | Brief description                                                                  | See                                                |
|------------------------------|------------------------------------------------------------------------------------|----------------------------------------------------|
| FCVTPS (vector)              | Floating-point Convert to Signed integer, rounding toward Plus infinity (vector)   | <a href="#">FCVTPS (vector) (A64)</a>              |
| FCVTPU (vector)              | Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector) | <a href="#">FCVTPU (vector) (A64)</a>              |
| FCVTXN, FCVTXN2 (vector)     | Floating-point Convert to lower precision Narrow, rounding to odd (vector)         | <a href="#">FCVTXN, FCVTXN2 (vector) (A64)</a>     |
| FCVTZS (vector, fixed-point) | Floating-point Convert to Signed fixed-point, rounding toward Zero (vector)        | <a href="#">FCVTZS (vector, fixed-point) (A64)</a> |
| FCVTZS (vector, integer)     | Floating-point Convert to Signed integer, rounding toward Zero (vector)            | <a href="#">FCVTZS (vector, integer) (A64)</a>     |
| FCVTZU (vector, fixed-point) | Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector)      | <a href="#">FCVTZU (vector, fixed-point) (A64)</a> |
| FCVTZU (vector, integer)     | Floating-point Convert to Unsigned integer, rounding toward Zero (vector)          | <a href="#">FCVTZU (vector, integer) (A64)</a>     |
| FDIV (vector)                | Floating-point Divide (vector)                                                     | <a href="#">FDIV (vector) (A64)</a>                |
| FMAX (vector)                | Floating-point Maximum (vector)                                                    | <a href="#">FMAX (vector) (A64)</a>                |
| FMAXNM (vector)              | Floating-point Maximum Number (vector)                                             | <a href="#">FMAXNM (vector) (A64)</a>              |
| FMAXNMP (vector)             | Floating-point Maximum Number Pairwise (vector)                                    | <a href="#">FMAXNMP (vector) (A64)</a>             |
| FMAXNMV (vector)             | Floating-point Maximum Number across Vector                                        | <a href="#">FMAXNMV (vector) (A64)</a>             |
| FMAXP (vector)               | Floating-point Maximum Pairwise (vector)                                           | <a href="#">FMAXP (vector) (A64)</a>               |
| FMAXV (vector)               | Floating-point Maximum across Vector                                               | <a href="#">FMAXV (vector) (A64)</a>               |
| FMIN (vector)                | Floating-point minimum (vector)                                                    | <a href="#">FMIN (vector) (A64)</a>                |
| FMINNM (vector)              | Floating-point Minimum Number (vector)                                             | <a href="#">FMINNM (vector) (A64)</a>              |
| FMINNMP (vector)             | Floating-point Minimum Number Pairwise (vector)                                    | <a href="#">FMINNMP (vector) (A64)</a>             |
| FMINNMV (vector)             | Floating-point Minimum Number across Vector                                        | <a href="#">FMINNMV (vector) (A64)</a>             |
| FMINP (vector)               | Floating-point Minimum Pairwise (vector)                                           | <a href="#">FMINP (vector) (A64)</a>               |
| FMINV (vector)               | Floating-point Minimum across Vector                                               | <a href="#">FMINV (vector) (A64)</a>               |
| FMLA (vector, by element)    | Floating-point fused Multiply-Add to accumulator (by element)                      | <a href="#">FMLA (vector, by element) (A64)</a>    |
| FMLA (vector)                | Floating-point fused Multiply-Add to accumulator (vector)                          | <a href="#">FMLA (vector) (A64)</a>                |
| FMLS (vector, by element)    | Floating-point fused Multiply-Subtract from accumulator (by element)               | <a href="#">FMLS (vector, by element) (A64)</a>    |
| FMLS (vector)                | Floating-point fused Multiply-Subtract from accumulator (vector)                   | <a href="#">FMLS (vector) (A64)</a>                |
| FMOV (vector, immediate)     | Floating-point move immediate (vector)                                             | <a href="#">FMOV (vector, immediate) (A64)</a>     |
| FMUL (vector, by element)    | Floating-point Multiply (by element)                                               | <a href="#">FMUL (vector, by element) (A64)</a>    |
| FMUL (vector)                | Floating-point Multiply (vector)                                                   | <a href="#">FMUL (vector) (A64)</a>                |
| FMULX (vector, by element)   | Floating-point Multiply extended (by element)                                      | <a href="#">FMULX (vector, by element) (A64)</a>   |
| FMULX (vector)               | Floating-point Multiply extended                                                   | <a href="#">FMULX (vector) (A64)</a>               |

| Mnemonic                          | Brief description                                                              | See                                                     |
|-----------------------------------|--------------------------------------------------------------------------------|---------------------------------------------------------|
| FNEG (vector)                     | Floating-point Negate (vector)                                                 | <a href="#">FNEG (vector) (A64)</a>                     |
| FRECPE (vector)                   | Floating-point Reciprocal Estimate                                             | <a href="#">FRECPE (vector) (A64)</a>                   |
| FRECPS (vector)                   | Floating-point Reciprocal Step                                                 | <a href="#">FRECPS (vector) (A64)</a>                   |
| FRECPX (vector)                   | Floating-point Reciprocal exponent (scalar)                                    | <a href="#">FRECPX (vector) (A64)</a>                   |
| FRINTA (vector)                   | Floating-point Round to Integral, to nearest with ties to Away (vector)        | <a href="#">FRINTA (vector) (A64)</a>                   |
| FRINTI (vector)                   | Floating-point Round to Integral, using current rounding mode (vector)         | <a href="#">FRINTI (vector) (A64)</a>                   |
| FRINTM (vector)                   | Floating-point Round to Integral, toward Minus infinity (vector)               | <a href="#">FRINTM (vector) (A64)</a>                   |
| FRINTN (vector)                   | Floating-point Round to Integral, to nearest with ties to even (vector)        | <a href="#">FRINTN (vector) (A64)</a>                   |
| FRINTP (vector)                   | Floating-point Round to Integral, toward Plus infinity (vector)                | <a href="#">FRINTP (vector) (A64)</a>                   |
| FRINTX (vector)                   | Floating-point Round to Integral exact, using current rounding mode (vector)   | <a href="#">FRINTX (vector) (A64)</a>                   |
| FRINTZ (vector)                   | Floating-point Round to Integral, toward Zero (vector)                         | <a href="#">FRINTZ (vector) (A64)</a>                   |
| FRSQRTE (vector)                  | Floating-point Reciprocal Square Root Estimate                                 | <a href="#">FRSQRTE (vector) (A64)</a>                  |
| FRSQRTS (vector)                  | Floating-point Reciprocal Square Root Step                                     | <a href="#">FRSQRTS (vector) (A64)</a>                  |
| FSQRT (vector)                    | Floating-point Square Root (vector)                                            | <a href="#">FSQRT (vector) (A64)</a>                    |
| FSUB (vector)                     | Floating-point Subtract (vector)                                               | <a href="#">FSUB (vector) (A64)</a>                     |
| INS (vector, element)             | Insert vector element from another vector element                              | <a href="#">INS (vector, element) (A64)</a>             |
| INS (vector, general)             | Insert vector element from general-purpose register                            | <a href="#">INS (vector, general) (A64)</a>             |
| LD1 (vector, multiple structures) | Load multiple single-element structures to one, two, three, or four registers  | <a href="#">LD1 (vector, multiple structures) (A64)</a> |
| LD1 (vector, single structure)    | Load one single-element structure to one lane of one register                  | <a href="#">LD1 (vector, single structure) (A64)</a>    |
| LD1R (vector)                     | Load one single-element structure and Replicate to all lanes (of one register) | <a href="#">LD1R (vector) (A64)</a>                     |
| LD2 (vector, multiple structures) | Load multiple 2-element structures to two registers                            | <a href="#">LD2 (vector, multiple structures) (A64)</a> |
| LD2 (vector, single structure)    | Load single 2-element structure to one lane of two registers                   | <a href="#">LD2 (vector, single structure) (A64)</a>    |
| LD2R (vector)                     | Load single 2-element structure and Replicate to all lanes of two registers    | <a href="#">LD2R (vector) (A64)</a>                     |
| LD3 (vector, multiple structures) | Load multiple 3-element structures to three registers                          | <a href="#">LD3 (vector, multiple structures) (A64)</a> |
| LD3 (vector, single structure)    | Load single 3-element structure to one lane of three registers                 | <a href="#">LD3 (vector, single structure) (A64)</a>    |
| LD3R (vector)                     | Load single 3-element structure and Replicate to all lanes of three registers  | <a href="#">LD3R (vector) (A64)</a>                     |

| Mnemonic                          | Brief description                                                            | See                                     |
|-----------------------------------|------------------------------------------------------------------------------|-----------------------------------------|
| LD4 (vector, multiple structures) | Load multiple 4-element structures to four registers                         | LD4 (vector, multiple structures) (A64) |
| LD4 (vector, single structure)    | Load single 4-element structure to one lane of four registers                | LD4 (vector, single structure) (A64)    |
| LD4R (vector)                     | Load single 4-element structure and Replicate to all lanes of four registers | LD4R (vector) (A64)                     |
| MLA (vector, by element)          | Multiply-Add to accumulator (vector, by element)                             | MLA (vector, by element) (A64)          |
| MLA (vector)                      | Multiply-Add to accumulator (vector)                                         | MLA (vector) (A64)                      |
| MLS (vector, by element)          | Multiply-Subtract from accumulator (vector, by element)                      | MLS (vector, by element) (A64)          |
| MLS (vector)                      | Multiply-Subtract from accumulator (vector)                                  | MLS (vector) (A64)                      |
| MOV (vector, element)             | Move vector element to another vector element                                | MOV (vector, element) (A64)             |
| MOV (vector, from general)        | Move general-purpose register to a vector element                            | MOV (vector, from general) (A64)        |
| MOV (vector)                      | Move vector                                                                  | MOV (vector) (A64)                      |
| MOV (vector, to general)          | Move vector element to general-purpose register                              | MOV (vector, to general) (A64)          |
| MOVI (vector)                     | Move Immediate (vector)                                                      | MOVI (vector) (A64)                     |
| MUL (vector, by element)          | Multiply (vector, by element)                                                | MUL (vector, by element) (A64)          |
| MUL (vector)                      | Multiply (vector)                                                            | MUL (vector) (A64)                      |
| MVN (vector)                      | Bitwise NOT (vector)                                                         | MVN (vector) (A64)                      |
| MVNI (vector)                     | Move inverted Immediate (vector)                                             | MVNI (vector) (A64)                     |
| NEG (vector)                      | Negate (vector)                                                              | NEG (vector) (A64)                      |
| NOT (vector)                      | Bitwise NOT (vector)                                                         | NOT (vector) (A64)                      |
| ORN (vector)                      | Bitwise inclusive OR NOT (vector)                                            | ORN (vector) (A64)                      |
| ORR (vector, immediate)           | Bitwise inclusive OR (vector, immediate)                                     | ORR (vector, immediate) (A64)           |
| ORR (vector, register)            | Bitwise inclusive OR (vector, register)                                      | ORR (vector, register) (A64)            |
| PMUL (vector)                     | Polynomial Multiply                                                          | PMUL (vector) (A64)                     |
| PMULL, PMULL2 (vector)            | Polynomial Multiply Long                                                     | PMULL, PMULL2 (vector) (A64)            |
| RADDHN, RADDHN2 (vector)          | Rounding Add returning High Narrow                                           | RADDHN, RADDHN2 (vector) (A64)          |
| RBIT (vector)                     | Reverse Bit order (vector)                                                   | RBIT (vector) (A64)                     |
| REV16 (vector)                    | Reverse elements in 16-bit halfwords (vector)                                | REV16 (vector) (A64)                    |
| REV32 (vector)                    | Reverse elements in 32-bit words (vector)                                    | REV32 (vector) (A64)                    |
| REV64 (vector)                    | Reverse elements in 64-bit doublewords (vector)                              | REV64 (vector) (A64)                    |
| RSHRN, RSHRN2 (vector)            | Rounding Shift Right Narrow (immediate)                                      | RSHRN, RSHRN2 (vector) (A64)            |
| RSUBHN, RSUBHN2 (vector)          | Rounding Subtract returning High Narrow                                      | RSUBHN, RSUBHN2 (vector) (A64)          |
| SABA (vector)                     | Signed Absolute difference and Accumulate                                    | SABA (vector) (A64)                     |
| SABAL, SABAL2 (vector)            | Signed Absolute difference and Accumulate Long                               | SABAL, SABAL2 (vector) (A64)            |
| SABD (vector)                     | Signed Absolute Difference                                                   | SABD (vector) (A64)                     |

| Mnemonic                               | Brief description                                              | See                                          |
|----------------------------------------|----------------------------------------------------------------|----------------------------------------------|
| SABDL, SABDL2 (vector)                 | Signed Absolute Difference Long                                | SABDL, SABDL2 (vector) (A64)                 |
| SADALP (vector)                        | Signed Add and Accumulate Long Pairwise                        | SADALP (vector) (A64)                        |
| SADDL, SADDL2 (vector)                 | Signed Add Long (vector)                                       | SADDL, SADDL2 (vector) (A64)                 |
| SADDLP (vector)                        | Signed Add Long Pairwise                                       | SADDLP (vector) (A64)                        |
| SADDLV (vector)                        | Signed Add Long across Vector                                  | SADDLV (vector) (A64)                        |
| SADDW, SADDW2 (vector)                 | Signed Add Wide                                                | SADDW, SADDW2 (vector) (A64)                 |
| SCVTF (vector, fixed-point)            | Signed fixed-point Convert to Floating-point (vector)          | SCVTF (vector, fixed-point) (A64)            |
| SCVTF (vector, integer)                | Signed integer Convert to Floating-point (vector)              | SCVTF (vector, integer) (A64)                |
| SHADD (vector)                         | Signed Halving Add                                             | SHADD (vector) (A64)                         |
| SHL (vector)                           | Shift Left (immediate)                                         | SHL (vector) (A64)                           |
| SHLL, SHLL2 (vector)                   | Shift Left Long (by element size)                              | SHLL, SHLL2 (vector) (A64)                   |
| SHRN, SHRN2 (vector)                   | Shift Right Narrow (immediate)                                 | SHRN, SHRN2 (vector) (A64)                   |
| SHSUB (vector)                         | Signed Halving Subtract                                        | SHSUB (vector) (A64)                         |
| SLI (vector)                           | Shift Left and Insert (immediate)                              | SLI (vector) (A64)                           |
| SMAX (vector)                          | Signed Maximum (vector)                                        | SMAX (vector) (A64)                          |
| SMAXP (vector)                         | Signed Maximum Pairwise                                        | SMAXP (vector) (A64)                         |
| SMAXV (vector)                         | Signed Maximum across Vector                                   | SMAXV (vector) (A64)                         |
| SMIN (vector)                          | Signed Minimum (vector)                                        | SMIN (vector) (A64)                          |
| SMINP (vector)                         | Signed Minimum Pairwise                                        | SMINP (vector) (A64)                         |
| SMINV (vector)                         | Signed Minimum across Vector                                   | SMINV (vector) (A64)                         |
| SMLAL, SMLAL2 (vector, by element)     | Signed Multiply-Add Long (vector, by element)                  | SMLAL, SMLAL2 (vector, by element) (A64)     |
| SMLAL, SMLAL2 (vector)                 | Signed Multiply-Add Long (vector)                              | SMLAL, SMLAL2 (vector) (A64)                 |
| SMLS L, SMLS L2 (vector, by element)   | Signed Multiply-Subtract Long (vector, by element)             | SMLS L, SMLS L2 (vector, by element) (A64)   |
| SMLS L, SMLS L2 (vector)               | Signed Multiply-Subtract Long (vector)                         | SMLS L, SMLS L2 (vector) (A64)               |
| SMOV (vector)                          | Signed Move vector element to general-purpose register         | SMOV (vector) (A64)                          |
| SMULL, SMULL2 (vector, by element)     | Signed Multiply Long (vector, by element)                      | SMULL, SMULL2 (vector, by element) (A64)     |
| SMULL, SMULL2 (vector)                 | Signed Multiply Long (vector)                                  | SMULL, SMULL2 (vector) (A64)                 |
| SQABS (vector)                         | Signed saturating Absolute value                               | SQABS (vector) (A64)                         |
| SQADD (vector)                         | Signed saturating Add                                          | SQADD (vector) (A64)                         |
| SQDMLAL, SQDMLAL2 (vector, by element) | Signed saturating Doubling Multiply-Add Long (by element)      | SQDMLAL, SQDMLAL2 (vector, by element) (A64) |
| SQDMLAL, SQDMLAL2 (vector)             | Signed saturating Doubling Multiply-Add Long                   | SQDMLAL, SQDMLAL2 (vector) (A64)             |
| SQDMLSL, SQDMLSL2 (vector, by element) | Signed saturating Doubling Multiply-Subtract Long (by element) | SQDMLSL, SQDMLSL2 (vector, by element) (A64) |
| SQDMLSL, SQDMLSL2 (vector)             | Signed saturating Doubling Multiply-Subtract Long              | SQDMLSL, SQDMLSL2 (vector) (A64)             |

| Mnemonic                               | Brief description                                                                        | See                                          |
|----------------------------------------|------------------------------------------------------------------------------------------|----------------------------------------------|
| SQDMULH (vector, by element)           | Signed saturating Doubling Multiply returning High half (by element)                     | SQDMULH (vector, by element) (A64)           |
| SQDMULH (vector)                       | Signed saturating Doubling Multiply returning High half                                  | SQDMULH (vector) (A64)                       |
| SQDMULL, SQDMULL2 (vector, by element) | Signed saturating Doubling Multiply Long (by element)                                    | SQDMULL, SQDMULL2 (vector, by element) (A64) |
| SQDMULL, SQDMULL2 (vector)             | Signed saturating Doubling Multiply Long                                                 | SQDMULL, SQDMULL2 (vector) (A64)             |
| SQNEG (vector)                         | Signed saturating Negate                                                                 | SQNEG (vector) (A64)                         |
| SQRDMLAH (vector, by element)          | Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element) | SQRDMLAH (vector, by element) (A64)          |
| SQRDMLAH (vector)                      | Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector)     | SQRDMLAH (vector) (A64)                      |
| SQRDMLSH (vector, by element)          | Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element)   | SQRDMLSH (vector, by element) (A64)          |
| SQRDMLSH (vector)                      | Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector)       | SQRDMLSH (vector) (A64)                      |
| SQRDMULH (vector, by element)          | Signed saturating Rounding Doubling Multiply returning High half (by element)            | SQRDMULH (vector, by element) (A64)          |
| SQRDMULH (vector)                      | Signed saturating Rounding Doubling Multiply returning High half                         | SQRDMULH (vector) (A64)                      |
| SQRSHL (vector)                        | Signed saturating Rounding Shift Left (register)                                         | SQRSHL (vector) (A64)                        |
| SQRSHRN, SQRSHRN2 (vector)             | Signed saturating Rounded Shift Right Narrow (immediate)                                 | SQRSHRN, SQRSHRN2 (vector) (A64)             |
| SQRSHRUN, SQRSHRUN2 (vector)           | Signed saturating Rounded Shift Right Unsigned Narrow (immediate)                        | SQRSHRUN, SQRSHRUN2 (vector) (A64)           |
| SQSHL (vector, immediate)              | Signed saturating Shift Left (immediate)                                                 | SQSHL (vector, immediate) (A64)              |
| SQSHL (vector, register)               | Signed saturating Shift Left (register)                                                  | SQSHL (vector, register) (A64)               |
| SQSHLU (vector)                        | Signed saturating Shift Left Unsigned (immediate)                                        | SQSHLU (vector) (A64)                        |
| SQSHRN, SQSHRN2 (vector)               | Signed saturating Shift Right Narrow (immediate)                                         | SQSHRN, SQSHRN2 (vector) (A64)               |
| SQSHRUN, SQSHRUN2 (vector)             | Signed saturating Shift Right Unsigned Narrow (immediate)                                | SQSHRUN, SQSHRUN2 (vector) (A64)             |
| SQSUB (vector)                         | Signed saturating Subtract                                                               | SQSUB (vector) (A64)                         |
| SQXTN, SQXTN2 (vector)                 | Signed saturating extract Narrow                                                         | SQXTN, SQXTN2 (vector) (A64)                 |
| SQXTUN, SQXTUN2 (vector)               | Signed saturating extract Unsigned Narrow                                                | SQXTUN, SQXTUN2 (vector) (A64)               |
| SRHADD (vector)                        | Signed Rounding Halving Add                                                              | SRHADD (vector) (A64)                        |
| SRI (vector)                           | Shift Right and Insert (immediate)                                                       | SRI (vector) (A64)                           |
| SRSHL (vector)                         | Signed Rounding Shift Left (register)                                                    | SRSHL (vector) (A64)                         |
| SRSHR (vector)                         | Signed Rounding Shift Right (immediate)                                                  | SRSHR (vector) (A64)                         |

| Mnemonic                          | Brief description                                                                | See                                                     |
|-----------------------------------|----------------------------------------------------------------------------------|---------------------------------------------------------|
| SRSRA (vector)                    | Signed Rounding Shift Right and Accumulate (immediate)                           | <a href="#">SRSRA (vector) (A64)</a>                    |
| SSHLL (vector)                    | Signed Shift Left (register)                                                     | <a href="#">SSHLL (vector) (A64)</a>                    |
| SSHLL2 (vector)                   | Signed Shift Left Long (immediate)                                               | <a href="#">SSHLL, SSHLL2 (vector) (A64)</a>            |
| SSHR (vector)                     | Signed Shift Right (immediate)                                                   | <a href="#">SSHR (vector) (A64)</a>                     |
| SSRA (vector)                     | Signed Shift Right and Accumulate (immediate)                                    | <a href="#">SSRA (vector) (A64)</a>                     |
| SSUBL, SSUBL2 (vector)            | Signed Subtract Long                                                             | <a href="#">SSUBL, SSUBL2 (vector) (A64)</a>            |
| SSUBW, SSUBW2 (vector)            | Signed Subtract Wide                                                             | <a href="#">SSUBW, SSUBW2 (vector) (A64)</a>            |
| ST1 (vector, multiple structures) | Store multiple single-element structures from one, two, three, or four registers | <a href="#">ST1 (vector, multiple structures) (A64)</a> |
| ST1 (vector, single structure)    | Store a single-element structure from one lane of one register                   | <a href="#">ST1 (vector, single structure) (A64)</a>    |
| ST2 (vector, multiple structures) | Store multiple 2-element structures from two registers                           | <a href="#">ST2 (vector, multiple structures) (A64)</a> |
| ST2 (vector, single structure)    | Store single 2-element structure from one lane of two registers                  | <a href="#">ST2 (vector, single structure) (A64)</a>    |
| ST3 (vector, multiple structures) | Store multiple 3-element structures from three registers                         | <a href="#">ST3 (vector, multiple structures) (A64)</a> |
| ST3 (vector, single structure)    | Store single 3-element structure from one lane of three registers                | <a href="#">ST3 (vector, single structure) (A64)</a>    |
| ST4 (vector, multiple structures) | Store multiple 4-element structures from four registers                          | <a href="#">ST4 (vector, multiple structures) (A64)</a> |
| ST4 (vector, single structure)    | Store single 4-element structure from one lane of four registers                 | <a href="#">ST4 (vector, single structure) (A64)</a>    |
| SUB (vector)                      | Subtract (vector)                                                                | <a href="#">SUB (vector) (A64)</a>                      |
| SUBHN, SUBHN2 (vector)            | Subtract returning High Narrow                                                   | <a href="#">SUBHN, SUBHN2 (vector) (A64)</a>            |
| SUQADD (vector)                   | Signed saturating Accumulate of Unsigned value                                   | <a href="#">SUQADD (vector) (A64)</a>                   |
| SXTL, SXTL2 (vector)              | Signed extend Long                                                               | <a href="#">SXTL, SXTL2 (vector) (A64)</a>              |
| TBL (vector)                      | Table vector Lookup                                                              | <a href="#">TBL (vector) (A64)</a>                      |
| TBX (vector)                      | Table vector lookup extension                                                    | <a href="#">TBX (vector) (A64)</a>                      |
| TRN1 (vector)                     | Transpose vectors (primary)                                                      | <a href="#">TRN1 (vector) (A64)</a>                     |
| TRN2 (vector)                     | Transpose vectors (secondary)                                                    | <a href="#">TRN2 (vector) (A64)</a>                     |
| UABA (vector)                     | Unsigned Absolute difference and Accumulate                                      | <a href="#">UABA (vector) (A64)</a>                     |
| UABAL, UABAL2 (vector)            | Unsigned Absolute difference and Accumulate Long                                 | <a href="#">UABAL, UABAL2 (vector) (A64)</a>            |
| UABD (vector)                     | Unsigned Absolute Difference (vector)                                            | <a href="#">UABD (vector) (A64)</a>                     |
| UABDL, UABDL2 (vector)            | Unsigned Absolute Difference Long                                                | <a href="#">UABDL, UABDL2 (vector) (A64)</a>            |
| UADALP (vector)                   | Unsigned Add and Accumulate Long Pairwise                                        | <a href="#">UADALP (vector) (A64)</a>                   |
| UADDL, UADDL2 (vector)            | Unsigned Add Long (vector)                                                       | <a href="#">UADDL, UADDL2 (vector) (A64)</a>            |
| UADDLP (vector)                   | Unsigned Add Long Pairwise                                                       | <a href="#">UADDLP (vector) (A64)</a>                   |

| Mnemonic                           | Brief description                                          | See                                                      |
|------------------------------------|------------------------------------------------------------|----------------------------------------------------------|
| UADDLV (vector)                    | Unsigned sum Long across Vector                            | <a href="#">UADDLV (vector) (A64)</a>                    |
| UADDW, UADDW2 (vector)             | Unsigned Add Wide                                          | <a href="#">UADDW, UADDW2 (vector) (A64)</a>             |
| UCVTF (vector, fixed-point)        | Unsigned fixed-point Convert to Floating-point (vector)    | <a href="#">UCVTF (vector, fixed-point) (A64)</a>        |
| UCVTF (vector, integer)            | Unsigned integer Convert to Floating-point (vector)        | <a href="#">UCVTF (vector, integer) (A64)</a>            |
| UHADD (vector)                     | Unsigned Halving Add                                       | <a href="#">UHADD (vector) (A64)</a>                     |
| UHSUB (vector)                     | Unsigned Halving Subtract                                  | <a href="#">UHSUB (vector) (A64)</a>                     |
| UMAX (vector)                      | Unsigned Maximum (vector)                                  | <a href="#">UMAX (vector) (A64)</a>                      |
| UMAXP (vector)                     | Unsigned Maximum Pairwise                                  | <a href="#">UMAXP (vector) (A64)</a>                     |
| UMAXV (vector)                     | Unsigned Maximum across Vector                             | <a href="#">UMAXV (vector) (A64)</a>                     |
| UMIN (vector)                      | Unsigned Minimum (vector)                                  | <a href="#">UMIN (vector) (A64)</a>                      |
| UMINP (vector)                     | Unsigned Minimum Pairwise                                  | <a href="#">UMINP (vector) (A64)</a>                     |
| UMINV (vector)                     | Unsigned Minimum across Vector                             | <a href="#">UMINV (vector) (A64)</a>                     |
| UMLAL, UMLAL2 (vector, by element) | Unsigned Multiply-Add Long (vector, by element)            | <a href="#">UMLAL, UMLAL2 (vector, by element) (A64)</a> |
| UMLAL, UMLAL2 (vector)             | Unsigned Multiply-Add Long (vector)                        | <a href="#">UMLAL, UMLAL2 (vector) (A64)</a>             |
| UMLSL, UMLSL2 (vector, by element) | Unsigned Multiply-Subtract Long (vector, by element)       | <a href="#">UMLSL, UMLSL2 (vector, by element) (A64)</a> |
| UMLSL, UMLSL2 (vector)             | Unsigned Multiply-Subtract Long (vector)                   | <a href="#">UMLSL, UMLSL2 (vector) (A64)</a>             |
| UMOV (vector)                      | Unsigned Move vector element to general-purpose register   | <a href="#">UMOV (vector) (A64)</a>                      |
| UMULL, UMULL2 (vector, by element) | Unsigned Multiply Long (vector, by element)                | <a href="#">UMULL, UMULL2 (vector, by element) (A64)</a> |
| UMULL, UMULL2 (vector)             | Unsigned Multiply long (vector)                            | <a href="#">UMULL, UMULL2 (vector) (A64)</a>             |
| UQADD (vector)                     | Unsigned saturating Add                                    | <a href="#">UQADD (vector) (A64)</a>                     |
| UQRSHL (vector)                    | Unsigned saturating Rounding Shift Left (register)         | <a href="#">UQRSHL (vector) (A64)</a>                    |
| UQRSHRN, UQRSHRN2 (vector)         | Unsigned saturating Rounded Shift Right Narrow (immediate) | <a href="#">UQRSHRN, UQRSHRN2 (vector) (A64)</a>         |
| UQSHL (vector, immediate)          | Unsigned saturating Shift Left (immediate)                 | <a href="#">UQSHL (vector, immediate) (A64)</a>          |
| UQSHL (vector, register)           | Unsigned saturating Shift Left (register)                  | <a href="#">UQSHL (vector, register) (A64)</a>           |
| UQSHRN, UQSHRN2 (vector)           | Unsigned saturating Shift Right Narrow (immediate)         | <a href="#">UQSHRN, UQSHRN2 (vector) (A64)</a>           |
| UQSUB (vector)                     | Unsigned saturating Subtract                               | <a href="#">UQSUB (vector) (A64)</a>                     |
| UQXTN, UQXTN2 (vector)             | Unsigned saturating extract Narrow                         | <a href="#">UQXTN, UQXTN2 (vector) (A64)</a>             |
| URECPE (vector)                    | Unsigned Reciprocal Estimate                               | <a href="#">URECPE (vector) (A64)</a>                    |
| URHADD (vector)                    | Unsigned Rounding Halving Add                              | <a href="#">URHADD (vector) (A64)</a>                    |
| URSHL (vector)                     | Unsigned Rounding Shift Left (register)                    | <a href="#">URSHL (vector) (A64)</a>                     |
| URSHR (vector)                     | Unsigned Rounding Shift Right (immediate)                  | <a href="#">URSHR (vector) (A64)</a>                     |
| URSQRTE (vector)                   | Unsigned Reciprocal Square Root Estimate                   | <a href="#">URSQRTE (vector) (A64)</a>                   |
| URSRA (vector)                     | Unsigned Rounding Shift Right and Accumulate (immediate)   | <a href="#">URSRA (vector) (A64)</a>                     |
| USHL (vector)                      | Unsigned Shift Left (register)                             | <a href="#">USHL (vector) (A64)</a>                      |

| Mnemonic               | Brief description                               | See                                          |
|------------------------|-------------------------------------------------|----------------------------------------------|
| USHLL, USHLL2 (vector) | Unsigned Shift Left Long (immediate)            | <a href="#">USHLL, USHLL2 (vector) (A64)</a> |
| USHR (vector)          | Unsigned Shift Right (immediate)                | <a href="#">USHR (vector) (A64)</a>          |
| USQADD (vector)        | Unsigned saturating Accumulate of Signed value  | <a href="#">USQADD (vector) (A64)</a>        |
| USRA (vector)          | Unsigned Shift Right and Accumulate (immediate) | <a href="#">USRA (vector) (A64)</a>          |
| USUBL, USUBL2 (vector) | Unsigned Subtract Long                          | <a href="#">USUBL, USUBL2 (vector) (A64)</a> |
| USUBW, USUBW2 (vector) | Unsigned Subtract Wide                          | <a href="#">USUBW, USUBW2 (vector) (A64)</a> |
| UXTL, UXTL2 (vector)   | Unsigned extend Long                            | <a href="#">UXTL, UXTL2 (vector) (A64)</a>   |
| UZP1 (vector)          | Unzip vectors (primary)                         | <a href="#">UZP1 (vector) (A64)</a>          |
| UZP2 (vector)          | Unzip vectors (secondary)                       | <a href="#">UZP2 (vector) (A64)</a>          |
| XTN, XTN2 (vector)     | Extract Narrow                                  | <a href="#">XTN, XTN2 (vector) (A64)</a>     |
| ZIP1 (vector)          | Zip vectors (primary)                           | <a href="#">ZIP1 (vector) (A64)</a>          |
| ZIP2 (vector)          | Zip vectors (secondary)                         | <a href="#">ZIP2 (vector) (A64)</a>          |

## 21.2 ABS (vector) (A64)

Absolute value (vector).

### Syntax

`ABS Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD and FP register, puts the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.3 ADD (vector) (A64)

Add (vector).

### Syntax

`ADD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Add (vector). This instruction adds corresponding elements in the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.4 ADDHN, ADDHN2 (vector) (A64)

Add returning High Narrow.

### Syntax

`ADDHN2Vd.Tb, Vn.Ta, Vm.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the first SIMD and FP source register.

***Ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Add returning High Narrow. This instruction adds each vector element in the first source SIMD and FP register to the corresponding vector element in the second source SIMD and FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register.

The results are truncated. For rounded results, see [RADDHN, RADDHN2 \(vector\) \(A64\)](#).

The `ADDHN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `ADDHN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the `CPACR_EL1`, `CPTR_EL2`, and `CPTR_EL3` registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-2: ADDHN, ADDHN2 (Vector) specifier combinations**

| <Q> | <b><i>Tb</i></b> | <b><i>Ta</i></b> |
|-----|------------------|------------------|
| -   | 8B               | 8H               |
| 2   | 16B              | 8H               |
| -   | 4H               | 4S               |
| 2   | 8H               | 4S               |
| -   | 2S               | 2D               |
| 2   | 4S               | 2D               |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.5 ADDP (vector) (A64)

Add Pairwise (vector).

### Syntax

`ADDP Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.6 ADDV (vector) (A64)

Add across Vector.

### Syntax

`ADDV Vd, Vn.T`

Where:

**V**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

***Vn***

Is the name of the SIMD and FP source register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

**Usage**

Add across Vector. This instruction adds every vector element in the source SIMD and FP register together, and writes the scalar result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-3: ADDV (Vector) specifier combinations**

| <b><i>V</i></b> | <b><i>T</i></b> |
|-----------------|-----------------|
| B               | 8B              |
| B               | 16B             |
| H               | 4H              |
| H               | 8H              |
| S               | 4S              |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.7 AND (vector) (A64)

Bitwise AND (vector).

**Syntax**

AND *Vd.T, Vn.T,Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be either 8B or 16B.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD and FP registers, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.8 BIC (vector, immediate) (A64)

Bitwise bit Clear (vector, immediate).

### Syntax

BIC *Vd.T, #imm8, LSL #amount* ; 16-bit

BIC *Vd.T, #imm8, LSL #amount* ; 32-bit

Where:

**T**

Is an arrangement specifier:

#### 16-bit

Can be one of 4H or 8H.

#### 32-bit

Can be one of 2S or 4S.

**amount**

Is the shift amount:

#### 16-bit

Can be one of 0 or 8.

#### 32-bit

Can be one of 0, 8, 16 or 24.

Defaults to zero if LSL is omitted.

**Vd**

Is the name of the SIMD and FP register.

**imm8**

Is an 8-bit immediate.

## Usage

Bitwise bit Clear (vector, immediate). This instruction reads each vector element from the destination SIMD and FP register, performs a bitwise AND between each result and the complement of an immediate constant, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.9 BIC (vector, register) (A64)

Bitwise bit Clear (vector, register).

### Syntax

BIC *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be either **8B** or **16B**.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD and FP register and the complement of the second source SIMD and FP register, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.10 BIF (vector) (A64)

Bitwise Insert if False.

### Syntax

**BIF** *Vd.T, Vn.T, Vm.T*

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Bitwise Insert if False. This instruction inserts each bit from the first source SIMD and FP register into the destination SIMD and FP register if the corresponding bit of the second source SIMD and FP register is 0, otherwise leaves the bit in the destination register unchanged.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.11 BIT (vector) (A64)

Bitwise Insert if True.

### Syntax

**BIT** *Vd.T, Vn.T, Vm.T*

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**Vn**

Is the name of the first SIMD and FP source register.

**V<sub>m</sub>**

Is the name of the second SIMD and FP source register.

**Usage**

Bitwise Insert if True. This instruction inserts each bit from the first source SIMD and FP register into the SIMD and FP destination register if the corresponding bit of the second source SIMD and FP register is 1, otherwise leaves the bit in the destination register unchanged.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.12 BSL (vector) (A64)

Bitwise Select.

**Syntax**

`BSL Vd.T, Vn.T, Vm.T`

Where:

**V<sub>d</sub>**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**V<sub>n</sub>**

Is the name of the first SIMD and FP source register.

**V<sub>m</sub>**

Is the name of the second SIMD and FP source register.

**Usage**

Bitwise Select. This instruction sets each bit in the destination SIMD and FP register to the corresponding bit from the first source SIMD and FP register when the original destination bit was 1, otherwise from the second source SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.13 CLS (vector) (A64)

Count Leading Sign bits (vector).

### Syntax

`CLS Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register. The count does not include the most significant bit itself.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.14 CLZ (vector) (A64)

Count Leading Zero bits (vector).

### Syntax

`CLZ Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.15 CMEQ (vector, register) (A64)

Compare bitwise Equal (vector).

### Syntax

CMEQ *Vd.T*, *Vn.T*, *Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD and FP register with the corresponding vector element from the second source SIMD and FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.16 CMEQ (vector, zero) (A64)

Compare bitwise Equal to zero (vector).

### Syntax

CMEQ *Vd.T, Vn.T, #0*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

### Usage

Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.17 CMGE (vector, register) (A64)

Compare signed Greater than or Equal (vector).

### Syntax

CMGE *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

**Usage**

Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.18 CMGE (vector, zero) (A64)

Compare signed Greater than or Equal to zero (vector).

**Syntax**

CMGE *Vd.T*, *Vn.T*, #0

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

**Usage**

Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.19 CMGT (vector, register) (A64)

Compare signed Greater than (vector).

### Syntax

`CMGT Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.20 CMGT (vector, zero) (A64)

Compare signed Greater than zero (vector).

### Syntax

`CMGT Vd.T, Vn.T, #0`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.21 CMHI (vector, register) (A64)

Compare unsigned Higher (vector).

### Syntax

`CMHI Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.22 CMHS (vector, register) (A64)

Compare unsigned Higher or Same (vector).

### Syntax

CMHS *Vd.T*, *Vn.T*, *Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Usage

Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD and FP register with the corresponding vector element in the second source SIMD and FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.23 CMLE (vector, zero) (A64)

Compare signed Less than or Equal to zero (vector).

### Syntax

```
CMLE Vd.T, Vn.T, #0
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.24 CMLT (vector, zero) (A64)

Compare signed Less than zero (vector).

### Syntax

```
CMLT Vd.T, Vn.T, #0
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD and FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.25 CMTST (vector) (A64)

Compare bitwise Test bits nonzero (vector).

### Syntax

CMTST *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD and FP register, performs an AND with the corresponding vector element in the second source SIMD and FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.26 CNT (vector) (A64)

Population Count per byte.

### Syntax

CNT *Vd.T, Vn.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be either 8B or 16B.

***Vn***

Is the name of the SIMD and FP source register.

### Usage

Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.27 DUP (vector, element) (A64)

vector.

### Syntax

DUP *Vd.T, Vn.Ts [index]*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Ts***

Is an element size specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD and FP register into a scalar or each element in a vector, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-4: DUP (Vector) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 8B       | B         | 0 to 15      |
| 16B      | B         | 0 to 15      |
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |
| 2D       | D         | 0 or 1       |

**Related information**

[A64 SIMD scalar instructions in alphabetical order](#) on page 1024

## 21.28 DUP (vector, general) (A64)

Duplicate general-purpose register to vector.

**Syntax**

DUP *Vd.T, Rn*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***R***

Is the width specifier for the general-purpose source register, and can be either w or x.

***n***

Is the number [0-30] of the general-purpose source register or ZR (31).

## Usage

Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-5: DUP (Vector) specifier combinations**

| T   | R |
|-----|---|
| 8B  | W |
| 16B | W |
| 4H  | W |
| 8H  | W |
| 2S  | W |
| 4S  | W |
| 2D  | X |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.29 EOR (vector) (A64)

Bitwise Exclusive OR (vector).

### Syntax

EOR *Vd.T, Vn.T, Vm.T*

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD and FP registers, and places the result in the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.30 EXT (vector) (A64)

Extract vector from pair of vectors.

### Syntax

```
EXT Vd.T, Vn.T, Vm.T, #index
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**index**

Is the lowest numbered byte element to be extracted in the range shown in Usage.

## Usage

Extract vector from pair of vectors. This instruction extracts the lowest vector elements from the second source SIMD and FP register and the highest vector elements from the first source SIMD and FP register, concatenates the results into a vector, and writes the vector to the destination SIMD and FP register vector. The index value specifies the lowest vector element to extract from the first source register, and consecutive elements are extracted from the first, then second, source registers until the destination vector is filled.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-6: EXT (Vector) specifier combinations**

| <b>T</b> | <i>index</i> |
|----------|--------------|
| 8B       | 0 to 7       |
| 16B      | 0 to 15      |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.31 FABD (vector) (A64)

Floating-point Absolute Difference (vector).

**Syntax**

`FABD Vd.T, Vn.T, Vm.T ; Vector half precision`

`FABD Vd.T, Vn.T, Vm.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

**Vector half precision**

Can be one of 4H or 8H.

**Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register

**Vm**

Is the name of the second SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD and FP register, from the corresponding floating-point values in the elements of the first source SIMD and FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.32 FABS (vector) (A64)

Floating-point Absolute value (vector).

### Syntax

`FABS Vd.T, Vn.T ; Half-precision`

`FABS Vd.T, Vn.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD and FP register, writes the result to a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.33 FACGE (vector) (A64)

Floating-point Absolute Compare Greater than or Equal (vector).

### Syntax

`FACGE Vd.T, Vn.T, Vm.T ; Vector half precision`

`FACGE Vd.T, Vn.T, Vm.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register

**Vm**

Is the name of the second SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD and FP register with the absolute value of the corresponding floating-point value in the second source SIMD and FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.34 FACGT (vector) (A64)

Floating-point Absolute Compare Greater than (vector).

### Syntax

`FACGT Vd.T, Vn.T, Vm.T ; Vector half precision`

`FACGT Vd.T, Vn.T, Vm.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register

**Vm**

Is the name of the second SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD and FP register with the absolute value of the corresponding vector element in the second source SIMD and FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated.

For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.35 FADD (vector) (A64)

Floating-point Add (vector).

### Syntax

FADD *Vd.T, Vn.T, Vm.T ; Half-precision*

FADD *Vd.T, Vn.T, Vm.T ; Single-precision and double-precision*

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD and FP registers, writes the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated.

For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.36 FADDP (vector) (A64)

Floating-point Add Pairwise (vector).

### Syntax

FADDP *Vd.T, Vn.T, Vm.T ; Half-precision*

FADDP *Vd.T, Vn.T, Vm.T ; Single-precision and double-precision*

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.37 FCADD (vector) (A64)

Floating-point Complex Add.

### Syntax

FCADD *Vd.T, Vn.T, Vm.T, #rotate*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

***rotate***

Is the rotation, and can be either 90 or 270.

### Architectures supported (vector)

Supported in the Arm®v8.3-A architecture and later.

### Usage

Floating-point Complex Add.

This instruction adds two source complex numbers from the *Vm* and the *Vn* vector registers and places the resulting complex number in the destination *Vd* vector register. The number of complex numbers that can be stored in the *Vm*, the *Vn*, and the *Vd* registers is calculated as the vector register size divided by the length of each complex number. These lengths are 16 for half-precision, 32 for single-precision, and 64 for double-precision. Each complex number is represented in a SIMD and FP register as a pair of elements with the imaginary part of the number being placed in the more significant element, and the real part of the number being placed in the less significant

element. Both real and imaginary parts of the source and the resulting complex number are represented as floating-point values.

One of the two vector elements that are read from each of the numbers in the  $v_m$  source SIMD and FP register can be optionally negated based on the rotation value:

- If the rotation is 90, the odd-numbered vector elements are negated.
- If the rotation is 270, the even-numbered vector elements are negated.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.38 FCMEQ (vector, register) (A64)

Floating-point Compare Equal (vector).

### Syntax

FCMEQ  $Vd.T, Vn.T, Vm.T$

Where:

**$Vd$**

Is the name of the SIMD and FP destination register.

**$T$**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**$Vn$**

Is the name of the first SIMD and FP source register.

**$Vm$**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD and FP register, with the corresponding floating-point value from the second source SIMD and FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.39 FCMEQ (vector, zero) (A64)

Floating-point Compare Equal to zero (vector).

### Syntax

FCMEQ *Vd.T, Vn.T, #0.0*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.40 FCMGE (vector, register) (A64)

Floating-point Compare Greater than or Equal (vector).

### Syntax

FCMGE *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD and FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD and FP register sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.41 FCMGE (vector, zero) (A64)

Floating-point Compare Greater than or Equal to zero (vector).

### Syntax

FCMGE *Vd.T, Vn.T, #0.0*

Where:

***vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.42 FCMGT (vector, register) (A64)

Floating-point Compare Greater than (vector).

### Syntax

FCMGT *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD and FP register and if the value is greater than the corresponding floating-point value in the second source SIMD and FP register sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.43 FCMGT (vector, zero) (A64)

Floating-point Compare Greater than zero (vector).

### Syntax

FCMGT *Vd.T, Vn.T, #0.0*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.44 FCMLA (vector) (A64)

Floating-point Complex Multiply Accumulate.

### Syntax

`FCMLA Vd.T, Vn.T, Vm.T, #rotate`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**rotate**

Is the rotation, and can be one of 0, 90, 180 or 270.

### Architectures supported (vector)

Supported in the Arm®v8.3-A architecture and later.

## Usage

This instruction multiplies the two source complex numbers from the  $v_m$  and the  $v_n$  vector registers and adds the result to the corresponding complex number in the destination  $v_d$  vector register. The number of complex numbers that can be stored in the  $v_m$ , the  $v_n$ , and the  $v_d$  registers is calculated

as the vector register size divided by the length of each complex number. These lengths are 16 for half-precision, 32 for single-precision, and 64 for double-precision. Each complex number is represented in a SIMD and FP register as a pair of elements with the imaginary part of the number being placed in the more significant element, and the real part of the number being placed in the less significant element. Both real and imaginary parts of the source and the resulting complex number are represented as floating-point values.

None, one, or both of the two vector elements that are read from each of the numbers in the  $v_m$  source SIMD and FP register can be negated based on the rotation value:

- If the rotation is 0, none of the vector elements are negated.
- If the rotation is 90, the odd-numbered vector elements are negated.
- If the rotation is 180, both vector elements are negated.
- If the rotation is 270, the even-numbered vector elements are negated.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.45 FCMLE (vector, zero) (A64)

Floating-point Compare Less than or Equal to zero (vector).

### Syntax

FCMLE  $Vd.T, Vn.T, \#0.0 ;$  Vector half precision

FCMLE  $Vd.T, Vn.T, \#0.0 ;$  Vector single-precision and double-precision

Where:

**$Vd$**

Is the name of the SIMD and FP destination register

**$T$**

Is an arrangement specifier:

#### Vector half precision

Can be one of  $4H$  or  $8H$ .

#### Vector single-precision and double-precision

Can be one of  $2S$ ,  $4S$  or  $2D$ .

***Vn***

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.46 FCMLT (vector, zero) (A64)

Floating-point Compare Less than zero (vector).

### Syntax

FCMLT *Vd.T, Vn.T, #0.0 ; Vector half precision*

FCMLT *Vd.T, Vn.T, #0.0 ; Vector single-precision and double-precision*

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD and FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD and FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD and FP register to zero.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.47 FCVTAS (vector) (A64)

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

### Syntax

```
FCVTAS Vd.T, Vn.T ; Vector half precision
```

```
FCVTAS Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.48 FCVTAU (vector) (A64)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

**Syntax**

```
FCVTAU Vd.T, Vn.T ; Vector half precision
```

```
FCVTAU Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

**Vector half precision**

Can be one of 4H or 8H.

**Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.49 FCVTL, FCVTL2 (vector) (A64)

Floating-point Convert to higher precision Long (vector).

**Syntax**

`FCVTL2Vd. Ta, Vn. Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either 4s or 2D.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the SIMD and FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD and FP destination register.

Where the operation lengthens a 64-bit vector to a 128-bit vector, the `FCVTL2` variant operates on the elements in the top 64 bits of the source register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-7: FCVTL, FCVTL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.50 FCVTMS (vector) (A64)

Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

### Syntax

`FCVTMS Vd.T, Vn.T ; Vector half precision`

`FCVTMS Vd.T, Vn.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

**Vector half precision**

Can be one of 4H or 8H.

**Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.51 FCVTMU (vector) (A64)

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

**Syntax**

```
FCVTMU Vd.T, Vn.T ; Vector half precision
```

```
FCVTMU Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

**Vector half precision**

Can be one of 4H or 8H.

**Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.52 FCVTN, FCVTN2 (vector) (A64)

Floating-point Convert to lower precision Narrow (vector).

**Syntax**

FCVTN2Vd.Tb, Vn.Ta

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be either 4S or 2D.

## Usage

Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD and FP source register, converts each result to half the precision of the source element, writes the final result to a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.

The `FCVTN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `FCVTN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-8: FCVTN, FCVTN2 (Vector) specifier combinations**

| <Q> | <b>Tb</b> | <b>Ta</b> |
|-----|-----------|-----------|
| -   | 4H        | 4S        |
| 2   | 8H        | 4S        |
| -   | 2S        | 2D        |
| 2   | 4S        | 2D        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.53 FCVTNS (vector) (A64)

Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

### Syntax

```
FCVTNS Vd.T, Vn.T ; Vector half precision
```

```
FCVTNS Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.54 FCVTNU (vector) (A64)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

### Syntax

```
FCVTNU Vd.T, Vn.T ; Vector half precision
```

```
FCVTNU Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.55 FCVTPS (vector) (A64)

Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

### Syntax

```
FCVTPS Vd.T, Vn.T ; Vector half precision
```

```
FCVTPS Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.56 FCVTPU (vector) (A64)

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

### Syntax

```
FCVTPU Vd.T, Vn.T ; Vector half precision
FCVTPU Vd.T, Vn.T ; Vector single-precision and double-precision
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.57 FCVTXN, FCVTXN2 (vector) (A64)

Floating-point Convert to lower precision Narrow, rounding to odd (vector).

### Syntax

`FCVTXN2Vd.Tb, Vn.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be either 2s or 4s.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, 2D.

### Usage

Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD and FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD and FP register.

The `FCVTXN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `FCVTXN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-9: FCVTXN{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> |
|------------------|-----------|-----------|
| -                | 2S        | 2D        |

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> |
|------------------|-----------|-----------|
| 2                | 4S        | 2D        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.58 FCVTZS (vector, fixed-point) (A64)

Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

### Syntax

`FCVTZS Vd.T, Vn.T, #fbits`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the element width.

### Usage

Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-10: FCVTZS (Vector) specifier combinations**

| <b>T</b> | <b>fbits</b> |
|----------|--------------|
| 4H       | -            |

| <b>t</b> | <b>fbits</b> |
|----------|--------------|
| 8H       | -            |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.59 FCVTZS (vector, integer) (A64)

Floating-point Convert to Signed integer, rounding toward Zero (vector).

### Syntax

`FCVTZS Vd.T, Vn.T ; Vector half precision`

`FCVTZS Vd.T, Vn.T ; Vector single-precision and double-precision`

Where:

**vd**

Is the name of the SIMD and FP destination register

**t**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.60 FCVTZU (vector, fixed-point) (A64)

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

### Syntax

`FCVTZU Vd.T, Vn.T, #fbits`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the element width.

### Usage

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-11: FCVTZU (Vector) specifier combinations**

| <b>T</b> | <b>fbits</b> |
|----------|--------------|
| 4H       | -            |
| 8H       | -            |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.61 FCVTZU (vector, integer) (A64)

Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

**Syntax**

FCVTZU *Vd.T, Vn.T* ; Vector half precision

FCVTZU *Vd.T, Vn.T* ; Vector single-precision and double-precision

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

**Vector half precision**

Can be one of 4H or 8H.

**Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.62 FDIV (vector) (A64)

Floating-point Divide (vector).

### Syntax

`FDIV Vd.T, Vn.T, Vm.T ; Half-precision`

`FDIV Vd.T, Vn.T, Vm.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD and FP register, by the floating-point values in the corresponding elements in

the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.63 FMAX (vector) (A64)

Floating-point Maximum (vector).

### Syntax

```
FMAX Vd.T, Vn.T, Vm.T ; Half-precision
FMAX Vd.T, Vn.T, Vm.T ; Single-precision and double-precision
```

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD and FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.64 FMAXNM (vector) (A64)

Floating-point Maximum Number (vector).

### Syntax

FMAXNM *Vd.T, Vn.T, Vm.T ; Half-precision*

FMAXNM *Vd.T, Vn.T, Vm.T ; Single-precision and double-precision*

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD and FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD and FP register.

Nan values are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to *FMAX* (*scalar*).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.65 FMAXNMP (vector) (A64)

Floating-point Maximum Number Pairwise (vector).

### Syntax

`FMAXNMP Vd.T, Vn.T,Vm.T ; Half-precision`

`FMAXNMP Vd.T, Vn.T,Vm.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

Nans are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value, otherwise the result is identical to *FMAX* (*scalar*).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.66 FMAXNMV (vector) (A64)

Floating-point Maximum Number across Vector.

### Syntax

`FMAXNMV Vd, Vn.T ; Half-precision`

`FMAXNMV Vd, Vn.T ; Single-precision and double-precision`

Where:

**v**

Is the destination width specifier:

#### Single-precision and double-precision

Must be s.

#### Half-precision

Must be h.

**T**

Is an arrangement specifier:

### Half-precision

Can be one of `4H` or `8H`.

### Single-precision and double-precision

Must be `4S`.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the largest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are floating-point values.

Nans are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result of the comparison is the numerical value, otherwise the result is identical to *FMAX* (scalar).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.67 FMAXP (vector) (A64)

Floating-point Maximum Pairwise (vector).

### Syntax

`FMAXP Vd.T, Vn.T, Vm.T ; Half-precision`

`FMAXP Vd.T, Vn.T, Vm.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.68 FMAXV (vector) (A64)

Floating-point Maximum across Vector.

### Syntax

FMAXV Vd, Vn.T ; Half-precision

**FMAXV Vd, Vn.T ; Single-precision and double-precision**

Where:

**v**

Is the destination width specifier:

**Single-precision and double-precision**

Must be s.

**Half-precision**

Must be  $\text{H}$ .

**t**

Is an arrangement specifier:

**Half-precision**

Can be one of  $4\text{H}$  or  $8\text{H}$ .

**Single-precision and double-precision**

Must be  $4\text{s}$ .

**d**

Is the number of the SIMD and FP destination register.

**vn**

Is the name of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the largest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.69 FMIN (vector) (A64)

Floating-point minimum (vector).

### Syntax

```
FMIN Vd.T, Vn.T, Vm.T ; Half-precision
```

```
FMIN Vd.T, Vn.T, Vm.T ; Single-precision and double-precision
```

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD and FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.70 FMINNM (vector) (A64)

Floating-point Minimum Number (vector).

### Syntax

```
FMINNM Vd.T, Vn.T, Vm.T ; Half-precision
```

```
FMINNM Vd.T, Vn.T, Vm.T ; Single-precision and double-precision
```

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD and FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD and FP register.

Nans are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to *FMIN* (scalar).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.71 FMINNMP (vector) (A64)

Floating-point Minimum Number Pairwise (vector).

## Syntax

`FMINNMP Vd.T, Vn.T, Vm.T ; Half-precision`

`FMINNMP Vd.T, Vn.T, Vm.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of `4H` or `8H`.

### Single-precision and double-precision

Can be one of `2S`, `4S` or `2D`.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

Nans are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value, otherwise the result is identical to `FMIN` (`scalar`).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.72 FMINNMV (vector) (A64)

Floating-point Minimum Number across Vector.

### Syntax

`FMINNMV Vd, Vn.T ; Half-precision`

`FMINNMV Vd, Vn.T ; Single-precision and double-precision`

Where:

**V**

Is the destination width specifier:

#### Single-precision and double-precision

Must be s.

#### Half-precision

Must be  $\text{H}$ .

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of  $4\text{H}$  or  $8\text{H}$ .

#### Single-precision and double-precision

Must be  $4\text{s}$ .

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the smallest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are floating-point values.

Nan<sub>s</sub>s are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result of the comparison is the numerical value, otherwise the result is identical to *FMIN* (*scalar*).

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.73 FMINP (vector) (A64)

Floating-point Minimum Pairwise (vector).

### Syntax

`FMINP Vd.T, Vn.T, Vm.T ; Half-precision`

`FMINP Vd.T, Vn.T, Vm.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.74 FMINV (vector) (A64)

Floating-point Minimum across Vector.

### Syntax

FMINV *Vd*, *Vn.T* ; Half-precision

FMINV *Vd*, *Vn.T* ; Single-precision and double-precision

Where:

**v**

Is the destination width specifier:

#### Single-precision and double-precision

Must be s.

#### Half-precision

Must be  $\text{H}$ .

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of  $4\text{H}$  or  $8\text{H}$ .

**Single-precision and double-precision**

Must be 4s.

***d***

Is the number of the SIMD and FP destination register.

***Vn***

Is the name of the SIMD and FP source register.

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the smallest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.75 FMLA (vector, by element) (A64)

Floating-point fused Multiply-Add to accumulator (by element).

**Syntax**

`FMLA Vd.T, Vn.T, Vm.Ts[index]`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

**Vector, half-precision**

Can be one of 4H or 8H.

**Vector, single-precision and double-precision**

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Ts***

Is an element size specifier:

**Vector, half-precision**

Must be `H`.

**Vector, single-precision and double-precision**

Can be one of `S` or `D`.

***index***

Is the element index:

**Vector, half-precision**

Must be `H:L:M`.

**Vector, single-precision and double-precision**

Can be one of `H:L` or `H`.

***Vm***

Is the name of the second SIMD and FP source register in the range 0 to 31.

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point fused Multiply-Add to accumulator (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and accumulates the results in the vector elements of the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-12: FMLA (Vector, single-precision and double-precision) specifier combinations**

| <b><i>T</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|-----------------|------------------|---------------------|
| 2S              | S                | 0 to 3              |
| 4S              | S                | 0 to 3              |
| 2D              | D                | 0 or 1              |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.76 FMLA (vector) (A64)

Floating-point fused Multiply-Add to accumulator (vector).

## Syntax

`FMLA Vd.T, Vn.T, Vm.T`

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD and FP registers, adds the product to the corresponding vector element of the destination SIMD and FP register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.77 FMLS (vector, by element) (A64)

Floating-point fused Multiply-Subtract from accumulator (by element).

**Syntax**

**FMLS** *Vd.T, Vn.T, Vm.Ts[index]*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier:

**Vector, half-precision**

Can be one of 4H or 8H.

**Vector, single-precision and double-precision**

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Ts***

Is an element size specifier:

**Vector, half-precision**

Must be H.

**Vector, single-precision and double-precision**

Can be one of S or D.

***index***

Is the element index:

**Vector, half-precision**

Must be H:L:M.

**Vector, single-precision and double-precision**

Can be one of H:L or H.

***Vm***

Is the name of the second SIMD and FP source register in the range 0 to 31.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point fused Multiply-Subtract from accumulator (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and subtracts the results from the vector elements of the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-13: FMLS (Vector, single-precision and double-precision) specifier combinations**

| T  | Ts | index  |
|----|----|--------|
| 2S | S  | 0 to 3 |
| 4S | S  | 0 to 3 |
| 2D | D  | 0 or 1 |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.78 FMLS (vector) (A64)

Floating-point fused Multiply-Subtract from accumulator (vector).

### Syntax

`FMLS Vd.T, Vn.T, Vm.T`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of `4H` or `8H`.

#### Single-precision and double-precision

Can be one of `2S`, `4S` or `2D`.

***Vd***

Is the name of the SIMD and FP destination register.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD and FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD and FP register, and writes the result to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.79 FMOV (vector, immediate) (A64)

Floating-point move immediate (vector).

### Syntax

```
FMOV Vd.T, #imm ; Half-precision
FMOV Vd.T, #imm ; Single-precision
FMOV Vd.2D, #imm ; Double-precision
```

Where:

***Vd***

The value depends on the instruction variant:

**Half-precision** Is the name of the SIMD and FP destination register

**Single-precision** Is the name of the SIMD and FP destination register

**Double-precision** Is the name of the SIMD and FP destination register

#### **T**

Is an arrangement specifier:

**Half-precision**

Can be one of 4H or 8H.

**Single-precision**

Can be one of 2S or 4S.

#### **imm**

The value depends on the instruction variant:

**Half-precision** Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision. For details of the range of constants available and the encoding of *imm*, see Modified immediate constants in A64 floating-point instructions in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Single-precision** Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision. For details of the range of constants available and the encoding of *imm*, see Modified immediate constants in A64 floating-point instructions in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

**Double-precision** Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision. For details of the range of constants available and the encoding of *imm*, see Modified immediate constants in A64 floating-point instructions in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point move immediate (vector). This instruction copies an immediate floating-point constant into every element of the SIMD and FP destination register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.80 FMUL (vector, by element) (A64)

Floating-point Multiply (by element).

### Syntax

`FMUL Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier:

#### Vector, half-precision

Can be one of 4H or 8H.

#### Vector, single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Ts**

Is an element size specifier:

#### Vector, half-precision

Must be H.

#### Vector, single-precision and double-precision

Can be one of S or D.

**index**

Is the element index:

#### Vector, half-precision

Must be H:L:M.

#### Vector, single-precision and double-precision

Can be one of H:L or H.

**Vm**

Is the name of the second SIMD and FP source register in the range 0 to 31.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Multiply (by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register,

places the results in a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are floating-point values.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-14: FMUL (Vector, single-precision and double-precision) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |
| 2D       | D         | 0 or 1       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.81 FMUL (vector) (A64)

Floating-point Multiply (vector).

### Syntax

`FMUL Vd.T, Vn.T, Vm.T`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD and FP registers, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.82 FMULX (vector, by element) (A64)

Floating-point Multiply extended (by element).

### Syntax

`FMULX Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier:

#### Vector, half-precision

Can be one of 4H or 8H.

#### Vector, single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Ts**

Is an element size specifier:

#### Vector, half-precision

Must be H.

**Vector, single-precision and double-precision**

Can be one of s or d.

***index***

Is the element index:

**Vector, half-precision**

Must be H:L:M.

**Vector, single-precision and double-precision**

Can be one of H:L or H.

***Vm***

Is the name of the second SIMD and FP source register in the range 0 to 31.

**Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

**Usage**

Floating-point Multiply extended (by element). This instruction multiplies the floating-point values in the vector elements in the first source SIMD and FP register by the specified floating-point value in the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

Before each multiplication, a check is performed for whether one value is infinite and the other is zero. In this case, if only one of the values is negative, the result is 2.0, otherwise the result is -2.0.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-15: FMULX (Vector, single-precision and double-precision) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |
| 2D       | D         | 0 or 1       |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.83 FMULX (vector) (A64)

Floating-point Multiply extended.

### Syntax

`FMULX Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of `4H` or `8H`.

#### Vector single-precision and double-precision

Can be one of `2S`, `4S` or `2D`.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD and FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD and FP register.

If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the values is negative, otherwise the result is positive.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.84 FNEG (vector) (A64)

Floating-point Negate (vector).

### Syntax

`FNEG Vd.T, Vn.T ; Half-precision`

`FNEG Vd.T, Vn.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of `4H` or `8H`.

#### Single-precision and double-precision

Can be one of `2S`, `4S` or `2D`.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD and FP register, writes the result to a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the `CPACR_EL1`, `CPTR_EL2`, and `CPTR_EL3` registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.85 FRECPE (vector) (A64)

Floating-point Reciprocal Estimate.

### Syntax

`FRECPE Vd.T, Vn.T ; Vector half precision`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

`FRECPE Vd.T, Vn.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.86 FRECPS (vector) (A64)

Floating-point Reciprocal Step.

### Syntax

`FRECPS Vd.T, Vn.T, Vm.T ; Vector half precision`

`FRECPS Vd.T, Vn.T, Vm.T ; Vector single-precision and double-precision`

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

#### **Vector half precision**

Can be one of `4H` or `8H`.

#### **Vector single-precision and double-precision**

Can be one of `2S`, `4S` or `2D`.

***Vn***

Is the name of the first SIMD and FP source register

***Vm***

Is the name of the second SIMD and FP source register

### **Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

### **Usage**

Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD and FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### **Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## **21.87 FRECPX (vector) (A64)**

Floating-point Reciprocal exponent (scalar).

### **Syntax**

`FRECPX Hd, Hn ; Half-precision`

`FRECPX Vd, Vn ; Single-precision and double-precision`

Where:

**Hd**

Is the 16-bit name of the SIMD and FP destination register.

**Hn**

Is the 16-bit name of the SIMD and FP source register.

**v**

Is a width specifier, and can be either s or d.

**d**

Is the number of the SIMD and FP destination register.

**n**

Is the number of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.88 FRINTA (vector) (A64)

Floating-point Round to Integral, to nearest with ties to Away (vector).

### Syntax

FRINTA Vd.T, Vn.T ; Half-precision

FRINTA Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.89 FRINTI (vector) (A64)

Floating-point Round to Integral, using current rounding mode (vector).

### Syntax

FRINTI Vd.T, Vn.T ; Half-precision

FRINTI Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.90 FRINTM (vector) (A64)

Floating-point Round to Integral, toward Minus infinity (vector).

### Syntax

FRINTM Vd.T, Vn.T ; Half-precision

FRINTM Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

#### Half-precision

Can be one of 4H or 8H.

#### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

#### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

#### Usage

Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

#### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.91 FRINTN (vector) (A64)

Floating-point Round to Integral, to nearest with ties to even (vector).

#### Syntax

FRINTN Vd.T, Vn.T ; Half-precision

FRINTN Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.92 FRINTP (vector) (A64)

Floating-point Round to Integral, toward Plus infinity (vector).

### Syntax

FRINTP Vd.T, Vn.T ; Half-precision

FRINTP Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.93 FRINTX (vector) (A64)

Floating-point Round to Integral exact, using current rounding mode (vector).

### Syntax

FRINTX Vd.T, Vn.T ; Half-precision

FRINTX Vd.T, Vn.T ; Single-precision and double-precision

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD and FP destination register.

An Inexact exception is raised when the result value is not numerically equal to the input value. A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.94 FRINTZ (vector) (A64)

Floating-point Round to Integral, toward Zero (vector).

### Syntax

FRINTZ Vd.T, Vn.T ; Half-precision

`FRINTZ Vd.T, Vn.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD and FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register.

A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.95 FRSQRTE (vector) (A64)

Floating-point Reciprocal Square Root Estimate.

## Syntax

`FRSQRTE Vd.T, Vn.T ; Vector half precision`

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

`FRSQRTE Vd.T, Vn.T ; Vector single-precision and double-precision`

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate square root for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.96 FRSQRTS (vector) (A64)

Floating-point Reciprocal Square Root Step.

### Syntax

`FRSQRTS Vd.T, Vn.T, Vm.T ; Vector half precision`

`FRSQRTS Vd.T, Vn.T, Vm.T ; Vector single-precision and double-precision`

Where:

***vd***

Is the name of the SIMD and FP destination register

***t***

Is an arrangement specifier:

#### **Vector half precision**

Can be one of 4H or 8H.

#### **Vector single-precision and double-precision**

Can be one of 2S, 4S or 2D.

***vn***

Is the name of the first SIMD and FP source register

***vm***

Is the name of the second SIMD and FP source register

### **Architectures supported (vector)**

Supported in the Arm®v8.2 architecture and later.

### **Usage**

Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD and FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### **Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## **21.97 FSQRT (vector) (A64)**

Floating-point Square Root (vector).

### **Syntax**

`FSQRT Vd.T, Vn.T ; Half-precision`

`FSQRT Vd.T, Vn.T ; Single-precision and double-precision`

Where:

**T**

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

**Vd**

Is the name of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.98 FSUB (vector) (A64)

Floating-point Subtract (vector).

### Syntax

FSUB Vd.T, Vn.T, Vm.T ; Half-precision

FSUB Vd.T, Vn.T, Vm.T ; Single-precision and double-precision

Where:

***T***

Is an arrangement specifier:

### Half-precision

Can be one of 4H or 8H.

### Single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vd***

Is the name of the SIMD and FP destination register.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

## Usage

Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD and FP register, from the corresponding elements in the vector in the first source SIMD and FP register, places each result into elements of a vector, and writes the vector to the destination SIMD and FP register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.99 INS (vector, element) (A64)

Insert vector element from another vector element.

This instruction is used by the alias `MOV` (element).

## Syntax

`INS Vd.Ts[index1], Vn.Ts[index2]`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***ts***

Is an element size specifier, and can be one of the values shown in Usage.

***index1***

Is the destination element index, in the range shown in Usage.

***vn***

Is the name of the SIMD and FP source register.

***index2***

Is the source element index in the range shown in Usage.

## Usage

Insert vector element from another vector element. This instruction copies the vector element of the source SIMD and FP register to the specified vector element of the destination SIMD and FP register.

This instruction can insert data into individual elements within a SIMD and FP register without clearing the remaining bits to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-16: INS (Vector) specifier combinations**

| <b><i>ts</i></b> | <b><i>index1</i></b> | <b><i>index2</i></b> |
|------------------|----------------------|----------------------|
| B                | 0 to 15              | 0 to 15              |
| H                | 0 to 7               | 0 to 7               |
| S                | 0 to 3               | 0 to 3               |
| D                | 0 or 1               | 0 or 1               |

## Related information

[MOV \(vector, element\) \(A64\)](#) on page 1260

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.100 INS (vector, general) (A64)

Insert vector element from general-purpose register.

This instruction is used by the alias `mov` (from general).

## Syntax

`INS Vd.Ts[index], Rn`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**Ts**

Is an element size specifier, and can be one of the values shown in Usage.

**index**

Is the element index, in the range shown in Usage.

**R**

Is the width specifier for the general-purpose source register, and can be either w or x.

**n**

Is the number [0-30] of the general-purpose source register or ZR (31).

## Usage

Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD and FP register.

This instruction can insert data into individual elements within a SIMD and FP register without clearing the remaining bits to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-17: INS (Vector) specifier combinations**

| <b>Ts</b> | <b>index</b> | <b>R</b> |
|-----------|--------------|----------|
| B         | 0 to 15      | W        |
| H         | 0 to 7       | W        |
| S         | 0 to 3       | W        |
| D         | 0 or 1       | X        |

## Related information

[MOV \(vector, from general\) \(A64\)](#) on page 1261

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.101 LD1 (vector, multiple structures) (A64)

Load multiple single-element structures to one, two, three, or four registers.

### Syntax

```
LD1 {Vt.T}, [Xn|SP] ; One register

LD1 {Vt.T, Vt2.T}, [Xn|SP] ; Two registers

LD1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP] ; Three registers

LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP] ; Four registers

LD1 {Vt.T}, [Xn|SP], imm ; One register, immediate offset, Post-index

LD1 {Vt.T}, [Xn|SP], Xm ; One register, register offset, Post-index

LD1 {Vt.T, Vt2.T}, [Xn|SP], imm ; Two registers, immediate offset, Post-index

LD1 {Vt.T, Vt2.T}, [Xn|SP], Xm ; Two registers, register offset, Post-index

LD1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], imm ; Three registers, immediate offset, Post-index

LD1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], Xm ; Three registers, register offset, Post-index

LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], imm ; Four registers, immediate offset, Post-index

LD1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], Xm ; Four registers, register offset, Post-index
```

Where:

#### **vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **vt2**

Is the name of the second SIMD and FP register to be transferred.

#### **vt3**

Is the name of the third SIMD and FP register to be transferred.

#### **vt4**

Is the name of the fourth SIMD and FP register to be transferred.

#### **imm**

Is the post-index immediate offset:

#### **One register, immediate offset**

Can be one of #8 or #16.

**Two registers, immediate offset**

Can be one of #16 or #32.

**Three registers, immediate offset**

Can be one of #24 or #48.

**Four registers, immediate offset**

Can be one of #32 or #64.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load multiple single-element structures to one, two, three, or four registers. This instruction loads multiple single-element structures from memory and writes the result to one, two, three, or four SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following tables show valid specifier combinations:

**Table 21-18: LD1 (One register, immediate offset) specifier combinations**

| <i>T</i> | <i>i.mn</i> |
|----------|-------------|
| 8B       | #8          |
| 16B      | #16         |
| 4H       | #8          |
| 8H       | #16         |
| 2S       | #8          |
| 4S       | #16         |
| 1D       | #8          |
| 2D       | #16         |

**Table 21-19: LD1 (Two registers, immediate offset) specifier combinations**

| <i>T</i> | <i>i.mn</i> |
|----------|-------------|
| 8B       | #16         |
| 16B      | #32         |
| 4H       | #16         |
| 8H       | #32         |
| 2S       | #16         |

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 4S       | #32        |
| 1D       | #16        |
| 2D       | #32        |

**Table 21-20: LD1 (Three registers, immediate offset) specifier combinations**

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 8B       | #24        |
| 16B      | #48        |
| 4H       | #24        |
| 8H       | #48        |
| 2S       | #24        |
| 4S       | #48        |
| 1D       | #24        |
| 2D       | #48        |

**Table 21-21: LD1 (Four registers, immediate offset) specifier combinations**

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 8B       | #32        |
| 16B      | #64        |
| 4H       | #32        |
| 8H       | #64        |
| 2S       | #32        |
| 4S       | #64        |
| 1D       | #32        |
| 2D       | #64        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.102 LD1 (vector, single structure) (A64)

Load one single-element structure to one lane of one register.

### Syntax

LD1 {Vt.B }[index], [Xn|SP] ; 8-bit

LD1 {Vt.H }[index], [Xn|SP] ; 16-bit

LD1 {Vt.S }[index], [Xn|SP] ; 32-bit

LD1 {Vt.D }[index], [Xn|SP] ; 64-bit

```

LD1  {Vt.B }[index], [Xn|SP], #1 ; 8-bit, immediate offset, Post-index
LD1  {Vt.B }[index], [Xn|SP], Xm ; 8-bit, register offset, Post-index
LD1  {Vt.H }[index], [Xn|SP], #2 ; 16-bit, immediate offset, Post-index
LD1  {Vt.H }[index], [Xn|SP], Xm ; 16-bit, register offset, Post-index
LD1  {Vt.S }[index], [Xn|SP], #4 ; 32-bit, immediate offset, Post-index
LD1  {Vt.S }[index], [Xn|SP], Xm ; 32-bit, register offset, Post-index
LD1  {Vt.D }[index], [Xn|SP], #8 ; 64-bit, immediate offset, Post-index
LD1  {Vt.D }[index], [Xn|SP], Xm ; 64-bit, register offset, Post-index

```

Where:

#### **vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Load one single-element structure to one lane of one register. This instruction loads a single-element structure from memory and writes the result to the specified lane of the SIMD and FP register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.103 LD1R (vector) (A64)

Load one single-element structure and Replicate to all lanes (of one register).

### Syntax

```
LD1R {Vt.T}, [Xn|SP] ; No offset
LD1R {Vt.T}, [Xn|SP], imm ; Immediate offset, Post-index
LD1R {Vt.T}, [Xn|SP], Xm ; Register offset, Post-index
```

Where:

***imm***

Is the post-index immediate offset, and can be one of the values shown in Usage.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***Vt***

Is the name of the first or only SIMD and FP register to be transferred.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load one single-element structure and Replicate to all lanes (of one register). This instruction loads a single-element structure from memory and replicates the structure to all the lanes of the SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-22: LD1R (Immediate offset) specifier combinations**

| <b><i>T</i></b> | <b><i>imm</i></b> |
|-----------------|-------------------|
| 8B              | #1                |
| 16B             | #1                |
| 4H              | #2                |
| 8H              | #2                |
| 2S              | #4                |
| 4S              | #4                |
| 1D              | #8                |
| 2D              | #8                |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.104 LD2 (vector, multiple structures) (A64)

Load multiple 2-element structures to two registers.

### Syntax

```
LD2 {Vt.T, Vt2.T}, [Xn|SP] ; No offset
LD2 {Vt.T, Vt2.T}, [Xn|SP], imm ; Immediate offset, Post-index
LD2 {Vt.T, Vt2.T}, [Xn|SP], Xm ; Register offset, Post-index
```

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be either #16 or #32.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load multiple 2-element structures to two registers. This instruction loads multiple 2-element structures from memory and writes the result to the two SIMD and FP registers, with de-interleaving.

For an example of de-interleaving, see [LD3 \(multiple structures\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.105 LD2 (vector, single structure) (A64)

Load single 2-element structure to one lane of two registers.

### Syntax

```
LD2 {Vt.B, Vt2.B }[index], [Xn|SP] ; 8-bit
LD2 {Vt.H, Vt2.H }[index], [Xn|SP] ; 16-bit
LD2 {Vt.S, Vt2.S }[index], [Xn|SP] ; 32-bit
LD2 {Vt.D, Vt2.D }[index], [Xn|SP] ; 64-bit
LD2 {Vt.B, Vt2.B }[index], [Xn|SP], #2 ; 8-bit, immediate offset, Post-index
LD2 {Vt.B, Vt2.B }[index], [Xn|SP], Xm ; 8-bit, register offset, Post-index
LD2 {Vt.H, Vt2.H }[index], [Xn|SP], #4 ; 16-bit, immediate offset, Post-index
LD2 {Vt.H, Vt2.H }[index], [Xn|SP], Xm ; 16-bit, register offset, Post-index
LD2 {Vt.S, Vt2.S }[index], [Xn|SP], #8 ; 32-bit, immediate offset, Post-index
LD2 {Vt.S, Vt2.S }[index], [Xn|SP], Xm ; 32-bit, register offset, Post-index
LD2 {Vt.D, Vt2.D }[index], [Xn|SP], #16 ; 64-bit, immediate offset, Post-index
LD2 {Vt.D, Vt2.D }[index], [Xn|SP], Xm ; 64-bit, register offset, Post-index
```

Where:

#### **vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **vt2**

Is the name of the second SIMD and FP register to be transferred.

#### **index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

#### **Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**Usage**

Load single 2-element structure to one lane of two registers. This instruction loads a 2-element structure from memory and writes the result to the corresponding elements of the two SIMD and FP registers without affecting the other bits of the registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.106 LD2R (vector) (A64)

Load single 2-element structure and Replicate to all lanes of two registers.

**Syntax**

```
LD2R {Vt.T, Vt2.T}, [Xn|SP] ; No offset
LD2R {Vt.T, Vt2.T}, [Xn|SP], imm ; Immediate offset, Post-index
LD2R {Vt.T, Vt2.T}, [Xn|SP], Xm ; Register offset, Post-index
```

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be one of the values shown in Usage.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load single 2-element structure and Replicate to all lanes of two registers. This instruction loads a 2-element structure from memory and replicates the structure to all the lanes of the two SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-23: LD2R (Immediate offset) specifier combinations**

| <b>T</b> | <b>imm</b> |
|----------|------------|
| 8B       | #2         |
| 16B      | #2         |
| 4H       | #4         |
| 8H       | #4         |
| 2S       | #8         |
| 4S       | #8         |
| 1D       | #16        |
| 2D       | #16        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.107 LD3 (vector, multiple structures) (A64)

Load multiple 3-element structures to three registers.

### Syntax

LD3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP] ; No offset

LD3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], imm ; Immediate offset, Post-index

LD3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], Xm ; Register offset, Post-index

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**Vt3**

Is the name of the third SIMD and FP register to be transferred.

***imm***

Is the post-index immediate offset, and can be either #24 or #48.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Xn|SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Load multiple 3-element structures to three registers. This instruction loads multiple 3-element structures from memory and writes the result to the three SIMD and FP registers, with de-interleaving.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.108 LD3 (vector, single structure) (A64)

Load single 3-element structure to one lane of three registers).

**Syntax**

```
LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP] ; 8-bit
LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP] ; 16-bit
LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP] ; 32-bit
LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP] ; 64-bit
LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], #3 ; 8-bit, immediate offset, Post-index
LD3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], Xm ; 8-bit, register offset, Post-index
LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], #6 ; 16-bit, immediate offset, Post-index
LD3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], Xm ; 16-bit, register offset, Post-index
LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], #12 ; 32-bit, immediate offset, Post-index
LD3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], Xm ; 32-bit, register offset, Post-index
```

`LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], #24 ; 64-bit, immediate offset, Post-index`

`LD3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], Xm ; 64-bit, register offset, Post-index`

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**Vt3**

Is the name of the third SIMD and FP register to be transferred.

**index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Load single 3-element structure to one lane of three registers). This instruction loads a 3-element structure from memory and writes the result to the corresponding elements of the three SIMD and FP registers without affecting the other bits of the registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.109 LD3R (vector) (A64)

Load single 3-element structure and Replicate to all lanes of three registers.

### Syntax

`LD3R {Vt.T, Vt2.T, Vt3.T }, [Xn|SP] ; No offset`

`LD3R {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], imm ; Immediate offset, Post-index`

`LD3R {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], Xm ; Register offset, Post-index`

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**Vt3**

Is the name of the third SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be one of the values shown in Usage.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load single 3-element structure and Replicate to all lanes of three registers. This instruction loads a 3-element structure from memory and replicates the structure to all the lanes of the three SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-24: LD3R (Immediate offset) specifier combinations**

| <b>T</b> | <b>imm</b> |
|----------|------------|
| 8B       | #3         |
| 16B      | #3         |
| 4H       | #6         |
| 8H       | #6         |
| 2S       | #12        |
| 4S       | #12        |
| 1D       | #24        |
| 2D       | #24        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.110 LD4 (vector, multiple structures) (A64)

Load multiple 4-element structures to four registers.

### Syntax

```
LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP] ; No offset
LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], imm ; Immediate offset, Post-index
LD4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], Xm ; Register offset, Post-index
```

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**Vt3**

Is the name of the third SIMD and FP register to be transferred.

**Vt4**

Is the name of the fourth SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be either #32 or #64.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 2D.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

### Usage

Load multiple 4-element structures to four registers. This instruction loads multiple 4-element structures from memory and writes the result to the four SIMD and FP registers, with de-interleaving.

For an example of de-interleaving, see [LD3 \(multiple structures\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.111 LD4 (vector, single structure) (A64)

Load single 4-element structure to one lane of four registers.

### Syntax

```

LD4  {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP] ; 8-bit
LD4  {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP] ; 16-bit
LD4  {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP] ; 32-bit
LD4  {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP] ; 64-bit
LD4  {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], #4 ; 8-bit, immediate offset, Post-
index
LD4  {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], Xm ; 8-bit, register offset, Post-
index
LD4  {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], #8 ; 16-bit, immediate offset, Post-
index
LD4  {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP], Xm ; 16-bit, register offset, Post-
index
LD4  {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], #16 ; 32-bit, immediate offset,
Post-index
LD4  {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP], Xm ; 32-bit, register offset, Post-
index
LD4  {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], #32 ; 64-bit, immediate offset,
Post-index
LD4  {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP], Xm ; 64-bit, register offset, Post-
index

```

Where:

#### **vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **vt2**

Is the name of the second SIMD and FP register to be transferred.

***vt3***

Is the name of the third SIMD and FP register to be transferred.

***vt4***

Is the name of the fourth SIMD and FP register to be transferred.

***index***

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Load single 4-element structure to one lane of four registers. This instruction loads a 4-element structure from memory and writes the result to the corresponding elements of the four SIMD and FP registers without affecting the other bits of the registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.112 LD4R (vector) (A64)

Load single 4-element structure and Replicate to all lanes of four registers.

### Syntax

```
LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP] ; No offset
LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], imm ; Immediate offset, Post-index
LD4R {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], Xm ; Register offset, Post-index
```

Where:

***vt***

Is the name of the first or only SIMD and FP register to be transferred.

***vt2***

Is the name of the second SIMD and FP register to be transferred.

***vt3***

Is the name of the third SIMD and FP register to be transferred.

***vt4***

Is the name of the fourth SIMD and FP register to be transferred.

***imm***

Is the post-index immediate offset, and can be one of the values shown in Usage.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Load single 4-element structure and Replicate to all lanes of four registers. This instruction loads a 4-element structure from memory and replicates the structure to all the lanes of the four SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-25: LD4R (Immediate offset) specifier combinations**

| <b><i>T</i></b> | <b><i>imm</i></b> |
|-----------------|-------------------|
| 8B              | #4                |
| 16B             | #4                |
| 4H              | #8                |
| 8H              | #8                |
| 2S              | #16               |
| 4S              | #16               |
| 1D              | #32               |
| 2D              | #32               |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.113 MLA (vector, by element) (A64)

Multiply-Add to accumulator (vector, by element).

### Syntax

`MLA Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If  $Ts$  is H, then  $Vm$  must be in the range V0 to V15.
- If  $Ts$  is S, then  $Vm$  must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

### Usage

Multiply-Add to accumulator (vector, by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and accumulates the results with the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-26: MLA (Vector) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.114 MLA (vector) (A64)

Multiply-Add to accumulator (vector).

**Syntax**

```
MLA Vd.T, Vn.T, Vm.T
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD and FP registers, and accumulates the results with the vector elements of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.115 MLS (vector, by element) (A64)

Multiply-Subtract from accumulator (vector, by element).

**Syntax**

```
MLS Vd.T, Vn.T, Vm.Ts[index]
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If  $T_s$  is H, then  $V_m$  must be in the range V0 to V15.
- If  $T_s$  is S, then  $V_m$  must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

## Usage

Multiply-Subtract from accumulator (vector, by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, and subtracts the results from the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-27: MLS (Vector) specifier combinations**

| T  | Ts | index  |
|----|----|--------|
| 4H | H  | 0 to 7 |
| 8H | H  | 0 to 7 |
| 2S | S  | 0 to 3 |
| 4S | S  | 0 to 3 |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.116 MLS (vector) (A64)

Multiply-Subtract from accumulator (vector).

### Syntax

MLS Vd.T, Vn.T, Vm.T

Where:

***vd***

Is the name of the SIMD and FP destination register.

***ts***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***vn***

Is the name of the first SIMD and FP source register.

***vm***

Is the name of the second SIMD and FP source register.

**Usage**

Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD and FP registers, and subtracts the results from the vector elements of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.117 MOV (vector, element) (A64)

Move vector element to another vector element.

This instruction is an alias of `INS` (element).

The equivalent instruction is `INS vd.Ts[index1], vn.Ts[index2]`.

**Syntax**

`MOV vd.Ts[index1], vn.Ts[index2]`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***ts***

Is an element size specifier, and can be one of the values shown in Usage.

***index1***

Is the destination element index, in the range shown in Usage.

***vn***

Is the name of the SIMD and FP source register.

***index2***

Is the source element index in the range shown in Usage.

## Usage

Move vector element to another vector element. This instruction copies the vector element of the source SIMD and FP register to the specified vector element of the destination SIMD and FP register.

This instruction can insert data into individual elements within a SIMD and FP register without clearing the remaining bits to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-28: MOV (Vector) specifier combinations**

| <i>Ts</i> | <i>index1</i> | <i>index2</i> |
|-----------|---------------|---------------|
| B         | 0 to 15       | 0 to 15       |
| H         | 0 to 7        | 0 to 7        |
| S         | 0 to 3        | 0 to 3        |
| D         | 0 or 1        | 0 or 1        |

## Related information

[INS \(vector, element\) \(A64\)](#) on page 1238

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.118 MOV (vector, from general) (A64)

Move general-purpose register to a vector element.

This instruction is an alias of `INS` (general).

The equivalent instruction is `INS vd.Ts[index], Rn`.

## Syntax

`MOV vd.Ts[index], Rn`

Where:

**vd**

Is the name of the SIMD and FP destination register.

**Ts**

Is an element size specifier, and can be one of the values shown in Usage.

**index**

Is the element index, in the range shown in Usage.

**R**

Is the width specifier for the general-purpose source register, and can be either w or x.

**n**

Is the number [0-30] of the general-purpose source register or ZR (31).

## Usage

Move general-purpose register to a vector element. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD and FP register.

This instruction can insert data into individual elements within a SIMD and FP register without clearing the remaining bits to zero.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-29: MOV (Vector) specifier combinations**

| <b>Ts</b> | <b>index</b> | <b>R</b> |
|-----------|--------------|----------|
| B         | 0 to 15      | W        |
| H         | 0 to 7       | W        |
| S         | 0 to 3       | W        |
| D         | 0 or 1       | X        |

## Related information

[INS \(vector, general\) \(A64\)](#) on page 1239

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.119 MOV (vector) (A64)

Move vector.

This instruction is an alias of ORR (vector, register).

The equivalent instruction is ORR  $Vd.T, Vn.T, Vn.T$ .

## Syntax

MOV  $Vd.T, Vn.T$

Where:

**vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either **8B** or **16B**.

**Vn**

Is the name of the first SIMD and FP source register.

## Usage

Move vector. This instruction copies the vector in the source SIMD and FP register into the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[ORR \(vector, register\) \(A64\)](#) on page 1272

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.120 MOV (vector, to general) (A64)

Move vector element to general-purpose register.

This instruction is an alias of **UMOV**.

The equivalent instruction is **UMOV *wd*, *Vn.S[index]***.

## Syntax

**MOV *wd*, *Vn.S[index]* ; 32-bit**

**MOV *Xd*, *Vn.D[index]* ; 64-bit**

Where:

**wd**

Is the 32-bit name of the general-purpose destination register.

**index**

The value depends on the instruction variant:

**32-bit** Is the element index.

**64-bit** Is the element index and can be either 0 or 1.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Move vector element to general-purpose register. This instruction reads the unsigned integer from the source SIMD and FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-purpose register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[UMOV \(vector\) \(A64\)](#) on page 1393

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.121 MOVI (vector) (A64)

Move Immediate (vector).

### Syntax

```
MOVI Vd.T, #imm8, LSL #0 ; 8-bit
MOVI Vd.T, #imm8, LSL #{amount} ; 16-bit shifted immediate
MOVI Vd.T, #imm8, LSL #{amount} ; 32-bit shifted immediate
MOVI Vd.T, #imm8, MSL #amount ; 32-bit shifting ones
MOVI Dd, #imm ; 64-bit scalar
MOVI Vd.2D, #imm ; 64-bit vector
```

Where:

**Vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### 8-bit

Can be one of 8B or 16B.

#### 16-bit shifted immediate

Can be one of 4H or 8H.

#### 32-bit shifted immediate

Can be one of 2S or 4S.

#### 32-bit shifting ones

Can be one of 2S or 4S.

***imm8***

Is an 8-bit immediate.

***amount***

Is the shift amount:

**16-bit shifted immediate**

Can be one of 0 or 8.

**32-bit shifted immediate**

Can be one of 0, 8, 16 or 24.

**32-bit shifting ones**

Can be one of 8 or 16.

Defaults to zero if `LSL` is omitted.

***Dd***

Is the 64-bit name of the SIMD and FP destination register.

***imm***

Is a 64-bit immediate.

**Usage**

Move Immediate (vector). This instruction places an immediate constant into every vector element of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.122 MUL (vector, by element) (A64)

Multiply (vector, by element).

**Syntax**

`MUL Vd.T, Vn.T, Vm.Ts[index]`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If  $T_s$  is H, then  $V_m$  must be in the range V0 to V15.
- If  $T_s$  is S, then  $V_m$  must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either H or S.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Multiply (vector, by element). This instruction multiplies the vector elements in the first source SIMD and FP register by the specified value in the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-30: MUL (Vector) specifier combinations**

| <b><i>T</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|-----------------|------------------|---------------------|
| 4H              | H                | 0 to 7              |
| 8H              | H                | 0 to 7              |
| 2S              | S                | 0 to 3              |
| 4S              | S                | 0 to 3              |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.123 MUL (vector) (A64)

Multiply (vector).

**Syntax**

MUL  $Vd.T, Vn.T, Vm.T$

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD and FP registers, places the results in a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.124 MVN (vector) (A64)

Bitwise NOT (vector).

This instruction is an alias of `NOT`.

The equivalent instruction is `NOT Vd.T, Vn.T`.

## Syntax

`MVN Vd.T, Vn.T`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be either `8B` or `16B`.

***Vn***

Is the name of the SIMD and FP source register.

## Usage

Bitwise NOT (vector). This instruction reads each vector element from the source SIMD and FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[NOT \(vector\) \(A64\)](#) on page 1269

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.125 MVNI (vector) (A64)

Move inverted Immediate (vector).

### Syntax

```
MVNI Vd.T, #imm8, LSL #{amount} ; 16-bit shifted immediate
MVNI Vd.T, #imm8, LSL #{amount} ; 32-bit shifted immediate
MVNI Vd.T, #imm8, MSL #amount ; 32-bit shifting ones
```

Where:

**T**

Is an arrangement specifier:

#### 16-bit shifted immediate

Can be one of 4H or 8H.

#### 32-bit shifted immediate

Can be one of 2S or 4S.

#### 32-bit shifting ones

Can be one of 2S or 4S.

**amount**

Is the shift amount:

#### 16-bit shifted immediate

Can be one of 0 or 8.

#### 32-bit shifted immediate

Can be one of 0, 8, 16 or 24.

#### 32-bit shifting ones

Can be one of 8 or 16.

Defaults to zero if LSL is omitted.

**Vd**

Is the name of the SIMD and FP destination register.

**imm8**

Is an 8-bit immediate.

## Usage

Move inverted Immediate (vector). This instruction places the inverse of an immediate constant into every vector element of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.126 NEG (vector) (A64)

Negate (vector).

### Syntax

`NEG Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Negate (vector). This instruction reads each vector element from the source SIMD and FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.127 NOT (vector) (A64)

Bitwise NOT (vector).

This instruction is used by the alias `MVN`.

## Syntax

`NOT Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `8B` or `16B`.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Bitwise NOT (vector). This instruction reads each vector element from the source SIMD and FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[MVN \(vector\) \(A64\)](#) on page 1267

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.128 ORN (vector) (A64)

Bitwise inclusive OR NOT (vector).

## Syntax

`ORN Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `8B` or `16B`.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD and FP registers, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.129 ORR (vector, immediate) (A64)

Bitwise inclusive OR (vector, immediate).

### Syntax

`ORR Vd.T, #imm8, LSL #{amount}`

Where:

***T***

Is an arrangement specifier:

#### 16-bit

Can be one of 4H or 8H.

#### 32-bit

Can be one of 2S or 4S.

***amount***

Is the shift amount:

#### 16-bit

Can be one of 0 or 8.

#### 32-bit

Can be one of 0, 8, 16 or 24.

Defaults to zero if *LSL* is omitted.

***Vd***

Is the name of the SIMD and FP register.

***imm8***

Is an 8-bit immediate.

## Usage

Bitwise inclusive OR (vector, immediate). This instruction reads each vector element from the destination SIMD and FP register, performs a bitwise OR between each result and an immediate

constant, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.130 ORR (vector, register) (A64)

Bitwise inclusive OR (vector, register).

This instruction is used by the alias `mov` (vector).

### Syntax

`ORR Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `8B` or `16B`.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD and FP registers, and writes the result to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[MOV \(vector\) \(A64\)](#) on page 1262

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.131 PMUL (vector) (A64)

Polynomial Multiply.

### Syntax

`PMUL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 8B or 16B.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD and FP registers, places the results in a vector, and writes the vector to the destination SIMD and FP register.

For information about multiplying polynomials see *Polynomial arithmetic over {0, 1}* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.132 PMULL, PMULL2 (vector) (A64)

Polynomial Multiply Long.

### Syntax

`PMULL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***Vd***

Is the name of the SIMD and FP destination register.

***Ta***

Is an arrangement specifier, 8H.

***Vn***

Is the name of the first SIMD and FP source register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD and FP registers, places the results in a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

For information about multiplying polynomials see *Polynomial arithmetic over {0, 1}* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

The PMULL instruction extracts each source vector from the lower half of each source register, while the PMULL2 instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-31: PMULL, PMULL2 (Vector) specifier combinations**

| <Q> | <i>Ta</i> | <i>Tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.133 RADDHN, RADDHN2 (vector) (A64)

Rounding Add returning High Narrow.

### Syntax

RADDHN2*Vd.Tb, Vn.Ta, Vm.Ta*

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the first SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register.

## Usage

Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD and FP register to the corresponding vector element in the second source SIMD and FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register.

The results are rounded. For truncated results, see [ADDHN, ADDHN2 \(vector\) \(A64\)](#).

The `RADDHN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `RADDHN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-32: RADDHN, RADDHN2 (Vector) specifier combinations**

| <Q> | <b>Tb</b> | <b>Ta</b> |
|-----|-----------|-----------|
| -   | 8B        | 8H        |
| 2   | 16B       | 8H        |
| -   | 4H        | 4S        |
| 2   | 8H        | 4S        |
| -   | 2S        | 2D        |
| 2   | 4S        | 2D        |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.134 RBIT (vector) (A64)

Reverse Bit order (vector).

**Syntax**

`RBIT Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `8B` or `16B`.

**Vn**

Is the name of the SIMD and FP source register.

**Usage**

Reverse Bit order (vector). This instruction reads each vector element from the source SIMD and FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.135 REV16 (vector) (A64)

Reverse elements in 16-bit halfwords (vector).

**Syntax**

`REV16 Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `8B` or `16B`.

***Vn***

Is the name of the SIMD and FP source register.

**Usage**

Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.136 REV32 (vector) (A64)

Reverse elements in 32-bit words (vector).

**Syntax**

REV32 *Vd.T, Vn.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of **8B**, **16B**, **4H** or **8H**.

***Vn***

Is the name of the SIMD and FP source register.

**Usage**

Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.137 REV64 (vector) (A64)

Reverse elements in 64-bit doublewords (vector).

### Syntax

`REV64 Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.138 RSHRN, RSHRN2 (vector) (A64)

Rounding Shift Right Narrow (immediate).

### Syntax

`RSHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***Ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***shift***

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

**Usage**

Rounding Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the vector in the source SIMD and FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements. The results are rounded. For truncated results, see [SHRN, SHRN2 \(vector\) \(A64\)](#).

The `RSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `RSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-33: RSHRN, RSHRN2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b><i>Tb</i></b> | <b><i>Ta</i></b> | <b><i>shift</i></b> |
|------------------|------------------|------------------|---------------------|
| -                | 8B               | 8H               | 1 to 8              |
| 2                | 16B              | 8H               | 1 to 8              |
| -                | 4H               | 4S               | 1 to 16             |
| 2                | 8H               | 4S               | 1 to 16             |
| -                | 2S               | 2D               | 1 to 32             |
| 2                | 4S               | 2D               | 1 to 32             |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.139 RSUBHN, RSUBHN2 (vector) (A64)

Rounding Subtract returning High Narrow.

**Syntax**

`RSUBHN2Vd.Tb, Vn.Ta, Vm.Ta`

Where:

Copyright © 2014–2017, 2019–2020, 2023 Arm Limited (or its affiliates). All rights reserved.

Non-Confidential

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the first SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register.

## Usage

Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD and FP register from the corresponding vector element of the first source SIMD and FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register.

The results are rounded. For truncated results, see [SUBHN, SUBHN2 \(vector\) \(A64\)](#).

The `RSUBHN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `RSUBHN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-34: RSUBHN, RSUBHN2 (Vector) specifier combinations**

| <Q> | Tb  | Ta |
|-----|-----|----|
| -   | 8B  | 8H |
| 2   | 16B | 8H |
| -   | 4H  | 4S |
| 2   | 8H  | 4S |
| -   | 2S  | 2D |
| 2   | 4S  | 2D |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.140 SABA (vector) (A64)

Signed Absolute difference and Accumulate.

### Syntax

`SABA Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD and FP register from the corresponding elements of the first source SIMD and FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.141 SABAL, SABAL2 (vector) (A64)

Signed Absolute difference and Accumulate Long.

### Syntax

`SABAL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

## Usage

Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD and FP register from the corresponding vector elements of the first source SIMD and FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

The `SABAL` instruction extracts each source vector from the lower half of each source register, while the `SBAL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-35: SABAL, SABAL2 (Vector) specifier combinations**

| <Q> | <i>ta</i> | <i>tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.142 SABD (vector) (A64)

Signed Absolute Difference.

### Syntax

`SABD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD and FP register from the corresponding elements of the first source SIMD and FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.143 SABDL, SABDL2 (vector) (A64)

Signed Absolute Difference Long.

### Syntax

`SABDL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

**Usage**

Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD and FP register from the corresponding vector elements of the first source SIMD and FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

The `SABDL` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SABDL2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-36: SABDL, SABDL2 (Vector) specifier combinations**

| <Q> | <i>ta</i> | <i>tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.144 SADALP (vector) (A64)

Signed Add and Accumulate Long Pairwise.

### Syntax

`SADALP Vd.Ta, Vn.Tb`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD and FP register and accumulates the results into the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-37: SADALP (Vector) specifier combinations**

| <b>Ta</b> | <b>Tb</b> |
|-----------|-----------|
| 4H        | 8B        |
| 8H        | 16B       |
| 2S        | 4H        |
| 4S        | 8H        |
| 1D        | 2S        |
| 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.145 SADDL, SADDL2 (vector) (A64)

Signed Add Long (vector).

### Syntax

`SADDL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD and FP register to the corresponding vector element of the second source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.

The `SADDL` instruction extracts each source vector from the lower half of each source register, while the `SADDL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-38: SADDL, SADDL2 (Vector) specifier combinations**

| <Q> | Ta | Tb  |
|-----|----|-----|
| -   | 8H | 8B  |
| 2   | 8H | 16B |
| -   | 4S | 4H  |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.146 SADDLP (vector) (A64)

Signed Add Long Pairwise.

### Syntax

SADDLP *Vd.Ta, Vn.Tb*

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-39: SADDLP (Vector) specifier combinations**

| <b>Ta</b> | <b>Tb</b> |
|-----------|-----------|
| 4H        | 8B        |
| 8H        | 16B       |
| 2S        | 4H        |
| 4S        | 8H        |

| <i>Ta</i> | <i>Tb</i> |
|-----------|-----------|
| 1D        | 2S        |
| 2D        | 4S        |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.147 SADDLV (vector) (A64)

Signed Add Long across Vector.

**Syntax**

SADDLV *Vd*, *Vn.T*

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Usage**

Signed Add Long across Vector. This instruction adds every vector element in the source SIMD and FP register together, and writes the scalar result to the destination SIMD and FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-40: SADDLV (Vector) specifier combinations**

| <b>v</b> | <b>T</b> |
|----------|----------|
| H        | 8B       |
| H        | 16B      |
| S        | 4H       |
| S        | 8H       |
| D        | 4S       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.148 SADDW, SADDW2 (vector) (A64)

Signed Add Wide.

## Syntax

`SADDW2Vd.Ta, Vn.Ta, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Signed Add Wide. This instruction adds vector elements of the first source SIMD and FP register to the corresponding vector elements in the lower or upper half of the second source SIMD and FP register, places the results in a vector, and writes the vector to the SIMD and FP destination register.

The `SADDW` instruction extracts the second source vector from the lower half of the second source register, while the `SADDW2` instruction extracts the second source vector from the upper half of the second source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-41: SADDW, SADDW2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 8H        | 8B        |
| 2                | 8H        | 16B       |
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.149 SCVTF (vector, fixed-point) (A64)

Signed fixed-point Convert to Floating-point (vector).

**Syntax**

**SCVTF Vd.T, Vn.T, #fbits**

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the element width.

**Usage**

Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-42: SCVTF (Vector) specifier combinations**

| <b>T</b> | <b>fbits</b> |
|----------|--------------|
| 4H       | -            |
| 8H       | -            |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.150 SCVTF (vector, integer) (A64)

Signed integer Convert to Floating-point (vector).

### Syntax

SCVTF *Vd.T, Vn.T ; Vector half precision*

SCVTF *Vd.T, Vn.T ; Vector single-precision and double-precision*

Where:

**vd**

Is the name of the SIMD and FP destination register

**T**

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

**vn**

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.151 SHADD (vector) (A64)

Signed Halving Add.

### Syntax

SHADD *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Usage

Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD and FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [SRHADD \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.152 SHL (vector) (A64)

Shift Left (immediate).

### Syntax

`SHL Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to the element width in bits minus 1, and can be one of the values shown in Usage.

### Usage

Shift Left (immediate). This instruction reads each value from a vector, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-43: SHL (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 0 to 7       |
| 16B      | 0 to 7       |
| 4H       | 0 to 15      |
| 8H       | 0 to 15      |
| 2S       | 0 to 31      |
| 4S       | 0 to 31      |
| 2D       | 0 to 63      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.153 SHLL, SHLL2 (vector) (A64)

Shift Left Long (by element size).

### Syntax

`SHLL2Vd.Ta, Vn.Tb, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the left shift amount, which must be equal to the source element width in bits, and can be one of the values shown in Usage.

### Usage

Shift Left Long (by element size). This instruction reads each vector element in the lower or upper half of the source SIMD and FP register, left shifts each result by the element size, writes the final result to a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

The `SHLL` instruction extracts vector elements from the lower half of the source register, while the `SHLL2` instruction extracts vector elements from the upper half of the source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-44: SHLL, SHLL2 (Vector) specifier combinations**

| <Q> | Ta | Tb  | shift |
|-----|----|-----|-------|
| -   | 8H | 8B  | 8     |
| 2   | 8H | 16B | 8     |
| -   | 4S | 4H  | 16    |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>shift</b> |
|------------------|-----------|-----------|--------------|
| 2                | 4S        | 8H        | 16           |
| -                | 2D        | 2S        | 32           |
| 2                | 2D        | 4S        | 32           |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.154 SHRN, SHRN2 (vector) (A64)

Shift Right Narrow (immediate).

### Syntax

`SHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

### Usage

Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the source SIMD and FP register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements. The results are truncated. For rounded results, see [RSHRN, RSHRN2 \(vector\) \(A64\)](#).

The `RSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `RSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-45: SHRN, SHRN2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> | <b>shift</b> |
|------------------|-----------|-----------|--------------|
| -                | 8B        | 8H        | 1 to 8       |
| 2                | 16B       | 8H        | 1 to 8       |
| -                | 4H        | 4S        | 1 to 16      |
| 2                | 8H        | 4S        | 1 to 16      |
| -                | 2S        | 2D        | 1 to 32      |
| 2                | 4S        | 2D        | 1 to 32      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.155 SHSUB (vector) (A64)

Signed Halving Subtract.

### Syntax

**SHSUB Vd.T, Vn.T, Vm.T**

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD and FP register from the corresponding elements in the vector in the first source SIMD and FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.156 SLI (vector) (A64)

Shift Left and Insert (immediate).

### Syntax

`SLI Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to the element width in bits minus 1, and can be one of the values shown in Usage.

### Usage

Shift Left and Insert (immediate). This instruction reads each vector element in the source SIMD and FP register, left shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the destination SIMD and FP register such that the new zero bits created by the shift are not inserted but retain their existing value. Bits shifted out of the left of each vector element in the source register are lost.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-46: SLI (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 0 to 7       |
| 16B      | 0 to 7       |
| 4H       | 0 to 15      |
| 8H       | 0 to 15      |
| 2S       | 0 to 31      |
| 4S       | 0 to 31      |
| 2D       | 0 to 63      |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.157 SMAX (vector) (A64)

Signed Maximum (vector).

**Syntax**

`SMAX Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD and FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.158 SMAXP (vector) (A64)

Signed Maximum Pairwise.

**Syntax**

`SMAXP Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.159 SMAXV (vector) (A64)

Signed Maximum across Vector.

### Syntax

`SMAXV Vd, Vn.T`

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the largest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are signed integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-47: SMAXV (Vector) specifier combinations**

| <b>V</b> | <b>T</b> |
|----------|----------|
| B        | 8B       |
| B        | 16B      |
| H        | 4H       |
| H        | 8H       |
| S        | 4S       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.160 SMIN (vector) (A64)

Signed Minimum (vector).

### Syntax

`SMIN Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD and FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.161 SMINP (vector) (A64)

Signed Minimum Pairwise.

### Syntax

`SMINP Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.162 SMINV (vector) (A64)

Signed Minimum across Vector.

### Syntax

`SMINV Vd, Vn.T`

Where:

**V**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**vn**

Is the name of the SIMD and FP source register.

**t**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the smallest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are signed integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-48: SMINV (Vector) specifier combinations**

| v | t   |
|---|-----|
| B | 8B  |
| B | 16B |
| H | 4H  |
| H | 8H  |
| S | 4S  |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.163 SMLAL, SMLAL2 (vector, by element) (A64)

Signed Multiply-Add Long (vector, by element).

### Syntax

`SMLAL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

***Ta***

Is an arrangement specifier, and can be either 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vm***

Is the name of the second SIMD and FP source register:

- If *ts* is H, then *vm* must be in the range V0 to V15.
- If *ts* is S, then *vm* must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either H or S.

***index***

Is the element index, in the range shown in Usage.

## Usage

Signed Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element in the second source SIMD and FP register, and accumulates the results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer values.

The **SMLAL** instruction extracts vector elements from the lower half of the first source register, while the **SMLAL2** instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-49: SMLAL, SMLAL2 (Vector) specifier combinations**

| <Q> | <b><i>Ta</i></b> | <b><i>Tb</i></b> | <b><i>Ts</i></b> | <b><i>index</i></b> |
|-----|------------------|------------------|------------------|---------------------|
| -   | 4S               | 4H               | H                | 0 to 7              |
| 2   | 4S               | 8H               | H                | 0 to 7              |
| -   | 2D               | 2S               | S                | 0 to 3              |
| 2   | 2D               | 4S               | S                | 0 to 3              |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.164 SMLAL, SMLAL2 (vector) (A64)

Signed Multiply-Add Long (vector).

### Syntax

`SMLAL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See `<Q>` in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, and accumulates the results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The `SMLAL` instruction extracts each source vector from the lower half of each source register, while the `SMLAL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-50: SMLAL, SMLAL2 (Vector) specifier combinations**

| <code>&lt;Q&gt;</code> | <code>Ta</code> | <code>Tb</code> |
|------------------------|-----------------|-----------------|
| -                      | 8H              | 8B              |
| 2                      | 8H              | 16B             |
| -                      | 4S              | 4H              |
| 2                      | 4S              | 8H              |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.165 SMLSL, SMLSL2 (vector, by element) (A64)

Signed Multiply-Subtract Long (vector, by element).

### Syntax

`SMLSL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either **4S** or **2D**.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register:

- If **Ts** is **H**, then **Vm** must be in the range V0 to V15.
- If **Ts** is **s**, then **Vm** must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either **H** or **s**.

**index**

Is the element index, in the range shown in Usage.

### Usage

Signed Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register and subtracts the results from the vector elements of

the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The **SMLSL** instruction extracts vector elements from the lower half of the first source register, while the **SMLSL2** instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-51: SMLSL, SMLSL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| -                | 4S        | 4H        | H         | 0 to 7       |
| 2                | 4S        | 8H        | H         | 0 to 7       |
| -                | 2D        | 2S        | S         | 0 to 3       |
| 2                | 2D        | 4S        | S         | 0 to 3       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.166 SMLSL, SMLSL2 (vector) (A64)

Signed Multiply-Subtract Long (vector).

### Syntax

**SMLSL2Vd.Ta, Vn.Tb, Vm.Tb**

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**V<sub>m</sub>**

Is the name of the second SIMD and FP source register.

**Usage**

Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, and subtracts the results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The **SMLS<sub>L</sub>** instruction extracts each source vector from the lower half of each source register, while the **SMLS<sub>L2</sub>** instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-52: SMLS<sub>L</sub>, SMLS<sub>L2</sub> (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>T<sub>a</sub></b> | <b>T<sub>b</sub></b> |
|------------------|----------------------|----------------------|
| -                | 8H                   | 8B                   |
| 2                | 8H                   | 16B                  |
| -                | 4S                   | 4H                   |
| 2                | 4S                   | 8H                   |
| -                | 2D                   | 2S                   |
| 2                | 2D                   | 4S                   |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.167 SMOV (vector) (A64)

Signed Move vector element to general-purpose register.

**Syntax**

**SMOV Wd, Vn.Ts[index] ; 32-bit**

**SMOV Xd, Vn.Ts[index] ; 64-bit**

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Ts**

Is an element size specifier:

**32-bit**

Can be one of `B` or `H`.

**64-bit**

Can be one of `B`, `H` or `S`.

***index***

Is the element index, in the range shown in Usage.

***xd***

Is the 64-bit name of the general-purpose destination register.

***vn***

Is the name of the SIMD and FP source register.

**Usage**

Signed Move vector element to general-purpose register. This instruction reads the signed integer from the source SIMD and FP register, sign-extends it to form a 32-bit or 64-bit value, and writes the result to destination general-purpose register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following tables show valid specifier combinations:

**Table 21-53: SMOV (32-bit) specifier combinations**

| <b><i>Ts</i></b> | <b><i>index</i></b> |
|------------------|---------------------|
| B                | 0 to 15             |
| H                | 0 to 7              |

**Table 21-54: SMOV (64-bit) specifier combinations**

| <b><i>Ts</i></b> | <b><i>index</i></b> |
|------------------|---------------------|
| B                | 0 to 15             |
| H                | 0 to 7              |
| S                | 0 to 3              |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.168 SMULL, SMULL2 (vector, by element) (A64)

Signed Multiply Long (vector, by element).

**Syntax**

`SMULL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be either 4S or 2D.

**vn**

Is the name of the first SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register:

- If  $ts$  is H, then  $vm$  must be in the range V0 to V15.
- If  $ts$  is S, then  $vm$  must be in the range V0 to V31.

**ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

## Usage

Signed Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, places the result in a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The **SMULL** instruction extracts vector elements from the lower half of the first source register, while the **SMULL2** instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-55: SMULL, SMULL2 (Vector) specifier combinations**

| <Q> | <b>ta</b> | <b>tb</b> | <b>ts</b> | <b>index</b> |
|-----|-----------|-----------|-----------|--------------|
| -   | 4S        | 4H        | H         | 0 to 7       |
| 2   | 4S        | 8H        | H         | 0 to 7       |
| -   | 2D        | 2S        | S         | 0 to 3       |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| 2                | 2D        | 4S        | S         | 0 to 3       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.169 SMULL, SMULL2 (vector) (A64)

Signed Multiply Long (vector).

### Syntax

`SMULL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, places the results in a vector, and writes the vector to the destination SIMD and FP register.

The destination vector elements are twice as long as the elements that are multiplied.

The `SMULL` instruction extracts each source vector from the lower half of each source register, while the `SMULL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-56: SMULL, SMULL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 8H        | 8B        |
| 2                | 8H        | 16B       |
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.170 SQABS (vector) (A64)

Signed saturating Absolute value.

**Syntax**

**SQABS Vd.T, Vn.T**

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

**Usage**

Signed saturating Absolute value. This instruction reads each vector element from the source SIMD and FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.171 SQADD (vector) (A64)

Signed saturating Add.

### Syntax

`SQADD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.172 SQDMLAL, SQDMLAL2 (vector, by element) (A64)

Signed saturating Doubling Multiply-Add Long (by element).

### Syntax

`SQDMLAL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be either `4S` or `2D`.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register:

- If  $ts$  is `H`, then  $vm$  must be in the range V0 to V15.
- If  $ts$  is `S`, then  $vm$  must be in the range V0 to V31.

***ts***

Is an element size specifier, and can be either `H` or `S`.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Signed saturating Doubling Multiply-Add Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, and accumulates the final results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMAL` instruction extracts vector elements from the lower half of the first source register, while the `SQDMAL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-57: SQDMAL{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b><i>ta</i></b> | <b><i>tb</i></b> | <b><i>ts</i></b> | <b><i>index</i></b> |
|------------------|------------------|------------------|------------------|---------------------|
| -                | 4S               | 4H               | H                | 0 to 7              |
| 2                | 4S               | 8H               | H                | 0 to 7              |
| -                | 2D               | 2S               | S                | 0 to 3              |
| 2                | 2D               | 4S               | S                | 0 to 3              |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.173 SQDMLAL, SQDMLAL2 (vector) (A64)

Signed saturating Doubling Multiply-Add Long.

### Syntax

`SQDMLAL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either 4s or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMLAL` instruction extracts each source vector from the lower half of each source register, while the `SQDMLAL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-58: SQDMLAL{2} (Vector) specifier combinations**

| <Q> | <i>Ta</i> | <i>Tb</i> |
|-----|-----------|-----------|
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.174 SQDMLSL, SQDMLSL2 (vector, by element) (A64)

Signed saturating Doubling Multiply-Subtract Long (by element).

### Syntax

`SQDMLSL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either 4s or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register:

- If *Ts* is H, then *Vm* must be in the range V0 to V15.
- If *Ts* is S, then *Vm* must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

## Usage

Signed saturating Doubling Multiply-Subtract Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, and subtracts the final results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMLSL` instruction extracts vector elements from the lower half of the first source register, while the `SQDMLSL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-59: SQDMLSL{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| -                | 4S        | 4H        | H         | 0 to 7       |
| 2                | 4S        | 8H        | H         | 0 to 7       |
| -                | 2D        | 2S        | S         | 0 to 3       |
| 2                | 2D        | 4S        | S         | 0 to 3       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.175 SQDMLSL, SQDMLSL2 (vector) (A64)

Signed saturating Doubling Multiply-Subtract Long.

### Syntax

`SQDMLSL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD and FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMILSL` instruction extracts each source vector from the lower half of each source register, while the `SQDMILSL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-60: SQDMILSL{2} (Vector) specifier combinations**

| <Q> | <b>Ta</b> | <b>Tb</b> |
|-----|-----------|-----------|
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.176 SQDMULH (vector, by element) (A64)

Signed saturating Doubling Multiply returning High half (by element).

### Syntax

`SQDMULH Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If  $Ts$  is H, then  $Vm$  must be in the range V0 to V15.
- If  $Ts$  is S, then  $Vm$  must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either H or S.

**index**

Is the element index, in the range shown in Usage.

### Usage

Signed saturating Doubling Multiply returning High half (by element). This instruction multiplies each vector element in the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [SQRDMULH \(vector, by element\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-61: SQDMULH (Vector) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.177 SQDMULH (vector) (A64)

Signed saturating Doubling Multiply returning High half.

**Syntax**

`SQDMULH Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD and FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [SQRDMULH \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.178 SQDMULL, SQDMULL2 (vector, by element) (A64)

Signed saturating Doubling Multiply Long (by element).

### Syntax

`SQDMULL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either `4s` or `2D`.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register:

- If `Ts` is `H`, then `Vm` must be in the range V0 to V15.
- If `Ts` is `s`, then `Vm` must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either `H` or `s`.

**index**

Is the element index, in the range shown in Usage.

### Usage

Signed saturating Doubling Multiply Long (by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMULL` instruction extracts the first source vector from the lower half of the first source register, while the `SQDMULL2` instruction extracts the first source vector from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-62: SQDMULL{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| -                | 4S        | 4H        | H         | 0 to 7       |
| 2                | 4S        | 8H        | H         | 0 to 7       |
| -                | 2D        | 2S        | S         | 0 to 3       |
| 2                | 2D        | 4S        | S         | 0 to 3       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.179 SQDMULL, SQDMULL2 (vector) (A64)

Signed saturating Doubling Multiply Long.

### Syntax

`SQDMULL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either **4s** or **2d**.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD and FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `SQDMULL` instruction extracts each source vector from the lower half of each source register, while the `SQDMULL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-63: SQDMULL{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.180 SQNEG (vector) (A64)

Signed saturating Negate.

### Syntax

`SQNEG Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Signed saturating Negate. This instruction reads each vector element from the source SIMD and FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.181 SQRDMLAH (vector, by element) (A64)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

## Syntax

`SQRDMLAH Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If  $ts$  is  $H$ , then  $v_m$  must be in the range V0 to V15.
- If  $ts$  is  $s$ , then  $v_m$  must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either  $H$  or  $s$ .

**index**

Is the element index, in the range shown in Usage.

## Architectures supported (vector)

Supported in the Arm®v8.1 architecture and later.

## Usage

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element). This instruction multiplies the vector elements of the first source SIMD and FP register with the value of a vector element of the second source SIMD and FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-64: SQRDMLAH (Vector) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.182 SQRDMLAH (vector) (A64)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

### Syntax

`SQRDMLAH Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.1 architecture and later.

### Usage

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD and FP register with the corresponding vector elements of the second source SIMD and FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.183 SQRDMLSH (vector, by element) (A64)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

## Syntax

SQRDMLSH *Vd.T, Vn.T, Vm.Ts[index]*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register:

- If *Ts* is H, then *Vm* must be in the range V0 to V15.
- If *Ts* is S, then *Vm* must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either H or S.

***index***

Is the element index, in the range shown in Usage.

## Architectures supported (vector)

Supported in the Arm®v8.1 architecture and later.

## Usage

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element). This instruction multiplies the vector elements of the first source SIMD and FP register with the value of a vector element of the second source SIMD and FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-65: SQRDMLSH (Vector) specifier combinations**

| <i>T</i> | <i>Ts</i> | <i>index</i> |
|----------|-----------|--------------|
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.184 SQRDMLSH (vector) (A64)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).

### Syntax

`SQRDMLSH Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Architectures supported (vector)

Supported in the Arm®v8.1 architecture and later.

### Usage

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD and FP register with the corresponding vector elements of the second source SIMD and FP register without saturating the

multiplies results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD and FP register. The results are rounded.

If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.185 SQRDMLH (vector, by element) (A64)

Signed saturating Rounding Doubling Multiply returning High half (by element).

### Syntax

`SQRDMLH Vd.T, Vn.T, Vm.Ts[index]`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register:

- If  $ts$  is  $H$ , then  $v_m$  must be in the range V0 to V15.
- If  $ts$  is  $s$ , then  $v_m$  must be in the range V0 to V31.

**Ts**

Is an element size specifier, and can be either  $H$  or  $s$ .

**index**

Is the element index, in the range shown in Usage.

### Usage

Signed saturating Rounding Doubling Multiply returning High half (by element). This instruction multiplies each vector element in the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [SQDMULH \(vector, by element\) \(A64\)](#).

If any of the results overflows, they are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-66: SQRDMLH (Vector) specifier combinations**

| <b>T</b> | <b>Ts</b> | <b>index</b> |
|----------|-----------|--------------|
| 4H       | H         | 0 to 7       |
| 8H       | H         | 0 to 7       |
| 2S       | S         | 0 to 3       |
| 4S       | S         | 0 to 3       |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.186 SQRDMLH (vector) (A64)

Signed saturating Rounding Doubling Multiply returning High half.

### Syntax

`SQRDMLH Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD and FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [SQDMULH \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.187 SQRSHL (vector) (A64)

Signed saturating Rounding Shift Left (register).

### Syntax

SQRSHL *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Usage

Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD and FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For truncated results, see [SQSHL \(vector, register\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.188 SQRSHRN, SQRSHRN2 (vector) (A64)

Signed saturating Rounded Shift Right Narrow (immediate).

### Syntax

`SQRSHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

### Usage

Signed saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are signed integer values. The destination vector elements are half as long as the source vector elements. The results are rounded. For truncated results, see [SQSHRN, SQSHRN2 \(vector\) \(A64\)](#).

The `SQRSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQRSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-67: SQRSHRN{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> | <b>shift</b> |
|------------------|-----------|-----------|--------------|
| -                | 8B        | 8H        | 1 to 8       |
| 2                | 16B       | 8H        | 1 to 8       |
| -                | 4H        | 4S        | 1 to 16      |
| 2                | 8H        | 4S        | 1 to 16      |
| -                | 2S        | 2D        | 1 to 32      |
| 2                | 4S        | 2D        | 1 to 32      |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.189 SQRSHRUN, SQRSHRUN2 (vector) (A64)

Signed saturating Rounded Shift Right Unsigned Narrow (immediate).

**Syntax**

`SQRSHRUN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

**Usage**

Signed saturating Rounded Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, right shifts each value by an immediate value, saturates the result to an unsigned integer value that is half the original width,

places the final result into a vector, and writes the vector to the destination SIMD and FP register. The results are rounded. For truncated results, see [SQSHRUN, SQSHRUN2 \(vector\) \(A64\)](#).

The `SQRSHRUN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQRSHRUN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-68: SQRSHRUN{2} (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> | <b>shift</b> |
|------------------|-----------|-----------|--------------|
| -                | 8B        | 8H        | 1 to 8       |
| 2                | 16B       | 8H        | 1 to 8       |
| -                | 4H        | 4S        | 1 to 16      |
| 2                | 8H        | 4S        | 1 to 16      |
| -                | 2S        | 2D        | 1 to 32      |
| 2                | 4S        | 2D        | 1 to 32      |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.190 SQSHL (vector, immediate) (A64)

Signed saturating Shift Left (immediate).

### Syntax

`SQSHL Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the left shift amount, in the range 0 to the element width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Signed saturating Shift Left (immediate). This instruction reads each vector element in the source SIMD and FP register, shifts each result by an immediate value, places the final result in a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-69: SQSHL (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 0 to 7       |
| 16B      | 0 to 7       |
| 4H       | 0 to 15      |
| 8H       | 0 to 15      |
| 2S       | 0 to 31      |
| 4S       | 0 to 31      |
| 2D       | 0 to 63      |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.191 SQSHL (vector, register) (A64)

Signed saturating Shift Left (register).

### Syntax

`SQSHL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

**Usage**

Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For rounded results, see [SQRSHL \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.192 SQSHLU (vector) (A64)

Signed saturating Shift Left Unsigned (immediate).

**Syntax**

`SQSHLU Vd.T, Vn.T, #shift`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***shift***

Is the left shift amount, in the range 0 to the element width in bits minus 1, and can be one of the values shown in Usage.

**Usage**

Signed saturating Shift Left Unsigned (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, shifts each value by an immediate value, saturates the shifted result to an unsigned integer value, places the result in a vector, and writes

the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(vector\) \(A64\)](#).

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-70: SQSHLU (Vector) specifier combinations**

| <b><i>T</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 0 to 7              |
| 16B             | 0 to 7              |
| 4H              | 0 to 15             |
| 8H              | 0 to 15             |
| 2S              | 0 to 31             |
| 4S              | 0 to 31             |
| 2D              | 0 to 63             |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.193 SQSHRN, SQSHRN2 (vector) (A64)

Signed saturating Shift Right Narrow (immediate).

### Syntax

`SQSHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

**Usage**

Signed saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts and truncates each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are signed integer values. The destination vector elements are half as long as the source vector elements. For rounded results, see [SQRSHRN, SQRSHRN2 \(vector\) \(A64\)](#).

The `SQSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SQSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-71: SQSHRN{2} (Vector) specifier combinations**

| <Q> | <i>Tb</i> | <i>Ta</i> | <i>shift</i> |
|-----|-----------|-----------|--------------|
| -   | 8B        | 8H        | 1 to 8       |
| 2   | 16B       | 8H        | 1 to 8       |
| -   | 4H        | 4S        | 1 to 16      |
| 2   | 8H        | 4S        | 1 to 16      |
| -   | 2S        | 2D        | 1 to 32      |
| 2   | 4S        | 2D        | 1 to 32      |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.194 SQSHRUN, SQSHRUN2 (vector) (A64)

Signed saturating Shift Right Unsigned Narrow (immediate).

**Syntax**

`SQSHRUN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

## Usage

Signed saturating Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the vector of the source SIMD and FP register, right shifts each value by an immediate value, saturates the result to an unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [SQRSHRUN, SQRSHRUN2 \(vector\) \(A64\)](#).

The `sqrshrun` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `sqrshrun2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-72: SQSHRUN[2] (Vector) specifier combinations**

| <Q> | <b>tb</b> | <b>ta</b> | <b>shift</b> |
|-----|-----------|-----------|--------------|
| -   | 8B        | 8H        | 1 to 8       |
| 2   | 16B       | 8H        | 1 to 8       |
| -   | 4H        | 4S        | 1 to 16      |
| 2   | 8H        | 4S        | 1 to 16      |
| -   | 2S        | 2D        | 1 to 32      |
| 2   | 4S        | 2D        | 1 to 32      |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.195 SQSUB (vector) (A64)

Signed saturating Subtract.

**Syntax**

```
SQSUB Vd.T, Vn.T, Vm.T
```

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD and FP register from the corresponding element values of the first source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.196 SQXTN, SQXTN2 (vector) (A64)

Signed saturating extract Narrow.

**Syntax**

```
SQXTN2Vd.Tb, Vn.Ta
```

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD and FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The sqxtn instruction writes the vector to the lower half of the destination register and clears the upper half, while the sqxtn2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-73: SQXTN{2} (Vector) specifier combinations**

| <Q> | <b>tb</b> | <b>ta</b> |
|-----|-----------|-----------|
| -   | 8B        | 8H        |
| 2   | 16B       | 8H        |
| -   | 4H        | 4S        |
| 2   | 8H        | 4S        |
| -   | 2S        | 2D        |
| 2   | 4S        | 2D        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.197 SQXTUN, SQXTUN2 (vector) (A64)

Signed saturating extract Unsigned Narrow.

### Syntax

`SQXTUN2Vd.Tb, Vn.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD and FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `sqxtun` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `sqxtun2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-74: SQXTUN{2} (Vector) specifier combinations**

| <Q> | <i>Tb</i> | <i>Ta</i> |
|-----|-----------|-----------|
| -   | 8B        | 8H        |
| 2   | 16B       | 8H        |

| <b>&lt;Q&gt;</b> | <b>Tb</b> | <b>Ta</b> |
|------------------|-----------|-----------|
| -                | 4H        | 4S        |
| 2                | 8H        | 4S        |
| -                | 2S        | 2D        |
| 2                | 4S        | 2D        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.198 SRHADD (vector) (A64)

Signed Rounding Halving Add.

### Syntax

`SRHADD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD and FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [SHADD \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.199 SRI (vector) (A64)

Shift Right and Insert (immediate).

### Syntax

`SRI Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

### Usage

Shift Right and Insert (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the destination SIMD and FP register such that the new zero bits created by the shift are not inserted but retain their existing value. Bits shifted out of the right of each vector element of the source register are lost.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-75: SRI (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 1 to 8       |
| 16B      | 1 to 8       |
| 4H       | 1 to 16      |
| 8H       | 1 to 16      |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.200 SRSHL (vector) (A64)

Signed Rounding Shift Left (register).

### Syntax

`SRSHL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD and FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. For a truncating shift, see [SSH \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.201 SRSHR (vector) (A64)

Signed Rounding Shift Right (immediate).

### Syntax

`SRSHR Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

## Usage

Signed Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, places the final result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are rounded. For truncated results, see [SSHR \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-76: SRSHR (Vector) specifier combinations**

| <b><i>T</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 1 to 8              |
| 16B             | 1 to 8              |
| 4H              | 1 to 16             |
| 8H              | 1 to 16             |
| 2S              | 1 to 32             |
| 4S              | 1 to 32             |
| 2D              | 1 to 64             |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.202 SRSRA (vector) (A64)

Signed Rounding Shift Right and Accumulate (immediate).

### Syntax

`SRSRA Vd.T, Vn.T, #shift`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

## Usage

Signed Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are rounded. For truncated results, see [SSRA \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-77: SRSRA (Vector) specifier combinations**

| <b><i>T</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 1 to 8              |
| 16B             | 1 to 8              |
| 4H              | 1 to 16             |
| 8H              | 1 to 16             |
| 2S              | 1 to 32             |
| 4S              | 1 to 32             |
| 2D              | 1 to 64             |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.203 SSHL (vector) (A64)

Signed Shift Left (register).

### Syntax

`SSHL Vd.T, Vn.T, Vm.T`

Where:

***Vd***

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD and FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a rounding shift, see [SRSHL \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.204 SSHLL, SSHLL2 (vector) (A64)

Signed Shift Left Long (immediate).

This instruction is used by the alias `sxtl`, `sxtl2`, `sxtl`, `sxtl22`.

## Syntax

`SSHLL2Vd.Ta, Vn.Tb, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***shift***

Is the left shift amount, in the range 0 to the source element width in bits minus 1, and can be one of the values shown in Usage.

**Usage**

Signed Shift Left Long (immediate). This instruction reads each vector element from the source SIMD and FP register, left shifts each vector element by the specified shift amount, places the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.

The `SSHLL` instruction extracts vector elements from the lower half of the source register, while the `SSHLL2` instruction extracts vector elements from the upper half of the source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-78: SSHLL, SSHLL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b><i>Ta</i></b> | <b><i>Tb</i></b> | <b><i>shift</i></b> |
|------------------|------------------|------------------|---------------------|
| -                | 8H               | 8B               | 0 to 7              |
| 2                | 8H               | 16B              | 0 to 7              |
| -                | 4S               | 4H               | 0 to 15             |
| 2                | 4S               | 8H               | 0 to 15             |
| -                | 2D               | 2S               | 0 to 31             |
| 2                | 2D               | 4S               | 0 to 31             |

**Related information**

[SXTL, SXTL2 \(vector\) \(A64\)](#) on page 1365

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.205 SSHR (vector) (A64)

Signed Shift Right (immediate).

**Syntax**

`SSHR Vd.T, Vn.T, #shift`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***t***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

## Usage

Signed Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, places the final result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded results, see [SRSHR \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-79: SSHR (Vector) specifier combinations**

| <b><i>t</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 1 to 8              |
| 16B             | 1 to 8              |
| 4H              | 1 to 16             |
| 8H              | 1 to 16             |
| 2S              | 1 to 32             |
| 4S              | 1 to 32             |
| 2D              | 1 to 64             |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.206 SSRA (vector) (A64)

Signed Shift Right and Accumulate (immediate).

### Syntax

`SSRA Vd.T, Vn.T, #shift`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***t***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

**Usage**

Signed Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded results, see [SRSRA \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-80: SSRA (Vector) specifier combinations**

| <b><i>t</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 1 to 8              |
| 16B             | 1 to 8              |
| 4H              | 1 to 16             |
| 8H              | 1 to 16             |
| 2S              | 1 to 32             |
| 4S              | 1 to 32             |
| 2D              | 1 to 64             |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.207 SSUBL, SSUBL2 (vector) (A64)

Signed Subtract Long.

**Syntax**

`SSUBL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the first SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register.

## Usage

Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD and FP register from the corresponding vector element of the first source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.

The SSUBL instruction extracts each source vector from the lower half of each source register, while the SSUBL2 instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-81: SSUBL, SSUBL2 (Vector) specifier combinations**

| <Q> | ta | tb  |
|-----|----|-----|
| -   | 8H | 8B  |
| 2   | 8H | 16B |
| -   | 4S | 4H  |
| 2   | 4S | 8H  |
| -   | 2D | 2S  |
| 2   | 2D | 4S  |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.208 SSUBW, SSUBW2 (vector) (A64)

Signed Subtract Wide.

### Syntax

`SSUBW2Vd.Ta, Vn.Ta, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD and FP register from the corresponding vector element in the first source SIMD and FP register, places the result in a vector, and writes the vector to the SIMD and FP destination register. All the values in this instruction are signed integer values.

The `ssubw` instruction extracts the second source vector from the lower half of the second source register, while the `ssubw2` instruction extracts the second source vector from the upper half of the second source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-82: SSUBW, SSUBW2 (Vector) specifier combinations**

| <Q> | <b>Ta</b> | <b>Tb</b> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.209 ST1 (vector, multiple structures) (A64)

Store multiple single-element structures from one, two, three, or four registers.

### Syntax

```
ST1 {Vt.T}, [Xn|SP] ; T1 One register
ST1 {Vt.T, Vt2.T}, [Xn|SP] ; T1 Two registers
ST1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP] ; T1 Three registers
ST1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP] ; T1 Four registers
ST1 {Vt.T}, [Xn|SP], imm ; T1 One register, immediate offset, Post-index
ST1 {Vt.T}, [Xn|SP], Xm ; T1 One register, register offset, Post-index
ST1 {Vt.T, Vt2.T}, [Xn|SP], imm ; T1 Two registers, immediate offset, Post-index
ST1 {Vt.T, Vt2.T}, [Xn|SP], Xm ; T1 Two registers, register offset, Post-index
ST1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], imm ; T1 Three registers, immediate offset, Post-index
ST1 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], Xm ; T1 Three registers, register offset, Post-index
ST1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], imm ; T1 Four registers, immediate offset, Post-index
ST1 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], Xm ; T1 Four registers, register offset, Post-index
```

Where:

**vt2**

Is the name of the second SIMD and FP register to be transferred.

***vt3***

Is the name of the third SIMD and FP register to be transferred.

***vt4***

Is the name of the fourth SIMD and FP register to be transferred.

***imm***

Is the post-index immediate offset:

**One register, immediate offset**

Can be one of #8 or #16.

**Two registers, immediate offset**

Can be one of #16 or #32.

**Three registers, immediate offset**

Can be one of #24 or #48.

**Four registers, immediate offset**

Can be one of #32 or #64.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***vt***

Is the name of the first or only SIMD and FP register to be transferred.

***T***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store multiple single-element structures from one, two, three, or four registers. This instruction stores elements to memory from one, two, three, or four SIMD and FP registers, without interleaving. Every element of each register is stored.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following tables show valid specifier combinations:

**Table 21-83: ST1 (One register, immediate offset) specifier combinations**

| <b><i>T</i></b> | <b><i>imm</i></b> |
|-----------------|-------------------|
| 8B              | #8                |
| 16B             | #16               |
| 4H              | #8                |
| 8H              | #16               |
| 2S              | #8                |

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 4S       | #16        |
| 1D       | #8         |
| 2D       | #16        |

**Table 21-84: ST1 (Two registers, immediate offset) specifier combinations**

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 8B       | #16        |
| 16B      | #32        |
| 4H       | #16        |
| 8H       | #32        |
| 2S       | #16        |
| 4S       | #32        |
| 1D       | #16        |
| 2D       | #32        |

**Table 21-85: ST1 (Three registers, immediate offset) specifier combinations**

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 8B       | #24        |
| 16B      | #48        |
| 4H       | #24        |
| 8H       | #48        |
| 2S       | #24        |
| 4S       | #48        |
| 1D       | #24        |
| 2D       | #48        |

**Table 21-86: ST1 (Four registers, immediate offset) specifier combinations**

| <i>T</i> | <i>imm</i> |
|----------|------------|
| 8B       | #32        |
| 16B      | #64        |
| 4H       | #32        |
| 8H       | #64        |
| 2S       | #32        |
| 4S       | #64        |
| 1D       | #32        |
| 2D       | #64        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.210 ST1 (vector, single structure) (A64)

Store a single-element structure from one lane of one register.

### Syntax

```
ST1 {Vt.B }[index], [Xn|SP] ; T1 8-bit
ST1 {Vt.H }[index], [Xn|SP] ; T1 16-bit
ST1 {Vt.S }[index], [Xn|SP] ; T1 32-bit
ST1 {Vt.D }[index], [Xn|SP] ; T1 64-bit
ST1 {Vt.B }[index], [Xn|SP], #1 ; T1 8-bit, immediate offset, Post-index
ST1 {Vt.B }[index], [Xn|SP], Xm ; T1 8-bit, register offset, Post-index
ST1 {Vt.H }[index], [Xn|SP], #2 ; T1 16-bit, immediate offset, Post-index
ST1 {Vt.H }[index], [Xn|SP], Xm ; T1 16-bit, register offset, Post-index
ST1 {Vt.S }[index], [Xn|SP], #4 ; T1 32-bit, immediate offset, Post-index
ST1 {Vt.S }[index], [Xn|SP], Xm ; T1 32-bit, register offset, Post-index
ST1 {Vt.D }[index], [Xn|SP], #8 ; T1 64-bit, immediate offset, Post-index
ST1 {Vt.D }[index], [Xn|SP], Xm ; T1 64-bit, register offset, Post-index
```

Where:

#### **Vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

#### **Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

#### **Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Store a single-element structure from one lane of one register. This instruction stores the specified element of a SIMD and FP register to memory.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.211 ST2 (vector, multiple structures) (A64)

Store multiple 2-element structures from two registers.

### Syntax

```
ST2 {Vt.T, Vt2.T}, [Xn|SP] ; T2
      {Vt.T, Vt2.T}, [Xn|SP], imm ; T2
      {Vt.T, Vt2.T}, [Xn|SP], Xm ; T2
```

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be either #16 or #32.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

## Usage

Store multiple 2-element structures from two registers. This instruction stores multiple 2-element structures from two SIMD and FP registers to memory, with interleaving. Every element of each register is stored.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.212 ST2 (vector, single structure) (A64)

Store single 2-element structure from one lane of two registers.

### Syntax

```

ST2  {Vt.B, Vt2.B }[index], [Xn|SP] ; T2
ST2  {Vt.H, Vt2.H }[index], [Xn|SP] ; T2
ST2  {Vt.S, Vt2.S }[index], [Xn|SP] ; T2
ST2  {Vt.D, Vt2.D }[index], [Xn|SP] ; T2
ST2  {Vt.B, Vt2.B }[index], [Xn|SP], #2 ; T2
ST2  {Vt.B, Vt2.B }[index], [Xn|SP], Xm ; T2
ST2  {Vt.H, Vt2.H }[index], [Xn|SP], #4 ; T2
ST2  {Vt.H, Vt2.H }[index], [Xn|SP], Xm ; T2
ST2  {Vt.S, Vt2.S }[index], [Xn|SP], #8 ; T2
ST2  {Vt.S, Vt2.S }[index], [Xn|SP], Xm ; T2
ST2  {Vt.D, Vt2.D }[index], [Xn|SP], #16 ; T2
ST2  {Vt.D, Vt2.D }[index], [Xn|SP], Xm ; T2

```

Where:

#### **vt**

Is the name of the first or only SIMD and FP register to be transferred.

#### **vt2**

Is the name of the second SIMD and FP register to be transferred.

#### **index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Store single 2-element structure from one lane of two registers. This instruction stores a 2-element structure to memory from corresponding elements of two SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.213 ST3 (vector, multiple structures) (A64)

Store multiple 3-element structures from three registers.

### Syntax

```
ST3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP] ; T3
ST3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], imm ; T3
ST3 {Vt.T, Vt2.T, Vt3.T}, [Xn|SP], Xm ; T3
```

Where:

**Vt**

Is the name of the first or only SIMD and FP register to be transferred.

**Vt2**

Is the name of the second SIMD and FP register to be transferred.

**Vt3**

Is the name of the third SIMD and FP register to be transferred.

**imm**

Is the post-index immediate offset, and can be either #24 or #48.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Xn|SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store multiple 3-element structures from three registers. This instruction stores multiple 3-element structures to memory from three SIMD and FP registers, with interleaving. Every element of each register is stored.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.214 ST3 (vector, single structure) (A64)

Store single 3-element structure from one lane of three registers.

**Syntax**

```
ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP] ; T3
ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP] ; T3
ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP] ; T3
ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP] ; T3
ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], #3 ; T3
ST3 {Vt.B, Vt2.B, Vt3.B }[index], [Xn|SP], Xm ; T3
ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], #6 ; T3
ST3 {Vt.H, Vt2.H, Vt3.H }[index], [Xn|SP], Xm ; T3
ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], #12 ; T3
ST3 {Vt.S, Vt2.S, Vt3.S }[index], [Xn|SP], Xm ; T3
ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], #24 ; T3
ST3 {Vt.D, Vt2.D, Vt3.D }[index], [Xn|SP], Xm ; T3
```

Where:

**vt**

Is the name of the first or only SIMD and FP register to be transferred.

***vt2***

Is the name of the second SIMD and FP register to be transferred.

***vt3***

Is the name of the third SIMD and FP register to be transferred.

***index***

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

**Usage**

Store single 3-element structure from one lane of three registers. This instruction stores a 3-element structure to memory from corresponding elements of three SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.215 ST4 (vector, multiple structures) (A64)

Store multiple 4-element structures from four registers.

**Syntax**

```
ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP] ;
ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], imm ;
ST4 {Vt.T, Vt2.T, Vt3.T, Vt4.T}, [Xn|SP], Xm ;
```

Where:

***vt***

Is the name of the first or only SIMD and FP register to be transferred.

***Vt2***

Is the name of the second SIMD and FP register to be transferred.

***Vt3***

Is the name of the third SIMD and FP register to be transferred.

***Vt4***

Is the name of the fourth SIMD and FP register to be transferred.

***imm***

Is the post-index immediate offset, and can be either #32 or #64.

***Xm***

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Xn | SP***

Is the 64-bit name of the general-purpose base register or stack pointer.

**Usage**

Store multiple 4-element structures from four registers. This instruction stores multiple 4-element structures to memory from four SIMD and FP registers, with interleaving. Every element of each register is stored.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.216 ST4 (vector, single structure) (A64)

Store single 4-element structure from one lane of four registers.

**Syntax**

```
ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP] ;
ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H }[index], [Xn|SP] ;
ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S }[index], [Xn|SP] ;
ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D }[index], [Xn|SP] ;
ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], #4 ;
ST4 {Vt.B, Vt2.B, Vt3.B, Vt4.B }[index], [Xn|SP], Xm ;
```

```

ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H} [index], [Xn|SP], #8 ;
ST4 {Vt.H, Vt2.H, Vt3.H, Vt4.H} [index], [Xn|SP], Xm ;
ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S} [index], [Xn|SP], #16 ;
ST4 {Vt.S, Vt2.S, Vt3.S, Vt4.S} [index], [Xn|SP], Xm ;
ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D} [index], [Xn|SP], #32 ;
ST4 {Vt.D, Vt2.D, Vt3.D, Vt4.D} [index], [Xn|SP], Xm ;

```

Where:

**vt**

Is the name of the first or only SIMD and FP register to be transferred.

**vt2**

Is the name of the second SIMD and FP register to be transferred.

**vt3**

Is the name of the third SIMD and FP register to be transferred.

**vt4**

Is the name of the fourth SIMD and FP register to be transferred.

**index**

The value depends on the instruction variant:

**8-bit** Is the element index, in the range 0 to 15.

**16-bit** Is the element index, in the range 0 to 7.

**32-bit** Is the element index, in the range 0 to 3.

**64-bit** Is the element index, and can be either 0 or 1.

**Xn | SP**

Is the 64-bit name of the general-purpose base register or stack pointer.

**Xm**

Is the 64-bit name of the general-purpose post-index register, excluding XZR.

## Usage

Store single 4-element structure from one lane of four registers. This instruction stores a 4-element structure to memory from corresponding elements of four SIMD and FP registers.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.217 SUB (vector) (A64)

Subtract (vector).

### Syntax

`SUB Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Subtract (vector). This instruction subtracts each vector element in the second source SIMD and FP register from the corresponding vector element in the first source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.218 SUBHN, SUBHN2 (vector) (A64)

Subtract returning High Narrow.

### Syntax

`SUBHN2Vd.Tb, Vn.Ta, Vm.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

## Usage

Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD and FP register from the corresponding vector element in the first source SIMD and FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are signed integer values.

The results are truncated. For rounded results, see [RSUBHN, RSUBHN2 \(vector\) \(A64\)](#).

The `SUBHN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `SUBHN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-87: SUBHN, SUBHN2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b><i>tb</i></b> | <b><i>ta</i></b> |
|------------------|------------------|------------------|
| -                | 8B               | 8H               |
| 2                | 16B              | 8H               |
| -                | 4H               | 4S               |
| 2                | 8H               | 4S               |
| -                | 2S               | 2D               |
| 2                | 4S               | 2D               |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.219 SUQADD (vector) (A64)

Signed saturating Accumulate of Unsigned value.

### Syntax

`SUQADD Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD and FP register to corresponding signed integer values of the vector elements in the destination SIMD and FP register, and writes the resulting signed integer values to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.220 SXTL, SXTL2 (vector) (A64)

Signed extend Long.

This instruction is an alias of `SSHLL`, `SSHLL2`.

The equivalent instruction is `SSHLL2Vd.Ta, Vn.Tb, #0`.

### Syntax

`SXTL2Vd.Ta, Vn.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Signed extend Long. This instruction duplicates each vector element in the lower or upper half of the source SIMD and FP register into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.

The `sxtl` instruction extracts the source vector from the lower half of the source register, while the `sxtl2` instruction extracts the source vector from the upper half of the source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-88: SXTL, SXTL2 (Vector) specifier combinations**

| <Q> | <b>ta</b> | <b>tb</b> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[SSHLL, SSHLL2 \(vector\) \(A64\)](#) on page 1346

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.221 TBL (vector) (A64)

Table vector Lookup.

### Syntax

```
TBL Vd.Ta, {Vn.16B}, Vm.Ta ; Single register table

TBL Vd.Ta, {Vn.16B, <Vn+1>.16B}, Vm.Ta ; Two register table

TBL Vd.Ta, {Vn.16B, <Vn+1>.16B, <Vn+2>.16B}, Vm.Ta ; Three register table

TBL Vd.Ta, {Vn.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B}, Vm.Ta ; Four register table
```

Where:

**Vn**

The value depends on the instruction variant:

**Single register table** Is the name of the SIMD and FP table register

**Two, Three, or Four register table** Is the name of the first SIMD and FP table register

**<V n+1>**

Is the name of the second SIMD and FP table register.

**<V n+2>**

Is the name of the third SIMD and FP table register.

**<V n+3>**

Is the name of the fourth SIMD and FP table register.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be either 8B or 16B.

**Vm**

Is the name of the SIMD and FP index register.

### Usage

Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD and FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD and FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD and FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.222 TBX (vector) (A64)

Table vector lookup extension.

## Syntax

TBX *Vd.Ta*, {*Vn.16B*}, *Vm.Ta* ; Single register table

TBX *Vd.Ta*, {*Vn.16B*, <*Vn+1*>.16B}, *Vm.Ta* ; Two register table

TBX *Vd.Ta*, {*Vn.16B*, <*Vn+1*>.16B, <*Vn+2*>.16B}, *Vm.Ta* ; Three register table

TBX *Vd.Ta*, {*Vn.16B*, <*Vn+1*>.16B, <*Vn+2*>.16B, <*Vn+3*>.16B}, *Vm.Ta* ; Four register table

Where:

***Vn***

Is the value depends on the instruction variant:

**Single register table** Is the name of the SIMD and FP table register

**Two, Three, or Four register table** Is the name of the first SIMD and FP table register

**<*V n+1*>**

Is the name of the second SIMD and FP table register.

**<*V n+2*>**

Is the name of the third SIMD and FP table register.

**<*V n+3*>**

Is the name of the fourth SIMD and FP table register.

***Vd***

Is the name of the SIMD and FP destination register.

***Ta***

Is an arrangement specifier, and can be either **8B** or **16B**.

***Vm***

Is the name of the SIMD and FP index register.

## Usage

Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD and FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD and FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD and FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left

unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.223 TRN1 (vector) (A64)

Transpose vectors (primary).

### Syntax

`TRN1 Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD and FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD and FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.



By using this instruction with `TRN2`, a 2 x 2 matrix can be transposed.

---

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.224 TRN2 (vector) (A64)

Transpose vectors (secondary).

### Syntax

`TRN2 Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD and FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD and FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.



By using this instruction with `TRN1`, a 2 x 2 matrix can be transposed.

**Note**

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.225 UABA (vector) (A64)

Unsigned Absolute difference and Accumulate.

### Syntax

`UABA Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD and FP register from the corresponding elements of the first source SIMD and FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.226 UABAL, UABAL2 (vector) (A64)

Unsigned Absolute difference and Accumulate Long.

### Syntax

`UABAL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD and FP register from the corresponding vector elements of the first source SIMD and FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.

The `UABAL` instruction extracts each source vector from the lower half of each source register, while the `UABAL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-89: UABAL, UABAL2 (Vector) specifier combinations**

| <Q> | <i>ta</i> | <i>tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.227 UABD (vector) (A64)

Unsigned Absolute Difference (vector).

### Syntax

`UABD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD and FP register from the corresponding elements of the first source SIMD and FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.228 UABDL, UABDL2 (vector) (A64)

Unsigned Absolute Difference Long.

### Syntax

`UABDL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD and FP register from the corresponding vector elements of the first source SIMD and FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.

The `UABDL` instruction extracts each source vector from the lower half of each source register, while the `UABDL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-90: UABDL, UABDL2 (Vector) specifier combinations**

| <Q> | <i>ta</i> | <i>tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.229 UADALP (vector) (A64)

Unsigned Add and Accumulate Long Pairwise.

### Syntax

`UADALP Vd.Ta, Vn.Tb`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD and FP register and accumulates the results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-91: UADALP (Vector) specifier combinations**

| <b>Ta</b> | <b>Tb</b> |
|-----------|-----------|
| 4H        | 8B        |
| 8H        | 16B       |
| 2S        | 4H        |
| 4S        | 8H        |
| 1D        | 2S        |
| 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.230 UADDL, UADDL2 (vector) (A64)

Unsigned Add Long (vector).

### Syntax

`UADDL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD and FP register to the corresponding vector element of the second source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.

The `UADDL` instruction extracts each source vector from the lower half of each source register, while the `UADDL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-92: UADDL, UADDL2 (Vector) specifier combinations**

| <Q> | <b>Ta</b> | <b>Tb</b> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.231 UADDLP (vector) (A64)

Unsigned Add Long Pairwise.

### Syntax

UADDLP *Vd.Ta, Vn.Tb*

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD and FP register, places the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-93: UADDLP (Vector) specifier combinations**

| <b>Ta</b> | <b>Tb</b> |
|-----------|-----------|
| 4H        | 8B        |
| 8H        | 16B       |
| 2S        | 4H        |
| 4S        | 8H        |

| <i>Ta</i> | <i>Tb</i> |
|-----------|-----------|
| 1D        | 2S        |
| 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.232 UADDLV (vector) (A64)

Unsigned sum Long across Vector.

### Syntax

UADDLV *Vd*, *Vn.T*

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

### Usage

Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD and FP register together, and writes the scalar result to the destination SIMD and FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-94: UADDLV (Vector) specifier combinations**

| <b>v</b> | <b>T</b> |
|----------|----------|
| H        | 8B       |
| H        | 16B      |
| S        | 4H       |
| S        | 8H       |
| D        | 4S       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.233 UADDW, UADDW2 (vector) (A64)

Unsigned Add Wide.

## Syntax

`UADDW2 Vd. Ta, Vn. Ta, Vm. Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD and FP register to the corresponding vector elements in the lower or upper half of the second source SIMD and FP register, places the result in a vector, and writes the vector to the SIMD and FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.

The `UADDW` instruction extracts vector elements from the lower half of the second source register, while the `UADDW2` instruction extracts vector elements from the upper half of the second source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-95: UADDW, UADDW2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 8H        | 8B        |
| 2                | 8H        | 16B       |
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.234 UCVTF (vector, fixed-point) (A64)

Unsigned fixed-point Convert to Floating-point (vector).

**Syntax**

**UCVTF Vd.T, Vn.T, #fbits**

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**fbits**

Is the number of fractional bits, in the range 1 to the element width.

**Usage**

Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-96: UCVTF (Vector) specifier combinations**

| <i>T</i> | <i>fbits</i> |
|----------|--------------|
| 4H       | -            |
| 8H       | -            |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.235 UCVTF (vector, integer) (A64)

Unsigned integer Convert to Floating-point (vector).

### Syntax

UCVTF *Vd.T, Vn.T ;* Vector half precision

UCVTF *Vd.T, Vn.T ;* Vector single-precision and double-precision

Where:

***Vd***

Is the name of the SIMD and FP destination register

***T***

Is an arrangement specifier:

#### Vector half precision

Can be one of 4H or 8H.

#### Vector single-precision and double-precision

Can be one of 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register

### Architectures supported (vector)

Supported in the Arm®v8.2 architecture and later.

### Usage

Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD and FP destination register.

This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see *Floating-point exception traps* in the [Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the Security state and Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.236 UHADD (vector) (A64)

Unsigned Halving Add.

### Syntax

UHADD *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD and FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD and FP register.

The results are truncated. For rounded results, see [URHADD \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.237 UHSUB (vector) (A64)

Unsigned Halving Subtract.

### Syntax

`UHSUB Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD and FP register from the corresponding vector elements in the first source SIMD and FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.238 UMAX (vector) (A64)

Unsigned Maximum (vector).

### Syntax

`UMAX Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD and FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.239 UMAXP (vector) (A64)

Unsigned Maximum Pairwise.

### Syntax

UMAXP *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.240 UMAXV (vector) (A64)

Unsigned Maximum across Vector.

**Syntax**

UMAXV *Vd*, *Vn.T*

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Usage**

Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the largest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-97: UMAXV (Vector) specifier combinations**

| <b>v</b> | <b>T</b> |
|----------|----------|
| B        | 8B       |
| B        | 16B      |
| H        | 4H       |
| H        | 8H       |
| S        | 4S       |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.241 UMIN (vector) (A64)

Unsigned Minimum (vector).

### Syntax

`UMIN Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD and FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.242 UMINP (vector) (A64)

Unsigned Minimum Pairwise.

### Syntax

`UMINP Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD and FP register after the vector elements of the second source SIMD and FP register, reads each pair of adjacent vector elements in the two source SIMD and FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.243 UMINV (vector) (A64)

Unsigned Minimum across Vector.

**Syntax**

UMINV *Vd*, *Vn.T*

Where:

**v**

Is the destination width specifier, and can be one of the values shown in Usage.

**d**

Is the number of the SIMD and FP destination register.

**Vn**

Is the name of the SIMD and FP source register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Usage**

Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD and FP register, and writes the smallest of the values as a scalar to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-98: UMINV (Vector) specifier combinations**

| <b>v</b> | <b>T</b> |
|----------|----------|
| B        | 8B       |
| B        | 16B      |
| H        | 4H       |
| H        | 8H       |
| S        | 4S       |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.244 UMLAL, UMLAL2 (vector, by element) (A64)

Unsigned Multiply-Add Long (vector, by element).

**Syntax**

`UMLAL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be either `4s` or `2D`.

**vn**

Is the name of the first SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register:

- If `ts` is `H`, then `vm` must be in the range V0 to V15.
- If `ts` is `S`, then `vm` must be in the range V0 to V31.

**ts**

Is an element size specifier, and can be either `H` or `S`.

**index**

Is the element index, in the range shown in Usage.

## Usage

Unsigned Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register and accumulates the results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The `UMLAL` instruction extracts vector elements from the lower half of the first source register, while the `UMLAL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-99: UMLAL, UMLAL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| -                | 4S        | 4H        | H         | 0 to 7       |
| 2                | 4S        | 8H        | H         | 0 to 7       |
| -                | 2D        | 2S        | S         | 0 to 3       |
| 2                | 2D        | 4S        | S         | 0 to 3       |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.245 UMLAL, UMLAL2 (vector) (A64)

Unsigned Multiply-Add Long (vector).

### Syntax

`UMLAL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD and FP register by the corresponding vector elements of the second source SIMD and FP register, and accumulates the results with the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The `UMLAL` instruction extracts vector elements from the lower half of the first source register, while the `UMLAL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-100: UMLAL, UMLAL2 (Vector) specifier combinations**

| <Q> | Ta | Tb  |
|-----|----|-----|
| -   | 8H | 8B  |
| 2   | 8H | 16B |
| -   | 4S | 4H  |
| 2   | 4S | 8H  |
| -   | 2D | 2S  |
| 2   | 2D | 4S  |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.246 UMLSL, UMLSL2 (vector, by element) (A64)

Unsigned Multiply-Subtract Long (vector, by element).

**Syntax**

`UMLSL2Vd.Ta, Vn.Tb, Vm.Ts[index]`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be either **4s** or **2d**.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register:

- If *ts* is **H**, then *vm* must be in the range V0 to V15.
- If *ts* is **s**, then *vm* must be in the range V0 to V31.

***ts***

Is an element size specifier, and can be either **H** or **s**.

***index***

Is the element index, in the range shown in Usage.

## Usage

Unsigned Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register and subtracts the results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The **UMLSL** instruction extracts vector elements from the lower half of the first source register, while the **UMLSL2** instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-101: UMLSL, UMLSL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b><i>ta</i></b> | <b><i>tb</i></b> | <b><i>ts</i></b> | <b><i>index</i></b> |
|------------------|------------------|------------------|------------------|---------------------|
| -                | 4S               | 4H               | H                | 0 to 7              |
| 2                | 4S               | 8H               | H                | 0 to 7              |
| -                | 2D               | 2S               | S                | 0 to 3              |
| 2                | 2D               | 4S               | S                | 0 to 3              |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.247 UMLSL, UMLSL2 (vector) (A64)

Unsigned Multiply-Subtract Long (vector).

### Syntax

`UMLSL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD and FP registers, and subtracts the results from the vector elements of the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.

The `UMLSL` instruction extracts each source vector from the lower half of each source register, while the `UMLSL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-102: UMLSL, UMLSL2 (Vector) specifier combinations**

| <Q> | <b>Ta</b> | <b>Tb</b> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.248 UMOV (vector) (A64)

Unsigned Move vector element to general-purpose register.

This instruction is used by the alias `mov` (to general).

### Syntax

`UMOV Wd, Vn.Ts[index] ; 32-bit`

`UMOV Xd, Vn.Ts[index] ; 64-bit`

Where:

**Wd**

Is the 32-bit name of the general-purpose destination register.

**Ts**

Is an element size specifier:

**32-bit**

Can be one of `B`, `H` or `S`.

**64-bit**

Must be `D`.

**index**

The value depends on the instruction variant:

**32-bit** Is the element index, in the range shown in Usage.

**64-bit** Is the element index and can be either 0 or 1.

**Xd**

Is the 64-bit name of the general-purpose destination register.

**Vn**

Is the name of the SIMD and FP source register.

## Usage

Unsigned Move vector element to general-purpose register. This instruction reads the unsigned integer from the source SIMD and FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-purpose register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Table 21-103: UMOV (32-bit) specifier combinations**

| <i>Ts</i> | <i>index</i> |
|-----------|--------------|
| B         | 0 to 15      |
| H         | 0 to 7       |
| S         | 0 to 3       |

### Related information

[MOV \(vector, to general\) \(A64\)](#) on page 1263

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.249 UMULL, UMULL2 (vector, by element) (A64)

Unsigned Multiply Long (vector, by element).

### Syntax

UMULL2*Vd.Ta, Vn.Tb, Vm.Ts*[*index*]

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be either 4s or 2D.

**vn**

Is the name of the first SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vm**

Is the name of the second SIMD and FP source register:

- If *ts* is H, then *vm* must be in the range V0 to V15.

- If  $Ts$  is s, then  $Vm$  must be in the range V0 to V31.

***Ts***

Is an element size specifier, and can be either H or s.

***index***

Is the element index, in the range shown in Usage.

**Usage**

Unsigned Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of the first source SIMD and FP register by the specified vector element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied.

The `UMULL` instruction extracts vector elements from the lower half of the first source register, while the `UMULL2` instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-104: UMULL, UMULL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>Ts</b> | <b>index</b> |
|------------------|-----------|-----------|-----------|--------------|
| -                | 4S        | 4H        | H         | 0 to 7       |
| 2                | 4S        | 8H        | H         | 0 to 7       |
| -                | 2D        | 2S        | S         | 0 to 3       |
| 2                | 2D        | 4S        | S         | 0 to 3       |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.250 UMULL, UMULL2 (vector) (A64)

Unsigned Multiply long (vector).

**Syntax**

`UMULL2Vd.Ta, Vn.Tb, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See **<Q>** in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the first SIMD and FP source register.

***tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vm***

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD and FP registers, places the result in a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.

The `UMULL` instruction extracts each source vector from the lower half of each source register, while the `UMULL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-105: UMULL, UMULL2 (Vector) specifier combinations**

| <Q> | <i>ta</i> | <i>tb</i> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.251 UQADD (vector) (A64)

Unsigned saturating Add.

### Syntax

`UQADD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD and FP registers, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.252 UQRSHL (vector) (A64)

Unsigned saturating Rounding Shift Left (register).

### Syntax

`UQRSHL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD and FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For truncated results, see [UQSHL \(vector, immediate\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.253 UQRSHRN, UQRSHRN2 (vector) (A64)

Unsigned saturating Rounded Shift Right Narrow (immediate).

**Syntax**

`UQRSHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

***Ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

***shift***

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

**Usage**

Unsigned saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [UQSHRN, UQSHRN2 \(vector\) \(A64\)](#).

The `UQRSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQRSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-106: UQRSHRN{2} (Vector) specifier combinations**

| <Q> | <i>Tb</i> | <i>Ta</i> | <i>shift</i> |
|-----|-----------|-----------|--------------|
| -   | 8B        | 8H        | 1 to 8       |
| 2   | 16B       | 8H        | 1 to 8       |
| -   | 4H        | 4S        | 1 to 16      |
| 2   | 8H        | 4S        | 1 to 16      |
| -   | 2S        | 2D        | 1 to 32      |
| 2   | 4S        | 2D        | 1 to 32      |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.254 UQSHL (vector, immediate) (A64)

Unsigned saturating Shift Left (immediate).

**Syntax**

`UQSHL Vd.T, Vn.T, #shift`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***t***

Is an arrangement specifier, and can be one of the values shown in Usage.

***vn***

Is the name of the SIMD and FP source register.

***shift***

Is the left shift amount, in the range 0 to the element width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Unsigned saturating Shift Left (immediate). This instruction takes each vector element in the source SIMD and FP register, shifts it by an immediate value, places the results in a vector, and writes the vector to the destination SIMD and FP register. The results are truncated. For rounded results, see [UQRSHL \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-107: UQSHL (Vector) specifier combinations**

| <b><i>t</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 0 to 7              |
| 16B             | 0 to 7              |
| 4H              | 0 to 15             |
| 8H              | 0 to 15             |
| 2S              | 0 to 31             |
| 4S              | 0 to 31             |
| 2D              | 0 to 63             |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.255 UQSHL (vector, register) (A64)

Unsigned saturating Shift Left (register).

### Syntax

`UQSHL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For rounded results, see [UQRSHL \(vector\) \(A64\)](#).

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.256 UQSHRN, UQSHRN2 (vector) (A64)

Unsigned saturating Shift Right Narrow (immediate).

### Syntax

`UQSHRN2Vd.Tb, Vn.Ta, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the right shift amount, in the range 1 to the destination element width in bits, and can be one of the values shown in Usage.

## Usage

Unsigned saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [UQRSHRN, UQRSHRN2 \(vector\) \(A64\)](#).

The `UQSHRN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQSHRN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-108: UQSHRN{2} (Vector) specifier combinations**

| <Q> | <b>tb</b> | <b>ta</b> | <b>shift</b> |
|-----|-----------|-----------|--------------|
| -   | 8B        | 8H        | 1 to 8       |
| 2   | 16B       | 8H        | 1 to 8       |
| -   | 4H        | 4S        | 1 to 16      |
| 2   | 8H        | 4S        | 1 to 16      |
| -   | 2S        | 2D        | 1 to 32      |
| 2   | 4S        | 2D        | 1 to 32      |

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.257 UQSUB (vector) (A64)

Unsigned saturating Subtract.

**Syntax**

`UQSUB Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD and FP register from the corresponding element values of the first source SIMD and FP register, places the results into a vector, and writes the vector to the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.258 UQXTN, UQXTN2 (vector) (A64)

Unsigned saturating extract Narrow.

**Syntax**

`UQXTN2Vd.Tb, Vn.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD and FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

If saturation occurs, the cumulative saturation bit FPSR.QC is set.

The `UQXTN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `UQXTN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-109: UQXTN{2} (Vector) specifier combinations**

| <Q> | <b>tb</b> | <b>ta</b> |
|-----|-----------|-----------|
| -   | 8B        | 8H        |
| 2   | 16B       | 8H        |
| -   | 4H        | 4S        |
| 2   | 8H        | 4S        |
| -   | 2S        | 2D        |
| 2   | 4S        | 2D        |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.259 URECPE (vector) (A64)

Unsigned Reciprocal Estimate.

### Syntax

`URECPE Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either 2s or 4s.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD and FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD and FP register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.260 URHADD (vector) (A64)

Unsigned Rounding Halving Add.

### Syntax

`URHADD Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S or 4S.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD and FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD and FP register.

The results are rounded. For truncated results, see [UHADD \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.261 URSHL (vector) (A64)

Unsigned Rounding Shift Left (register).

### Syntax

`URSHL Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.262 URSHR (vector) (A64)

Unsigned Rounding Shift Right (immediate).

### Syntax

`URSHR Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

### Usage

Unsigned Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [USHR \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-110: URSHR (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 1 to 8       |
| 16B      | 1 to 8       |
| 4H       | 1 to 16      |
| 8H       | 1 to 16      |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.263 URSQRTE (vector) (A64)

Unsigned Reciprocal Square Root Estimate.

### Syntax

`URSQRTE Vd.T, Vn.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be either `2s` or `4s`.

**Vn**

Is the name of the SIMD and FP source register.

### Usage

Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD and FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.264 URSRA (vector) (A64)

Unsigned Rounding Shift Right and Accumulate (immediate).

### Syntax

`URSRA Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

**Usage**

Unsigned Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are rounded. For truncated results, see [USRA \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-111: URSRA (Vector) specifier combinations**

| <b><i>T</i></b> | <b><i>shift</i></b> |
|-----------------|---------------------|
| 8B              | 1 to 8              |
| 16B             | 1 to 8              |
| 4H              | 1 to 16             |
| 8H              | 1 to 16             |
| 2S              | 1 to 32             |
| 4S              | 1 to 32             |
| 2D              | 1 to 64             |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.265 USHL (vector) (A64)

Unsigned Shift Left (register).

**Syntax**

USHL *Vd.T, Vn.T, Vm.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD and FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD and FP register, places the results in a vector, and writes the vector to the destination SIMD and FP register.

If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a rounding shift, see [URSHL \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.266 USHLL, USHLL2 (vector) (A64)

Unsigned Shift Left Long (immediate).

This instruction is used by the alias UXTL, UXTL2, UXTL, UXTL22.

**Syntax**

`USHLL2Vd.Ta, Vn.Tb, #shift`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**shift**

Is the left shift amount, in the range 0 to the source element width in bits minus 1, and can be one of the values shown in Usage.

## Usage

Unsigned Shift Left Long (immediate). This instruction reads each vector element in the lower or upper half of the source SIMD and FP register, shifts the unsigned integer value left by the specified number of bits, places the result into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

The `USHLL` instruction extracts vector elements from the lower half of the source register, while the `USHLL2` instruction extracts vector elements from the upper half of the source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-112: USHLL, USHLL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> | <b>shift</b> |
|------------------|-----------|-----------|--------------|
| -                | 8H        | 8B        | 0 to 7       |
| 2                | 8H        | 16B       | 0 to 7       |
| -                | 4S        | 4H        | 0 to 15      |
| 2                | 4S        | 8H        | 0 to 15      |
| -                | 2D        | 2S        | 0 to 31      |
| 2                | 2D        | 4S        | 0 to 31      |

## Related information

[UXTL, UXTL2 \(vector\) \(A64\)](#) on page 1416

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.267 USHR (vector) (A64)

Unsigned Shift Right (immediate).

### Syntax

`USHR Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

***shift***

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

**Usage**

Unsigned Shift Right (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [URSHR \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-113: USHR (Vector) specifier combinations**

| <b>T</b> | <b><i>shift</i></b> |
|----------|---------------------|
| 8B       | 1 to 8              |
| 16B      | 1 to 8              |
| 4H       | 1 to 16             |
| 8H       | 1 to 16             |
| 2S       | 1 to 32             |
| 4S       | 1 to 32             |
| 2D       | 1 to 64             |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.268 USQADD (vector) (A64)

Unsigned saturating Accumulate of Signed value.

**Syntax**

USQADD *Vd.T, Vn.T*

Where:

***Vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the SIMD and FP source register.

## Usage

Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD and FP register to corresponding unsigned integer values of the vector elements in the destination SIMD and FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD and FP register.

If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 21.269 USRA (vector) (A64)

Unsigned Shift Right and Accumulate (immediate).

## Syntax

`USRA Vd.T, Vn.T, #shift`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the SIMD and FP source register.

**shift**

Is the right shift amount, in the range 1 to the element width in bits, and can be one of the values shown in Usage.

## Usage

Unsigned Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD and FP register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of the destination SIMD and FP register. All the values in this instruction are unsigned integer values. The results are truncated. For rounded results, see [URSRA \(vector\) \(A64\)](#).

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-114: USRA (Vector) specifier combinations**

| <b>T</b> | <b>shift</b> |
|----------|--------------|
| 8B       | 1 to 8       |
| 16B      | 1 to 8       |
| 4H       | 1 to 16      |
| 8H       | 1 to 16      |
| 2S       | 1 to 32      |
| 4S       | 1 to 32      |
| 2D       | 1 to 64      |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.270 USUBL, USUBL2 (vector) (A64)

Unsigned Subtract Long.

**Syntax**

USUBL2Vd.*Ta*, Vn.*Tb*, Vm.*Tb*

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

**Tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vm**

Is the name of the second SIMD and FP source register.

**Usage**

Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD and FP register from the corresponding vector element of the first source SIMD and FP register, places the result into a vector, and writes the vector to the

destination SIMD and FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.

The `USUBL` instruction extracts each source vector from the lower half of each source register, while the `USUBL2` instruction extracts each source vector from the upper half of each source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-115: USUBL, USUBL2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 8H        | 8B        |
| 2                | 8H        | 16B       |
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.271 USUBW, USUBW2 (vector) (A64)

Unsigned Subtract Wide.

### Syntax

`USUBW2Vd.Ta, Vn.Ta, Vm.Tb`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See `<Q>` in the Usage table.

**Vd**

Is the name of the SIMD and FP destination register.

**Ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**Vn**

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

**Usage**

Unsigned Subtract Wide. This instruction subtracts each vector element of the second source SIMD and FP register from the corresponding vector element in the lower or upper half of the first source SIMD and FP register, places the result in a vector, and writes the vector to the SIMD and FP destination register. All the values in this instruction are signed integer values.

The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register.

The **USUBW** instruction extracts vector elements from the lower half of the first source register, while the **USUBW2** instruction extracts vector elements from the upper half of the first source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-116: USUBW, USUBW2 (Vector) specifier combinations**

| <b>&lt;Q&gt;</b> | <b>Ta</b> | <b>Tb</b> |
|------------------|-----------|-----------|
| -                | 8H        | 8B        |
| 2                | 8H        | 16B       |
| -                | 4S        | 4H        |
| 2                | 4S        | 8H        |
| -                | 2D        | 2S        |
| 2                | 2D        | 4S        |

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.272 UXTL, UXTL2 (vector) (A64)

Unsigned extend Long.

This instruction is an alias of **USHLL**, **USHLL2**.

The equivalent instruction is **USHLL2Vd.Ta, Vn.Tb, #0**.

**Syntax**

**UXTL2Vd.Ta, Vn.Tb**

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

**vd**

Is the name of the SIMD and FP destination register.

**ta**

Is an arrangement specifier, and can be one of the values shown in Usage.

**vn**

Is the name of the SIMD and FP source register.

**tb**

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Unsigned extend Long. This instruction copies each vector element from the lower or upper half of the source SIMD and FP register into a vector, and writes the vector to the destination SIMD and FP register. The destination vector elements are twice as long as the source vector elements.

The `uxtl` instruction extracts vector elements from the lower half of the source register, while the `uxtl2` instruction extracts vector elements from the upper half of the source register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-117: UXTL, UXTL2 (Vector) specifier combinations**

| <Q> | <b>ta</b> | <b>tb</b> |
|-----|-----------|-----------|
| -   | 8H        | 8B        |
| 2   | 8H        | 16B       |
| -   | 4S        | 4H        |
| 2   | 4S        | 8H        |
| -   | 2D        | 2S        |
| 2   | 2D        | 4S        |

## Related information

[USHLL, USHLL2 \(vector\) \(A64\)](#) on page 1410

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.273 UZP1 (vector) (A64)

Unzip vectors (primary).

### Syntax

`UZP1 Vd.T, Vn.T, Vm.T`

Where:

**Vd**

Is the name of the SIMD and FP destination register.

**T**

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

**Vn**

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

### Usage

Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD and FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD and FP register.



This instruction can be used with `UZP2` to de-interleave two vectors.

Note

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

### Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.274 UZP2 (vector) (A64)

Unzip vectors (secondary).

### Syntax

`UZP2 Vd.T, Vn.T, Vm.T`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***t***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***vn***

Is the name of the first SIMD and FP source register.

***vm***

Is the name of the second SIMD and FP source register.

**Usage**

Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD and FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD and FP register.



This instruction can be used with `UZP1` to de-interleave two vectors.

Note

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

**Related information**

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.275 XTN, XTN2 (vector) (A64)

Extract Narrow.

**Syntax**

`XTN2Vd.Tb, vn.Ta`

Where:

**2**

Is the second and upper half specifier. If present it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements. See <Q> in the Usage table.

***vd***

Is the name of the SIMD and FP destination register.

***Tb***

Is an arrangement specifier, and can be one of the values shown in Usage.

***Vn***

Is the name of the SIMD and FP source register.

***Ta***

Is an arrangement specifier, and can be one of the values shown in Usage.

## Usage

Extract Narrow. This instruction reads each vector element from the source SIMD and FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD and FP register. The destination vector elements are half as long as the source vector elements.

The `XTN` instruction writes the vector to the lower half of the destination register and clears the upper half, while the `XTN2` instruction writes the vector to the upper half of the destination register without affecting the other bits of the register.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

The following table shows the valid specifier combinations:

**Table 21-118: XTN, XTN2 (Vector) specifier combinations**

| <Q> | <b><i>Tb</i></b> | <b><i>Ta</i></b> |
|-----|------------------|------------------|
| -   | 8B               | 8H               |
| 2   | 16B              | 8H               |
| -   | 4H               | 4S               |
| 2   | 8H               | 4S               |
| -   | 2S               | 2D               |
| 2   | 4S               | 2D               |

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.276 ZIP1 (vector) (A64)

Zip vectors (primary).

### Syntax

`ZIP1 Vd.T, Vn.T, Vm.T`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

***Vm***

Is the name of the second SIMD and FP source register.

## Usage

Zip vectors (primary). This instruction reads adjacent vector elements from the upper half of two source SIMD and FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD and FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.



This instruction can be used with `ZIP2` to interleave two vectors.

Note

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

## 21.277 ZIP2 (vector) (A64)

Zip vectors (secondary).

### Syntax

`ZIP2 vd.T, Vn.T, Vm.T`

Where:

***vd***

Is the name of the SIMD and FP destination register.

***T***

Is an arrangement specifier, and can be one of 8B, 16B, 4H, 8H, 2S, 4S or 2D.

***Vn***

Is the name of the first SIMD and FP source register.

**Vm**

Is the name of the second SIMD and FP source register.

## Usage

Zip vectors (secondary). This instruction reads adjacent vector elements from the lower half of two source SIMD and FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD and FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.



Note

This instruction can be used with `ZIP1` to interleave two vectors.

Depending on the settings in the CPACR\_EL1, CPTR\_EL2, and CPTR\_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

## Related information

[A64 SIMD Vector instructions in alphabetical order](#) on page 1135

# 22. Directives Reference

Describes the directives that are provided by the Arm assembler, armasm.

## 22.1 Alphabetical list of directives

The Arm assembler, armasm, provides various directives.

The following table lists them:

**Table 22-1: List of directives**

| Directive           | Directive                 | Directive                |
|---------------------|---------------------------|--------------------------|
| ALIAS               | EQU                       | LTORG                    |
| ALIGN               | EXPORT or GLOBAL          | MACRO and MEND           |
| ARM or CODE32       | EXPORTAS                  | MAP                      |
| AREA                | EXTERN                    | MEND (see MACRO)         |
| ASSERT              | FIELD                     | MEXIT                    |
| ATTR                | FRAME ADDRESS             | NOFP                     |
| CN                  | FRAME POP                 | OPT                      |
| CODE16              | FRAME PUSH                | PRESERVE8 (see REQUIRE8) |
| COMMON              | FRAME REGISTER            | PROC see FUNCTION        |
| CP                  | FRAME RESTORE             | -                        |
| DATA                | FRAME SAVE                | RELOC                    |
| DCB                 | FRAME STATE REMEMBER      | REQUIRE                  |
| DCD and DCDU        | FRAME STATE RESTORE       | REQUIRE8 and PRESERVE8   |
| DCDO                | FRAME UNWIND ON or OFF    | RLIST                    |
| DCFD and DCFDU      | FUNCTION or PROC          | RN                       |
| DCFS and DCFSU      | GBLA, GBLL, and GBLS      | ROUT                     |
| DCI                 | GET or INCLUDE            | SETA, SETL, and SETS     |
| DCQ and DCQU        | GLOBAL (see EXPORT)       | SN                       |
| DCW and DCWU        | IF, ELSE, ENDIF, and ELIF | SPACE or FILL            |
| DN                  | IMPORT                    | SUBT                     |
| ELIF, ELSE (see IF) | INCBIN                    | THUMB                    |
| END                 | INCLUDE see GET           | TTL                      |
| ENDFUNC or ENDP     | INFO                      | WHILE and WEND           |
| ENDIF (see IF)      | KEEP                      | WN and XN                |
| ENTRY               | LCLA, LCLL, and LCLS      | -                        |

## 22.2 About assembly control directives

Some assembler directives control conditional assembly, looping, inclusions, and macros.

These directives are as follows:

- `MACRO` and `MEND`.
- `MEXIT`.
- `IF`, `ELSE`, `ENDIF`, and `ELIF`.
- `WHILE` and `WEND`.

### Nesting directives

The following structures can be nested to a total depth of 256:

- `MACRO` definitions.
- `WHILE...WEND` loops.
- `IF...ELSE...ENDIF` conditional structures.
- `INCLUDE` file inclusions.

The limit applies to all structures taken together, regardless of how they are nested. The limit is not 256 of each type of structure.

### Related information

[MACRO and MEND](#) on page 1471

[MEXIT](#) on page 1475

[IF, ELSE, ENDIF, and ELIF](#) on page 1462

[WHILE and WEND](#) on page 1487

## 22.3 About frame directives

Frame directives enable debugging and profiling of assembly language functions. They also enable the stack usage of functions to be calculated.

Correct use of these directives:

- Enables the `armasm` option `--callgraph` to calculate stack usage of assembler functions.

The following are the rules that determine stack usage:

- If a function is not marked with `PROC` or `ENDP`, stack usage is unknown.
- If a function is marked with `PROC` or `ENDP` but with no `FRAME PUSH` or `FRAME POP`, stack usage is assumed to be zero. This means that there is no requirement to manually add `FRAME PUSH 0` or `FRAME POP 0`.
- If a function is marked with `PROC` or `ENDP` and with `FRAME PUSH n` or `FRAME POP n`, stack usage is assumed to be `n` bytes.

- Helps you to avoid errors in function construction, particularly when you are modifying existing code.
- Enables the assembler to alert you to errors in function construction.
- Enables backtracing of function calls during debugging.
- Enables the debugger to profile assembler functions.

If you require profiling of assembler functions, but do not want frame description directives for other purposes:

- You must use the `FUNCTION` and `ENDFUNC`, or `PROC` and `ENDP`, directives.
- You can omit the other `FRAME` directives.
- You only have to use the `FUNCTION` and `ENDFUNC` directives for the functions you want to profile.

In DWARF, the canonical frame address is an address on the stack specifying where the call frame of an interrupted function is located.

## Related information

[FRAME ADDRESS](#) on page 1452

[FRAME POP](#) on page 1453

[FRAME PUSH](#) on page 1454

[FRAME REGISTER](#) on page 1455

[FRAME RESTORE](#) on page 1455

[FRAME RETURN ADDRESS](#) on page 1456

[FRAME SAVE](#) on page 1456

[FRAME STATE REMEMBER](#) on page 1457

[FRAME STATE RESTORE](#) on page 1458

[FRAME UNWIND ON](#) on page 1458

[FRAME UNWIND OFF](#) on page 1459

[FUNCTION or PROC](#) on page 1459

[ENDFUNC or ENDP](#) on page 1446

## 22.4 ALIAS

The ALIAS directive creates an alias for a symbol.

### Syntax

`ALIAS name, aliasname`

where:

**`name`**

is the name of the symbol to create an alias for.

***aliasname***

is the name of the alias to be created.

**Usage**

The symbol *name* must already be defined in the source file before creating an alias for it. Properties of *name* set by the `EXPORT` directive are not inherited by *aliasname*, so you must use `EXPORT` on *aliasname* if you want to make the alias available outside the current source file. Apart from the properties set by the `EXPORT` directive, *name* and *aliasname* are identical.

**Correct example**

```
baz
bar PROC
    BX lr
    ENDP
    ALIAS bar,foo      ; foo is an alias for bar
    EXPORT bar
    EXPORT foo          ; foo and bar have identical properties
                        ; because foo was created using ALIAS
    EXPORT baz          ; baz and bar are not identical
                        ; because the size field of baz is not set
```

**Incorrect example**

```
EXPORT bar
IMPORT car
ALIAS bar,foo ; ERROR - bar is not defined yet
ALIAS car,boo ; ERROR - car is external
bar PROC
    BX lr
    ENDP
```

**Related information**

[EXPORT or GLOBAL](#) on page 1448

## 22.5 ALIGN

The ALIGN directive aligns the current location to a specified boundary by padding with zeros or NOP instructions.

**Syntax**

```
ALIGN {expr {,{<offset{,pad}>} {,padsze}}}}
```

where:

***expr***

is a numeric expression evaluating to any power of 2 from 20 to 231

***offset***

can be any numeric expression

***pad***

can be any numeric expression

***padsiz*e**

can be 1, 2 or 4.

**Operation**

The current location is aligned to the next lowest address of the form:

*offset* + *n* \* *expr*

*n* is any integer which the assembler selects to minimise padding.

If *expr* is not specified, `ALIGN` sets the current location to the next word (four byte) boundary. The unused space between the previous and the new current location are filled with:

- Copies of *pad*, if *pad* is specified.
- `NOP` instructions, if all the following conditions are satisfied:
  - *pad* is not specified.
  - The `ALIGN` directive follows A32 or T32 instructions.
  - The current section has the `CODEALIGN` attribute set on the `AREA` directive.
- Zeros otherwise.

*pad* is treated as a byte, halfword, or word, according to the value of *padsiz*e. If *padsiz*e is not specified, *pad* defaults to bytes in data sections, halfwords in T32 code, or words in A32 code.

**Usage**

Use `ALIGN` to ensure that your data and code is aligned to appropriate boundaries. This is typically required in the following circumstances:

- The `ADR` T32 pseudo-instruction can only load addresses that are word aligned, but a label within T32 code might not be word aligned. Use `ALIGN 4` to ensure four-byte alignment of an address within T32 code.
- Use `ALIGN` to take advantage of caches on some Arm processors. For example, the Arm940T has a cache with 16-byte lines. Use `ALIGN 16` to align function entries on 16-byte boundaries and maximize the efficiency of the cache.
- A label on a line by itself can be arbitrarily aligned. Following A32 code is word-aligned (T32 code is halfword aligned). The label therefore does not address the code correctly. Use `ALIGN 4` (or `ALIGN 2` for T32) before the label.

Alignment is relative to the start of the ELF section where the routine is located. The section must be aligned to the same, or coarser, boundaries. The `ALIGN` attribute on the `AREA` directive is specified differently.

**Examples**

```
rout1    AREA      cacheable, CODE, ALIGN=3
          ; code           ; aligned on 8-byte boundary
```

```

; code
MOV    pc,lr ; aligned only on 4-byte boundary
ALIGN 8      ; now aligned on 8-byte boundary
rout2 ; code

```

In the following example, the `ALIGN` directive tells the assembler that the next instruction is word aligned and offset by 3 bytes. The 3 byte offset is counted from the previous word aligned address, resulting in the second `DCB` placed in the last byte of the same word and 2 bytes of padding are to be added.

```

AREA  OffsetExample, CODE
DCB   1      ; This example places the two bytes in the first
ALIGN 4,3    ; and fourth bytes of the same word.
DCB   1      ; The second DCB is offset by 3 bytes from the
             ; first DCB.

```

In the following example, the `ALIGN` directive tells the assembler that the next instruction is word aligned and offset by 2 bytes. Here, the 2 byte offset is counted from the next word aligned address, so the value `n` is set to 1 (`n=0` clashes with the third `DCB`). This time three bytes of padding are to be added.

```

AREA  OffsetExample1, CODE
DCB   1      ; In this example, n cannot be 0 because it
DCB   1      ; clashes with the 3rd DCB. The assembler
DCB   1      ; sets n to 1.
ALIGN 4,2    ; The next instruction is word aligned and
DCB   2      ; offset by 2.

```

In the following example, the `DCB` directive makes the PC misaligned. The `ALIGN` directive ensures that the label `subroutine1` and the following instruction are word aligned.

```

start  AREA  Example, CODE, READONLY
       LDR   r6,=label1
       ; code
       MOV   pc,lr
label1 DCB   1      ; PC now misaligned
       ALIGN  ; ensures that subroutine1 addresses
subroutine1          ; the following instruction.
       MOV   r5,#0x5

```

## Related information

[AREA](#) on page 1428

## 22.6 AREA

The `AREA` directive instructs the assembler to assemble a new code or data section.

### Syntax

`AREA sectionname ,{attr}, {attr}...`

where:

**`sectionname`**

is the name to give to the section. Sections are independent, named, indivisible chunks of code or data that are manipulated by the linker.

You can choose any name for your sections. However, names starting with a non-alphabetic character must be enclosed in bars or a missing section name error is generated. For example, `|1_DataArea|`.

Certain names are conventional. For example, `|.text|` is used for code sections produced by the C compiler, or for code sections otherwise associated with the C library.

**`attr`**

are one or more comma-delimited section attributes. Valid attributes are:

**`ALIGN=expression`**

By default, ELF sections are aligned on a four-byte boundary. `expression` can have any integer value from 0 to 31. The section is aligned on a  $2^{\text{expression}}$ -byte boundary. For example, if `expression` is 10, the section is aligned on a 1KB boundary.

This is not the same as the way that the `ALIGN` directive is specified.



Do not use `ALIGN=0` or `ALIGN=1` for A32 code sections.

Do not use `ALIGN=0` for T32 code sections.

**`ASSOC=section`**

`section` specifies an associated ELF section. `sectionname` must be included in any link that includes `section`.

**`CODE`**

Contains machine instructions. `READONLY` is the default.

**`CODEALIGN`**

Causes `armasm` to insert `NOP` instructions when the `ALIGN` directive is used after A32 or T32 instructions within the section, unless the `ALIGN` directive specifies a different padding. `CODEALIGN` is the default for execute-only sections.

**`COMDEF`**

Is a common section definition. This ELF section can contain code or data. It must be identical to any other section of the same name in other source files.

Identical ELF sections with the same name are overlaid in the same section of memory by the linker. If any are different, the linker generates a warning and does not overlay the sections.

**`COMGROUP=symbol_name`**

Is the signature that makes the `AREA` part of the named ELF section group. See the `GROUP=symbol_name` for more information. The `comgroup` attribute marks the ELF section group with the `GRP_COMDAT` flag.

**COMMON**

Is a common data section. You must not define any code or data in it. It is initialized to zeros by the linker. All common sections with the same name are overlaid in the same section of memory by the linker. They do not all have to be the same size. The linker allocates as much space as is required by the largest common section of each name.

**DATA**

Contains data, not instructions. `READWRITE` is the default.

**EXECONLY**

Indicates that the section is execute-only. Execute-only sections must also have the `CODE` attribute, and must not have any of the following attributes:

- `READONLY`.
- `READWRITE`.
- `DATA`.
- `ZEROALIGN`.

`armasm` faults if any of the following occur in an execute-only section:

- Explicit data definitions, for example `DCD` and `DCB`.
- Implicit data definitions, for example `LDR r0, =0xaabbccdd`.
- Literal pool directives, for example `LTORG`, if there is literal data to be emitted.
- `INCBIN` or `SPACE` directives.
- `ALIGN` directives, if the required alignment cannot be accomplished by padding with `NOP` instructions. `armasm` implicitly applies the `CODEALIGN` attribute to sections with the `EXECONLY` attribute.

**FINI\_ARRAY**

Sets the ELF type of the current area to `SHT_FINI_ARRAY`.

**GROUP=*symbol\_name***

Is the signature that makes the `AREA` part of the named ELF section group. It must be defined by the source file, or a file included by the source file. All `AREAS` with the same `symbol_name` signature are part of the same group. Sections within a group are kept or discarded together.

**INIT\_ARRAY**

Sets the ELF type of the current area to `SHT_INIT_ARRAY`.

**LINKORDER=*section***

Specifies a relative location for the current section in the image. It ensures that the order of all the sections with the `LINKORDER` attribute, with respect to each other, is the same as the order of the corresponding named `sections` in the image.

**MERGE=*n***

Indicates that the linker can merge the current section with other sections with the `MERGE=:samp:{n}` attribute. *n* is the size of the elements in the section, for example

*n* is 1 for characters. You must not assume that the section is merged, because the attribute does not force the linker to merge the sections.

**NOALLOC**

Indicates that no memory on the target system is allocated to this area.

**NOINIT**

Indicates that the data section is uninitialized, or initialized to zero. It contains only space reservation directives `SPACE` or `DCB`, `DCD`, `DCDU`, `DCQ`, `DCQU`, `DCW`, or `DCWU` with initialized values of zero. You can decide at link time whether an area is uninitialized or zero-initialized.



Arm® Compiler does not support systems with ECC or parity protection where the memory is not initialized.

**PREINIT\_ARRAY**

Sets the ELF type of the current area to `SHT_PREINIT_ARRAY`.

**READONLY**

Indicates that this section must not be written to. This is the default for Code areas.

**READWRITE**

Indicates that this section can be read from and written to. This is the default for Data areas.

**SECFLAGS=*n***

Adds one or more ELF flags, denoted by *n*, to the current section.

**SECTYPE=*n***

Sets the ELF type of the current section to *n*.

**STRINGS**

Adds the `SHF_STRINGS` flag to the current section. To use the `STRINGS` attribute, you must also use the `MERGE=1` attribute. The contents of the section must be strings that are nul-terminated using the `DCB` directive.

**ZEROALIGN**

Causes `armasm` to insert zeros when the `ALIGN` directive is used after A32 or T32 instructions within the section, unless the `ALIGN` directive specifies a different padding. `ZEROALIGN` is the default for sections that are not execute-only.

## Usage

Use the `AREA` directive to subdivide your source file into ELF sections. You can use the same name in more than one `AREA` directive. All areas with the same name are placed in the same ELF section. Only the attributes of the first `AREA` directive of a particular name are applied.

In general, Arm recommends that you use separate ELF sections for code and data. However, you can put data in code sections. Large programs can usually be conveniently divided into several code sections. Large independent data sets are also usually best placed in separate sections.

The scope of numeric local labels is defined by `AREA` directives, optionally subdivided by `ROUT` directives.

There must be at least one `AREA` directive for an assembly.



`armasm` emits `R_ARM_TARGET1` relocations for the `DCD` and `DCDU` directives if the directive uses PC-relative expressions and is in any of the `PREINIT_ARRAY`, `FINI_ARRAY`, or `INIT_ARRAY` ELF sections. You can override the relocation using the `RELOC` directive after each `DCD` or `DCDU` directive. If this relocation is used, read-write sections might become read-only sections at link time if the platform ABI permits this.

## Example

The following example defines a read-only code section named `Example`:

```
AREA      Example, CODE, READONLY    ; An example code section.  
; code
```

## Related information

[ALIGN](#) on page 1426

[RELOC](#) on page 1478

[DCD and DCDU](#) on page 1439

[ELF sections and the AREA directive](#) on page 99

[Information about image structure and generation](#)

## 22.7 ARM or CODE32 directive

The `ARM` directive instructs the assembler to interpret subsequent instructions as A32 instructions, using either the UAL or the pre-UAL Arm assembler language syntax. `CODE32` is a synonym for `ARM`.



Not supported for AArch64 state.

## Syntax

`ARM`

## Usage

In files that contain code using different instruction sets, the `ARM` directive must precede any A32 code.

If necessary, this directive also inserts up to three bytes of padding to align to the next word boundary.

This directive does not assemble to any instructions. It also does not change the state. It only instructs `armasm` to assemble A32 instructions as appropriate, and inserts padding if necessary.

## Example

This example shows how you can use `ARM` and `THUMB` directives to switch state and assemble both A32 and T32 instructions in a single area.

```

        AREA ToT32, CODE, READONLY      ; Name this block of code
        ENTRY                         ; Mark first instruction to execute
        ARM                            ; Subsequent instructions are A32
start
        ADR    r0, into_t32 + 1       ; Processor starts in A32 state
        BX     r0                     ; Inline switch to T32 state
        THUMB                          ; Subsequent instructions are T32
into_t32
        MOVS   r0, #10                ; New-style T32 instructions

```

## Related information

[CODE16 directive](#) on page 1436

[THUMB directive](#) on page 1486

[A-Profile Architectures](#)

## 22.8 ASSERT

The `ASSERT` directive generates an error message during assembly if a given assertion is false.

### Syntax

`ASSERT logical-expression`

where:

#### *logical-expression*

is an assertion that can evaluate to either `{TRUE}` or `{FALSE}`.

### Usage

Use `ASSERT` to ensure that any necessary condition is met during assembly.

If the assertion is false an error message is generated and assembly fails.

## Example

```

ASSERT  label1 <= label2      ; Tests if the address
                                ; represented by label1
                                ; is <= the address
                                ; represented by label2.

```

## Related information

[INFO](#) on page 1467

## 22.9 ATTR

The `ATTR` set directives set values for the ABI build attributes. The `ATTR` scope directives specify the scope for which the set value applies to.



`ATTR` is supported only for AArch32 state.

**Note**

### Syntax

`ATTR FILESCOPE`

`ATTR SCOPE name`

`ATTR settype tagid, value`

where:

***name***

is a section name or symbol name.

***settype***

can be any of:

- `SETVALUE`.
- `SETSTRING`.
- `SETCOMPATWITHVALUE`.
- `SETCOMPATWITHSTRING`.

***tagid***

is an attribute tag name (or its numerical value) defined in the ABI for the Arm® Architecture.

***value***

depends on `settype`:

- is a 32-bit integer value when `settype` is `SETVALUE` or `SETCOMPATWITHVALUE`.
- is a nul-terminated string when `settype` is `SETSTRING` or `SETCOMPATWITHSTRING`.

### Usage

The `ATTR` set directives following the `ATTR FILESCOPE` directive apply to the entire object file. The `ATTR` set directives following the `ATTR SCOPE name` directive apply only to the named section or symbol.

For tags that expect an integer, you must use `SETVALUE` or `SETCOMPATWITHVALUE`. For tags that expect a string, you must use `SETSTRING` or `SETCOMPATWITHSTRING`.

Use `SETCOMPATWITHVALUE` and `SETCOMPATWITHSTRING` to set tag values which the object file is also compatible with.

## Examples

```
ATTR SETSTRING Tag_CPU_raw_name, "Cortex-A8"
ATTR SETVALUE Tag_VFP_arch, 3 ; VFPv3 instructions permitted.
ATTR SETVALUE 10, 3          ; 10 is the numerical value of
                            ; Tag_VFP_arch.
```

## Related information

[Addenda to, and Errata in, the ABI for the Arm Architecture](#)

## 22.10 CN

The `CN` directive defines a name for a coprocessor register.

### Syntax

`name CNexpr`

where:

**name**

is the name to be defined for the coprocessor register. `name` cannot be the same as any of the predefined names.

**expr**

evaluates to a coprocessor register number from 0 to 15.

### Usage

Use `CN` to allocate convenient names to registers, to help you remember what you use each register for.



Avoid conflicting uses of the same register under different names.

Note

The names `c0` to `c15` are predefined.

### Example

```
power      CN  6           ; defines power as a symbol for
                            ; coprocessor register 6
```

## Related information

[Predeclared core register names in AArch32 state](#) on page 81  
[Predeclared extension register names in AArch32 state](#) on page 81

## 22.11 CODE16 directive

The `CODE16` directive instructs the assembler to interpret subsequent instructions as T32 instructions, using the UAL syntax.



Not supported for AArch64 state.

Note

### Syntax

`CODE16`

### Usage

In files that contain code using different instruction sets, `CODE16` must precede T32 code written in pre-UAL syntax.

If necessary, this directive also inserts one byte of padding to align to the next halfword boundary.

This directive does not assemble to any instructions. It also does not change the state. It only instructs `armasm` to assemble T32 instructions as appropriate, and inserts padding if necessary.

### Related information

[ARM or CODE32 directive](#) on page 1432  
[THUMB directive](#) on page 1486

## 22.12 COMMON

The `COMMON` directive allocates a block of memory of the defined size, at the specified symbol.

### Syntax

```
COMMON symbol1, {size, {alignment} } [{attr} ] }
```

where:

***symbol1***

is the symbol name. The symbol name is case-sensitive.

***size***

is the number of bytes to reserve.

***alignment***

is the alignment.

***attr***

can be any one of:

**DYNAMIC**

sets the ELF symbol visibility to `STV_DEFAULT`.

**PROTECTED**

sets the ELF symbol visibility to `STV_PROTECTED`.

**HIDDEN**

sets the ELF symbol visibility to `STV_HIDDEN`.

**INTERNAL**

sets the ELF symbol visibility to `STV_INTERNAL`.

**Usage**

You specify how the memory is aligned. If the alignment is omitted, the default alignment is four. If the size is omitted, the default size is zero.

You can access this memory as you would any other memory, but no space is allocated by the assembler in object files. The linker allocates the required space as zero-initialized memory during the link stage.

You cannot define, `IMPORT` or `EXTERN` a symbol that has already been created by the `COMMON` directive. In the same way, if a symbol has already been defined or used with the `IMPORT` or `EXTERN` directive, you cannot use the same symbol for the `COMMON` directive.

**Correct example**

```
LDR      r0, =xyz
COMMON  xyz,255,4 ; defines 255 bytes of ZI store, word-aligned
```

**Incorrect example**

```
COMMON  foo,4,4
COMMON  bar,4,4
foo    DCD    0          ; cannot define label with same name as COMMON
IMPORT  bar      ; cannot import label with same name as COMMON
```

## 22.13 CP

The `CP` directive defines a name for a specified coprocessor.

**Syntax**

`name CP expr`

where:

***name***

is the name to be assigned to the coprocessor. *name* cannot be the same as any of the predefined names.

***expr***

evaluates to a coprocessor number within the range 0 to 15.

**Usage**

Use `CP` to allocate convenient names to coprocessors, to help you to remember what you use each one for.



Avoid conflicting uses of the same coprocessor under different names.

Note

The names `p0` to `p15` are predefined for coprocessors 0 to 15.

**Example**

```
dmu      CP   6          ; defines dmu as a symbol for
                           ; coprocessor 6
```

**Related information**

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

## 22.14 DATA

The `DATA` directive is no longer required. It is ignored by the assembler.

## 22.15 DCB

The `DCB` directive allocates one or more bytes of memory, and defines the initial runtime contents of the memory.

**Syntax**

```
{label} DCB expr,{expr}...
```

where:

***expr***

is either:

- A numeric expression that evaluates to an integer in the range -128 to 255.

- A quoted string. The characters of the string are loaded into consecutive bytes of store.

## Usage

If `DCB` is followed by an instruction, use an `ALIGN` directive to ensure that the instruction is aligned.

`=` is a synonym for `DCB`.

## Example

Unlike C strings, `armasm` syntax assembler strings are not nul-terminated. You can construct a nul-terminated C string using `DCB` as follows:

```
C_string    DCB    "C_string",0
```

## Related information

[Numeric expressions](#) on page 228

[DCD and DCDU](#) on page 1439

[DCQ and DCQU](#) on page 1443

[DCW and DCWU](#) on page 1444

[SPACE or FILL](#) on page 1485

[ALIGN](#) on page 1426

## 22.16 DCD and DCDU

The `DCD` directive allocates one or more words of memory, aligned on four-byte boundaries, and defines the initial runtime contents of the memory. `DCDU` is the same, except that the memory alignment is arbitrary.

### Syntax

```
{label } DCDU expr,{expr}
```

where:

#### `expr`

is either:

- A numeric expression.
- A PC-relative expression.

## Usage

`DCD` inserts up to three bytes of padding before the first defined word, if necessary, to achieve four-byte alignment.

Use `DCDU` if you do not require alignment.

`&` is a synonym for `DCD`.

## Examples

```

data1  DCD    1,5,20      ; Defines 3 words containing
                           ; decimal values 1, 5, and 20
data2  DCD    mem06 + 4   ; Defines 1 word containing 4 +
                           ; the address of the label mem06
                           AREA MyData, DATA, READWRITE
                           DCB    255          ; Now misaligned ...
data3  DCDU   1,5,20      ; Defines 3 words containing
                           ; 1, 5 and 20, not word aligned

```

## Related information

[DCB](#) on page 1438

[DCQ and DCQU](#) on page 1443

[DCW and DCWU](#) on page 1444

[SPACE or FILL](#) on page 1485

[Numeric expressions](#) on page 228

[DCI](#) on page 1442

## 22.17 DCDO

The `DCDO` directive allocates one or more words of memory, aligned on four-byte boundaries, and defines the initial runtime contents of the memory as an offset from the static base register, sb (R9).

### Syntax

```
{label} DCDO expr, {expr}...
```

where:

#### **expr**

is a register-relative expression or label. The base register must be sb.

### Usage

Use `DCDO` to allocate space in memory for static base register relative relocatable addresses.

### Example

```

IMPORT externsym
DCDO    externsym    ; 32-bit word relocated by offset of
                     ; externsym from base of SB section.

```

## 22.18 DCFD and DCFDU

The `DCFD` directive allocates memory for word-aligned double-precision floating-point numbers, and defines the initial runtime contents of the memory. `DCFDU` is the same, except that the memory alignment is arbitrary.

### Syntax

```
{label} DCFDU fpliteral,{fpliteral}...
```

where:

#### **fpliteral**

is a double-precision floating-point literal.

### Usage

Double-precision numbers occupy two words and must be word aligned to be used in arithmetic operations. The assembler inserts up to three bytes of padding before the first defined number, if necessary, to achieve four-byte alignment.

Use `DCFDU` if you do not require alignment.

The word order used when converting `fpliteral` to internal form is controlled by the floating-point architecture selected. You cannot use `DCFD` or `DCFDU` if you select the `--fpu none` option.

The range for double-precision numbers is:

- Maximum 1.79769313486231571e+308.
- Minimum 2.22507385850720138e-308.

### Examples

```
DCFD    1E308,-4E-100
DCFUD  10000,-.1,3.1E26
```

### Related information

[DCFS and DCFSU](#) on page 1441

[Syntax of floating-point literals](#) on page 230

## 22.19 DCFS and DCFSU

The `DCFS` directive allocates memory for word-aligned single-precision floating-point numbers, and defines the initial runtime contents of the memory. `DCFSU` is the same, except that the memory alignment is arbitrary.

### Syntax

```
{label} DCFSU fpliteral,{fpliteral}...
```

where:

#### **fpliteral**

is a single-precision floating-point literal.

### Usage

Single-precision numbers occupy one word and must be word aligned to be used in arithmetic operations. DCFS inserts up to three bytes of padding before the first defined number, if necessary to achieve four-byte alignment.

Use DCFSU if you do not require alignment.

The range for single-precision values is:

- Maximum 3.40282347e+38.
- Minimum 1.17549435e-38.

### Examples

```
DCFS      1E3,-4E-9
DCFSU    1.0,-.1,3.1E6
```

### Related information

[DCFD and DCFDU](#) on page 1440

[Syntax of floating-point literals](#) on page 230

## 22.20 DCI

The DCI directive allocates memory that is aligned and defines the initial runtime contents of the memory.

In A32 code, it allocates one or more words of memory, aligned on four-byte boundaries.

In T32 code, it allocates one or more halfwords of memory, aligned on two-byte boundaries.

### Syntax

```
{label} DCI.W expr,{expr}
```

where:

#### **expr**

is a numeric expression.

#### **.W**

if present, indicates that four bytes must be inserted in T32 code.

## Usage

The `DCI` directive is very like the `DCD` or `DCW` directives, but the location is marked as code instead of data. Use `DCI` when writing macros for new instructions not supported by the version of the assembler you are using.

In A32 code, `DCI` inserts up to three bytes of padding before the first defined word, if necessary, to achieve four-byte alignment. In T32 code, `DCI` inserts an initial byte of padding, if necessary, to achieve two-byte alignment.

You can use `DCI` to insert a bit pattern into the instruction stream. For example, use:

```
DCI 0x46c0
```

to insert the T32 operation `MOV r8,r8`.

## Example macro

```
MACRO      ; this macro translates newinstr Rd,Rm
           ; to the appropriate machine code
newinst    $Rd,$Rm
DCI        0xe16f0f10 :OR: ($Rd:SHL:12) :OR: $Rm
MEND
```

## 32-bit T32 example

```
DCI.W 0xf3af8000 ; inserts 32-bit NOP, 2-byte aligned.
```

## Related information

[Numeric expressions](#) on page 228

[DCD and DCDU](#) on page 1439

[DCW and DCWU](#) on page 1444

## 22.21 DCQ and DCQU

The `DCQ` directive allocates one or more eight-byte blocks of memory, aligned on four-byte boundaries, and defines the initial runtime contents of the memory. `DCQU` is the same, except that the memory alignment is arbitrary.

### Syntax

```
{label} DCQU -literal,{literal...}
{label} DCQU expr,{expr...}
```

where:

#### **literal**

is a 64-bit numeric literal.

The range of numbers permitted is 0 to  $2^{64}-1$ .

In addition to the characters normally permitted in a numeric literal, you can prefix *literal* with a minus sign. In this case, the range of numbers permitted is  $-2^{63}$  to -1.

The result of specifying  $-n$  is the same as the result of specifying  $2^{64}-n$ .

#### **expr**

is either:

- A numeric expression.
- A PC-relative expression.



armasm accepts expressions in DCQ and DCQU directives only when you are assembling for AArch64 targets.

Note

## Usage

DCQ inserts up to three bytes of padding before the first defined eight-byte block, if necessary, to achieve four-byte alignment.

Use DCQU if you do not require alignment.

### Correct example

```
data     AREA      MiscData, DATA, READWRITE
        DCQ       -225,2_101    ; 2_101 means binary 101.
```

### Incorrect example

```
number  EQU      2
        DCQU     number      ; This code assembles for AArch64 targets only.
                           ; For AArch32 targets, DCQ and DCQU only accept
                           ; literals, not expressions.
```

## Related information

[DCB](#) on page 1438

[DCD and DCDU](#) on page 1439

[DCW and DCWU](#) on page 1444

[SPACE or FILL](#) on page 1485

[Numeric expressions](#) on page 228

## 22.22 DCW and DCWU

The `DCW` directive allocates one or more halfwords of memory, aligned on two-byte boundaries, and defines the initial runtime contents of the memory. `DCWU` is the same, except that the memory alignment is arbitrary.

### Syntax

```
{label} DCWU expr, {expr}...
```

where:

#### **expr**

is a numeric expression that evaluates to an integer in the range -32768 to 65535.

### Usage

`DCW` inserts a byte of padding before the first defined halfword if necessary to achieve two-byte alignment.

Use `DCWU` if you do not require alignment.

### Examples

```
data    DCW      -225,2*number ; number must already be defined
        DCWU    number+4
```

### Related information

[DCB](#) on page 1438

[DCD and DCDU](#) on page 1439

[DCQ and DCQU](#) on page 1443

[SPACE or FILL](#) on page 1485

[Numeric expressions](#) on page 228

## 22.23 END

The `END` directive informs the assembler that it has reached the end of a source file.

### Syntax

```
END
```

### Usage

Every assembly language source file must end with `END` on a line by itself.

If the source file has been included in a parent file by a `GET` directive, the assembler returns to the parent file and continues assembly at the first line following the `GET` directive.

If `END` is reached in the top-level source file during the first pass without any errors, the second pass begins.

If `END` is reached in the top-level source file during the second pass, the assembler finishes the assembly and writes the appropriate output.

### Related information

[GET or INCLUDE](#) on page 1462

## 22.24 ENDFUNC or ENDP

The `ENDFUNC` directive marks the end of an AAPCS-conforming function. `ENDP` is a synonym for `ENDFUNC`.

### Related information

[FUNCTION or PROC](#) on page 1459

## 22.25 ENTRY

The `ENTRY` directive declares an entry point to a program.

### Syntax

`ENTRY`

### Usage

A program must have an entry point. You can specify an entry point in the following ways:

- Using the `ENTRY` directive in assembly language source code.
- Providing a `main()` function in C or C++ source code.
- Using the `armlink --entry` command-line option.

You can declare more than one entry point in a program, although a source file cannot contain more than one `ENTRY` directive. For example, a program could contain multiple assembly language source files, each with an `ENTRY` directive. Or it could contain a C or C++ file with a `main()` function and one or more assembly source files with an `ENTRY` directive.

If the program contains multiple entry points, then you must select one of them. You do this by exporting the symbol for the `ENTRY` directive that you want to use as the entry point, then using the `armlink --entry` option to select the exported symbol.

### Example

```
AREA    ARMEx, CODE, READONLY
ENTRY   ; Entry point for the application.
EXPORT ep1 ; Export the symbol so the linker can find it
ep1      ; in the object file.
```

```
; code
END
```

When you invoke `armlink`, if other entry points are declared in the program, then you must specify `--entry=ep1`, to select `ep1`.

## Related information

[Image entry points](#)  
`--entry=location`

## 22.26 EQU

The EQU directive gives a symbolic name to a numeric constant, a register-relative value or a PC-relative value.

### Syntax

```
name EQU expr, {type}
```

where:

**name**

is the symbolic name to assign to the value.

**expr**

is a register-relative address, a PC-relative address, an absolute address, or a 32-bit integer constant.

**type**

is optional. `type` can be any one of:

- ARM.
- THUMB.
- CODE32.
- CODE16.
- DATA.

You can use only if `type` is an absolute address. If `name` is exported, the `name` entry in the symbol table in the object file is marked as ARM, THUMB, CODE32, CODE16, or DATA, according to `type`. This can be used by the linker.

### Usage

Use `EQU` to define constants. This is similar to the use of `#define` to define a constant in C.

\* is a synonym for `EQU`.

## Examples

```
abc EQU 2           ; Assigns the value 2 to the symbol abc.
xyz EQU label+8    ; Assigns the address (label+8) to the
; symbol xyz.
fiq EQU 0x1C, CODE32 ; Assigns the absolute address 0x1C to
; the symbol fiq, and marks it as code.
```

## Related information

[KEEP](#) on page 1468

[EXPORT or GLOBAL](#) on page 1448

## 22.27 EXPORT or GLOBAL

The `EXPORT` directive declares a symbol that can be used by the linker to resolve symbol references in separate object and library files. `GLOBAL` is a synonym for `EXPORT`.

### Syntax

```
EXPORT { [WEAK] }  
  
EXPORT symbol [{SIZE={n}}]  
  
EXPORT symbol [{type ,{set}}]  
  
EXPORT symbol [{attr,{type,{set}}},{SIZE={n}}]  
  
EXPORT symbol [{WEAK ,{attr},{type,{set}}},{SIZE={n}}]
```

where:

***symbol***

is the symbol name to export. The symbol name is case-sensitive. If *symbol* is omitted, all symbols are exported.

***WEAK***

*symbol* is only imported into other sources if no other source exports an alternative *symbol*. If `[WEAK]` is used without *symbol*, all exported symbols are weak.

***attr***

can be any one of:

**DYNAMIC**

sets the ELF symbol visibility to `STV_DEFAULT`.

**PROTECTED**

sets the ELF symbol visibility to `STV_PROTECTED`.

**HIDDEN**

sets the ELF symbol visibility to `STV_HIDDEN`.

**INTERNAL**

sets the ELF symbol visibility to `STV_INTERNAL`.

**type**

specifies the symbol type:

**DATA**

`symbol` is treated as data when the source is assembled and linked.

**CODE**

`symbol` is treated as code when the source is assembled and linked.

**ELFTYPE=n**

`symbol` is treated as a particular ELF symbol, as specified by the value of `n`, where `n` can be any number from 0 to 15.

If unspecified, the assembler determines the most appropriate type. Usually the assembler determines the correct type so you are not required to specify it.

**set**

specifies the instruction set:

**ARM**

`symbol` is treated as an A32 symbol.

**THUMB**

`symbol` is treated as a T32 symbol.

If unspecified, the assembler determines the most appropriate set.

**n**

specifies the size and can be any 32-bit value. If the `SIZE` attribute is not specified, the assembler calculates the size:

- For `PROC` and `FUNCTION` symbols, the size is set to the size of the code until its `ENDP` or `ENDFUNC`.
- For other symbols, the size is the size of instruction or data on the same source line. If there is no instruction or data, the size is zero.

## Usage

Use `EXPORT` to give code in other files access to symbols in the current file.

Use the `[WEAK]` attribute to inform the linker that a different instance of `symbol` takes precedence over this one, if a different one is available from another source. You can use the `[WEAK]` attribute with any of the symbol visibility attributes.

## Examples

```
AREA   Example, CODE, READONLY
EXPORT DoAdd           ; Export the function name
                  ; to be used by external modules.
DoAdd  ADD    r0, r0, r1
```

Symbol visibility can be overridden for duplicate exports. In the following example, the last `EXPORT` takes precedence for both binding and visibility:

```
EXPORT  SymA [WEAK]      ; Export as weak-hidden
EXPORT  SymA [DYNAMIC]   ; SymA becomes non-weak dynamic.
```

The following examples show the use of the `SIZE` attribute:

```
EXPORT symA [SIZE=4]
EXPORT symA [DATA, SIZE=4]
```

## Related information

[IMPORT and EXTERN](#) on page 1465

[ELF for the Arm Architecture](#)

## 22.28 EXPORTAS

The `EXPORTAS` directive enables you to export a symbol from the object file, corresponding to a different symbol in the source file.

### Syntax

```
EXPORTAS symbol1, symbol2
```

where:

#### *symbol1*

is the symbol name in the source file. *symbol1* must have been defined already. It can be any symbol, including an area name, a label, or a constant.

#### *symbol2*

is the symbol name you want to appear in the object file.

The symbol names are case-sensitive.

### Usage

Use `EXPORTAS` to change a symbol in the object file without having to change every instance in the source file.

### Examples

```
AREA data1, DATA      ; Starts a new area data1.
AREA data2, DATA      ; Starts a new area data2.
EXPORTAS data2, data1 ; The section symbol referred to as data2
                      ; appears in the object file string table as data1.
one EQU 2
EXPORTAS one, two     ; The symbol 'two' appears in the object
                      ; file's symbol table with the value 2.
EXPORT one
```

## Related information

[EXPORT or GLOBAL](#) on page 1448

## 22.29 FIELD

The **FIELD** directive describes space within a storage map that has been defined using the **MAP** directive.

### Syntax

```
{label} FIELD expr
```

where:

#### **label**

is an optional label. If specified, **label** is assigned the value of the storage location counter, **{VAR}**. The storage location counter is then incremented by the value of **expr**.

#### **expr**

is an expression that evaluates to the number of bytes to increment the storage counter.

### Usage

If a storage map is set by a **MAP** directive that specifies a **base-register**, the base register is implicit in all labels defined by following **FIELD** directives, until the next **MAP** directive. These register-relative labels can be quoted in load and store instructions.

# is a synonym for **FIELD**.

### Examples

The following example shows how register-relative labels are defined using the **MAP** and **FIELD** directives:

```

MAP      0, r9          ; set {VAR} to the address stored in R9
FIELD    4              ; increment {VAR} by 4 bytes
Lab     FIELD 4         ; set Lab to the address [R9 + 4]
                      ; and then increment {VAR} by 4 bytes
LDR      r0, Lab        ; equivalent to LDR r0,[r9,#4]

```

When using the **MAP** and **FIELD** directives, you must ensure that the values are consistent in both passes. The following example shows a use of **MAP** and **FIELD** that causes inconsistent values for the symbol **x**. In the first pass **sym** is not defined, so **x** is at **04 + R9**. In the second pass, **sym** is defined, so **x** is at **00 + R0**. This example results in an assembly error.

```

MAP 0, r0
if :LNOT: :DEF: sym
  MAP 0, r9
  FIELD 4 ; x  is at 0x04+R9  in first pass
ENDIF
x  FIELD 4 ; x  is at 0x00+R0  in second pass
sym LDR r0, x ; inconsistent values for x results in assembly error

```

## Related information

[MAP](#) on page 1474

[How the assembler works](#) on page 62

[Directives that can be omitted in pass 2 of the assembler](#) on page 63

## 22.30 FRAME ADDRESS

The `FRAME ADDRESS` directive describes how to calculate the canonical frame address for the following instructions.

### Syntax

```
FRAME ADDRESS reg, {offset}
```

where:

**reg**

is the register on which the canonical frame address is to be based. This is SP unless the function uses a separate frame pointer.

**offset**

is the offset of the canonical frame address from `reg`. If `offset` is zero, you can omit it.

### Usage

Use `FRAME ADDRESS` if your code alters which register the canonical frame address is based on, or if it changes the offset of the canonical frame address from the register. You must use `FRAME ADDRESS` immediately after the instruction that changes the calculation of the canonical frame address.

You can only use `FRAME ADDRESS` in functions with `FUNCTION` and `ENDFUNC` OR `PROC` and `ENDP` directives.



If your code uses a single instruction to save registers and alter the stack pointer, you can use `FRAME PUSH` instead of using both `FRAME ADDRESS` and `FRAME SAVE`.

If your code uses a single instruction to load registers and alter the stack pointer, you can use `FRAME POP` instead of using both `FRAME ADDRESS` and `FRAME RESTORE`.

### Example

```
_fn      FUNCTION          ; CFA (Canonical Frame Address) is value
        ; of SP on entry to function
PUSH    {r4,fp,ip,lr,pc}
FRAME PUSH {r4,fp,ip,lr,pc}
SUB    sp,sp,#4           ; CFA offset now changed
FRAME ADDRESS sp,24       ; - so we correct it
ADD    fp,sp,#20
FRAME ADDRESS fp,4        ; New base register
; code using fp to base call-frame on, instead of SP
```

## Related information

[FRAME POP](#) on page 1453

[FRAME PUSH](#) on page 1454

## 22.31 FRAME POP

The `FRAME POP` directive informs the assembler when the callee reloads registers.

### Syntax

The following are alternative syntaxes for `FRAME POP`:

```
FRAME POP {reglist}
```

```
FRAME POP {reglist},n
```

```
FRAME POP n
```

where:

#### ***reglist***

is a list of registers restored to the values they had on entry to the function. There must be at least one register in the list.

#### ***n***

is the number of bytes that the stack pointer moves.

### Usage

`FRAME POP` is equivalent to a `FRAME ADDRESS` and a `FRAME RESTORE` directive. You can use it when a single instruction loads registers and alters the stack pointer.

You must use `FRAME POP` immediately after the instruction it refers to.

You can only use it within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives. You do not have to do this after the last instruction in a function.

If *n* is not specified or is zero, the assembler calculates the new offset for the canonical frame address from *reglist*. It assumes that:

- Each AArch32 register popped occupies four bytes on the stack.
- Each VFP single-precision register popped occupies four bytes on the stack, plus an extra four-byte word for each list.
- Each VFP double-precision register popped occupies eight bytes on the stack, plus an extra four-byte word for each list.

## Related information

[FRAME ADDRESS](#) on page 1452

[FRAME RESTORE](#) on page 1455

## 22.32 FRAME PUSH

The `FRAME PUSH` directive informs the assembler when the callee saves registers, normally at function entry.

### Syntax

The following are alternative syntaxes for `FRAME PUSH`:

```
FRAME PUSH {reglist}
```

```
FRAME PUSH {reglist},n
```

```
FRAME PUSH n
```

where:

#### ***reglist***

is a list of registers stored consecutively below the canonical frame address. There must be at least one register in the list.

#### ***n***

is the number of bytes that the stack pointer moves.

### Usage

`FRAME PUSH` is equivalent to a `FRAME ADDRESS` and a `FRAME SAVE` directive. You can use it when a single instruction saves registers and alters the stack pointer.

You must use `FRAME PUSH` immediately after the instruction it refers to.

You can only use it within functions with `FUNCTION` and `ENDFUNC` OR `PROC` and `ENDP` directives.

If *n* is not specified or is zero, the assembler calculates the new offset for the canonical frame address from *reglist*. It assumes that:

- Each AArch32 register pushed occupies four bytes on the stack.
- Each VFP single-precision register pushed occupies four bytes on the stack, plus an extra four-byte word for each list.
- Each VFP double-precision register popped occupies eight bytes on the stack, plus an extra four-byte word for each list.

### Example

```
p    PROC ; Canonical frame address is SP + 0
      EXPORT p
      PUSH   {r4-r6,lr}
              ; SP has moved relative to the canonical frame address,
              ; and registers R4, R5, R6 and LR are now on the stack
      FRAME PUSH {r4-r6,lr}
              ; Equivalent to:
              ; FRAME ADDRESS    sp,16       ; 16 bytes in {R4-R6,LR}
```

```
; FRAME SAVE {r4-r6,lr},-16
```

## Related information

[FRAME ADDRESS](#) on page 1452

[FRAME SAVE](#) on page 1456

## 22.33 FRAME REGISTER

The `FRAME REGISTER` directive maintains a record of the locations of function arguments held in registers.

### Syntax

```
FRAME REGISTER reg1, reg2
```

where:

***reg1***

is the register that held the argument on entry to the function.

***reg2***

is the register in which the value is preserved.

### Usage

Use the `FRAME REGISTER` directive when you use a register to preserve an argument that was held in a different register on entry to a function.

You can only use it within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives.

## 22.34 FRAME RESTORE

The `FRAME RESTORE` directive informs the assembler that the contents of specified registers have been restored to the values they had on entry to the function.

### Syntax

```
FRAME RESTORE {reglist}
```

where:

***reglist***

is a list of registers whose contents have been restored. There must be at least one register in the list.

## Usage

You can only use `FRAME RESTORE` within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives. Use it immediately after the callee reloads registers from the stack. You do not have to do this after the last instruction in a function.

*reglist* can contain integer registers or floating-point registers, but not both.



If your code uses a single instruction to load registers and alter the stack pointer, you can use `FRAME POP` instead of using both `FRAME RESTORE` and `FRAME ADDRESS`.

## Related information

[FRAME POP](#) on page 1453

## 22.35 FRAME RETURN ADDRESS

The `FRAME RETURN ADDRESS` directive provides for functions that use a register other than LR for their return address.

### Syntax

```
FRAME RETURN ADDRESS reg
```

where:

**reg**

is the register used for the return address.

### Usage

Use the `FRAME RETURN ADDRESS` directive in any function that does not use LR for its return address. Otherwise, a debugger cannot backtrace through the function.

You can only use `FRAME RETURN ADDRESS` within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives. Use it immediately after the `FUNCTION` or `PROC` directive that introduces the function.



Any function that uses a register other than LR for its return address is not AAPCS compliant. Such a function must not be exported.

## 22.36 FRAME SAVE

The `FRAME SAVE` directive describes the location of saved register contents relative to the canonical frame address.

### Syntax

```
FRAME SAVE {reglist}, offset
```

where:

#### `reglist`

is a list of registers stored consecutively starting at `offset` from the canonical frame address.  
There must be at least one register in the list.

### Usage

You can only use `FRAME SAVE` within functions with `FUNCTION` and `ENDFUNC` OR `PROC` and `ENDP` directives.

Use it immediately after the callee stores registers onto the stack.

`reglist` can include registers which are not required for backtracing. The assembler determines which registers it requires to record in the DWARF call frame information.



Note

If your code uses a single instruction to save registers and alter the stack pointer, you can use `FRAME PUSH` instead of using both `FRAME SAVE` and `FRAME ADDRESS`.

### Related information

[FRAME PUSH](#) on page 1454

## 22.37 FRAME STATE REMEMBER

The `FRAME STATE REMEMBER` directive saves the current information on how to calculate the canonical frame address and locations of saved register values.

### Syntax

```
FRAME STATE REMEMBER
```

### Usage

During an inline exit sequence the information about calculation of canonical frame address and locations of saved register values can change. After the exit sequence another branch can continue using the same information as before. Use `FRAME STATE REMEMBER` to preserve this information, and `FRAME STATE RESTORE` to restore it.

These directives can be nested. Each `FRAME STATE RESTORE` directive must have a corresponding `FRAME STATE REMEMBER` directive.

You can only use `FRAME STATE REMEMBER` within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives.

### Example

```
; function code
FRAME STATE REMEMBER
    ; save frame state before in-line exit sequence
    POP      {r4-r6,pc}
    ; do not have to FRAME POP here, as control has
    ; transferred out of the function
    FRAME STATE RESTORE
        ; end of exit sequence, so restore state
exitB   ; code for exitB
        POP      {r4-r6,pc}
    ENDP
```

### Related information

[FRAME STATE RESTORE](#) on page 1458

[FUNCTION or PROC](#) on page 1459

## 22.38 FRAME STATE RESTORE

The `FRAME STATE RESTORE` directive restores information about how to calculate the canonical frame address and locations of saved register values.

### Syntax

`FRAME STATE RESTORE`

### Usage

You can only use `FRAME STATE RESTORE` within functions with `FUNCTION` and `ENDFUNC` or `PROC` and `ENDP` directives.

### Related information

[FRAME STATE REMEMBER](#) on page 1457

[FUNCTION or PROC](#) on page 1459

## 22.39 FRAME UNWIND ON

The `FRAME UNWIND ON` directive instructs the assembler to produce unwind tables for this and subsequent functions.

### Syntax

`FRAME UNWIND ON`

## Usage

You can use this directive outside functions. In this case, the assembler produces unwind tables for all following functions until it reaches a `FRAME UNWIND OFF` directive.



A `FRAME UNWIND` directive is not sufficient to turn on exception table generation. Furthermore a `FRAME UNWIND` directive, without other `FRAME` directives, is not sufficient information for the assembler to generate the unwind information.

## Related information

- [--exceptions, --no\\_exceptions](#) on page 202
- [--exceptions\\_unwind, --no\\_exceptions\\_unwind](#) on page 202

## 22.40 FRAME UNWIND OFF

The `FRAME UNWIND OFF` directive instructs the assembler to produce no unwind tables for this and subsequent functions.

### Syntax

```
FRAME UNWIND OFF
```

### Usage

You can use this directive outside functions. In this case, the assembler produces no unwind tables for all following functions until it reaches a `FRAME UNWIND ON` directive.

### Related information

- [--exceptions, --no\\_exceptions](#) on page 202
- [--exceptions\\_unwind, --no\\_exceptions\\_unwind](#) on page 202

## 22.41 FUNCTION or PROC

The `FUNCTION` directive marks the start of a function. `PROC` is a synonym for `FUNCTION`.

### Syntax

```
label FUNCTION [{reglist1} [, {reglist2}]]
```

where:

#### **reglist1**

is an optional list of callee-saved AArch32 registers. If `reglist1` is not present, and your debugger checks register usage, it assumes that the AAPCS is in use. If you use empty brackets, this informs the debugger that all AArch32 registers are caller-saved.

***reglist2***

is an optional list of callee-saved VFP registers. If you use empty brackets, this informs the debugger that all VFP registers are caller-saved.

**Usage**

Use `FUNCTION` to mark the start of functions. The assembler uses `FUNCTION` to identify the start of a function when producing DWARF call frame information for ELF.

`FUNCTION` sets the canonical frame address to be R13 (SP), and the frame state stack to be empty.

Each `FUNCTION` directive must have a matching `ENDFUNC` directive. You must not nest `FUNCTION` and `ENDFUNC` pairs, and they must not contain `PROC` or `ENDP` directives.

You can use the optional parameters to inform the debugger about an alternative procedure call standard, if you are using your own. Not all debuggers support this feature. See your debugger documentation for details.

If you specify an empty , using {}, this indicates that all registers for the function are caller-saved. Typically you do this when writing a reset vector where the values in all registers are unknown on execution. This avoids problems in a debugger if it tries to construct a backtrace from the values in the registers.



Note

`FUNCTION` does not automatically cause alignment to a word boundary (or halfword boundary for T32). Use `ALIGN` if necessary to ensure alignment, otherwise the call frame might not point to the start of the function.

**Examples**

```

dadd    ALIGN      ; Ensures alignment.
        FUNCTION   ; Without the ALIGN directive this might not be word-aligned.
        EXPORT    dadd
        PUSH     {r4-r6,lr}    ; This line automatically word-aligned.
        FRAME PUSH {r4-r6,lr}
        ; subroutine body
        POP      {r4-r6,pc}
        ENDFUNC
func6   PROC {r4-r8,r12},{D1-D3} ; Non-AAPCS-conforming function.
        ...
        ENDP
func7   FUNCTION {}   ; Another non-AAPCS-conforming function.
        ...
        ENDFUNC

```

**Related information**

[FRAME STATE RESTORE](#) on page 1458

[FRAME ADDRESS](#) on page 1452

[ALIGN](#) on page 1426

## 22.42 GBLA, GBLL, and GBLS

The `GBLA`, `GBLL`, and `GBLS` directives declare and initialize global variables.

### Syntax

`gblx variable`

where:

**`gblx`**

is one of `GBLA`, `GBLL`, or `GBLS`.

**`variable`**

is the name of the variable. `variable` must be unique among symbols within a source file.

### Usage

The `GBLA` directive declares a global arithmetic variable, and initializes its value to 0.

The `GBLL` directive declares a global logical variable, and initializes its value to `{FALSE}`.

The `GBLS` directive declares a global string variable and initializes its value to a null string, `""`.

Using one of these directives for a variable that is already defined re-initializes the variable.

The scope of the variable is limited to the source file that contains it.

Set the value of the variable with a `SETA`, `SETL`, or `SETS` directive.

Global variables can also be set with the `--predefine` assembler command-line option.

### Examples

The following example declares a variable `objectsize`, sets the value of `objectsize` to FF, and then uses it later in a `SPACE` directive:

```
objectsize    GBLA    objectsize      ; declare the variable name
              SETA    0xFF       ; set its value
              .
              .
              .
              SPACE   objectsize   ; quote the variable
```

The following example shows how to declare and set a variable when you invoke `armasm`. Use this when you want to set the value of a variable at assembly time. `--pd` is a synonym for `--predefine`.

```
armasm --cpu=8-A.32 --predefine "objectsize SETA 0xFF" sourcefile -o objectfile
```

### Related information

[LCLA, LCLL, and LCLS](#) on page 1469

[SETA, SETL, and SETS](#) on page 1483

--predefine "directive" on page 211

## 22.43 GET or INCLUDE

The `GET` directive includes a file within the file being assembled. The included file is assembled at the location of the `GET` directive. `INCLUDE` is a synonym for `GET`.

### Syntax

`GET filename`

where:

***filename***

is the name of the file to be included in the assembly. The assembler accepts pathnames in either UNIX or MS-DOS format.

### Usage

`GET` is useful for including macro definitions, `EQU`, and storage maps in an assembly. When assembly of the included file is complete, assembly continues at the line following the `GET` directive.

By default the assembler searches the current place for included files. The current place is the directory where the calling file is located. Use the `-i` assembler command line option to add directories to the search path. File names and directory names containing spaces must not be enclosed in double quotes ("").

The included file can contain additional `GET` directives to include other files.

If the included file is in a different directory from the current place, this becomes the current place until the end of the included file. The previous current place is then restored.

You cannot use `GET` to include object files.

### Examples

```
AREA Example, CODE, READONLY
GET file1.s ; includes file1 if it exists in the current
place
GET c:\project\file2.s ; includes file2
GET c:\Program files\file3.s ; space is permitted
```

### Related information

[INCBIN](#) on page 1467

[About assembly control directives](#) on page 1423

## 22.44 IF, ELSE, ENDIF, and ELIF

The `IF`, `ELSE`, `ENDIF`, and `ELIF` directives allow you to conditionally assemble sequences of instructions and directives.

### Syntax

```
IF logical-expression
    ...
{ELSE
    ...
ENDIF
```

where:

***logical-expression***

is an expression that evaluates to either `{TRUE}` or `{FALSE}`.

### Usage

Use `IF` with `ENDIF`, and optionally with `ELSE`, for sequences of instructions or directives that are only to be assembled or acted on under a specified condition.

`IF...ENDIF` conditions can be nested.

The `IF` directive introduces a condition that controls whether to assemble a sequence of instructions and directives. `[` is a synonym for `IF`.

The `ELSE` directive marks the beginning of a sequence of instructions or directives that you want to be assembled if the preceding condition fails. `|` is a synonym for `ELSE`.

The `ENDIF` directive marks the end of a sequence of instructions or directives that you want to be conditionally assembled. `]` is a synonym for `ENDIF`.

The `ELIF` directive creates a structure equivalent to `ELSE IF`, without the requirement for nesting or repeating the condition.

### Using ELIF

Without using `ELIF`, you can construct a nested set of conditional instructions like this:

```
IF logical-expression
    instructions
ELSE
    IF logical-expression2
        instructions
    ELSE
        IF logical-expression3
            instructions
        ENDIF
    ENDIF
ENDIF
```

A nested structure like this can be nested up to 256 levels deep.

You can write the same structure more simply using **ELIF**:

```
IF logical-expression
    instructions
ELIF logical-expression2
    instructions
ELIF logical-expression3
    instructions
ENDIF
```

This structure only adds one to the current nesting depth, for the **IF...ENDIF** pair.

## Examples

The following example assembles the first set of instructions if **NEWVERSION** is defined, or the alternative set otherwise:

### Assembly conditional on a variable being defined

```
IF :DEF:NEWVERSION
    ; first set of instructions or directives
ELSE
    ; alternative set of instructions or directives
ENDIF
```

Invoking **armasm** as follows defines **NEWVERSION**, so the first set of instructions and directives are assembled:

```
armasm --cpu=8-A.32 --predefine "NEWVERSION SETL {TRUE}" test.s
```

Invoking **armasm** as follows leaves **NEWVERSION** undefined, so the second set of instructions and directives are assembled:

```
armasm --cpu=8-A.32 test.s
```

The following example assembles the first set of instructions if **NEWVERSION** has the value **{TRUE}**, or the alternative set otherwise:

### Assembly conditional on a variable value

```
IF NEWVERSION = {TRUE}
    ; first set of instructions or directives
ELSE
    ; alternative set of instructions or directives
ENDIF
```

Invoking **armasm** as follows causes the first set of instructions and directives to be assembled:

```
armasm --cpu=8-A.32 --predefine "NEWVERSION SETL {TRUE}" test.s
```

Invoking **armasm** as follows causes the second set of instructions and directives to be assembled:

```
armasm --cpu=8-A.32 --predefine "NEWVERSION SETL {FALSE}" test.s
```

## Related information

[Relational operators](#) on page 236

[About assembly control directives](#) on page 1423

## 22.45 IMPORT and EXTERN

The `IMPORT` and `EXTERN` directives provide the assembler with a name that is not defined in the current assembly.

### Syntax

```
{directive symbol} {[SIZE={<>n>}]}  
{directive symbol} {[type]}  
{directive symbol} [attr{},type{}{,SIZE=n}]  
directive symbol [WEAK {,attr {}{,type{}{,SIZE=n}}}]
```

where:

**directive**

can be either:

**IMPORT**

imports the symbol unconditionally.

**EXTERN**

imports the symbol only if it is referred to in the current assembly.

**symbol**

is a symbol name defined in a separately assembled source file, object file, or library. The symbol name is case-sensitive.

**WEAK**

prevents the linker generating an error message if the symbol is not defined elsewhere. It also prevents the linker searching libraries that are not already included.

**attr**

can be any one of:

**DYNAMIC**

sets the ELF symbol visibility to `STV_DEFAULT`.

**PROTECTED**

sets the ELF symbol visibility to `STV_PROTECTED`.

**HIDDEN**

sets the ELF symbol visibility to `STV_HIDDEN`.

**INTERNAL**

sets the ELF symbol visibility to `STV_INTERNAL`.

**type**

specifies the symbol type:

**DATA**

`symbol` is treated as data when the source is assembled and linked.

**CODE**

`symbol` is treated as code when the source is assembled and linked.

**ELFTYPE=n**

`symbol` is treated as a particular ELF symbol, as specified by the value of `n`, where `n` can be any number from 0 to 15.

If unspecified, the linker determines the most appropriate type.

**n**

specifies the size and can be any 32-bit value. If the `SIZE` attribute is not specified, the assembler calculates the size:

- For `PROC` and `FUNCTION` symbols, the size is set to the size of the code until its `ENDP` or `ENDFUNC`.
- For other symbols, the size is the size of instruction or data on the same source line. If there is no instruction or data, the size is zero.

## Usage

The name is resolved at link time to a symbol defined in a separate object file. The symbol is treated as a program address. If `[WEAK]` is not specified, the linker generates an error if no corresponding symbol is found at link time.

If `[WEAK]` is specified and no corresponding symbol is found at link time:

- If the reference is the destination of a `B` or `BL` instruction, the value of the symbol is taken as the address of the following instruction. This makes the `B` or `BL` instruction effectively a `NOP`.
- Otherwise, the value of the symbol is taken as zero.

## Example

The example tests to see if the C++ library has been linked, and branches conditionally on the result.

```

AREA   Example, CODE, READONLY
EXTERN  __CPP_INITIALIZE[WEAK]    ; If C++ library linked, gets the
                                  ; address of __CPP_INITIALIZE
                                  ; function.
LDR    r0,=__CPP_INITIALIZE      ; If not linked, address is zeroed.
CMP    r0,#0                     ; Test if zero.
BEQ    nocplusplus              ; Branch on the result.

```

The following examples show the use of the `SIZE` attribute:

```
EXTERN symA [SIZE=4]
EXTERN symA [DATA, SIZE=4]
```

## Related information

[EXPORT or GLOBAL](#) on page 1448

[ELF for the Arm Architecture](#)

## 22.46 INCBIN

The `INCBIN` directive includes a file within the file being assembled. The file is included as it is, without being assembled.

### Syntax

```
INCBIN filename
```

where:

#### ***filename***

is the name of the file to be included in the assembly. The assembler accepts pathnames in either UNIX or MS-DOS format.

### Usage

You can use `INCBIN` to include data, such as executable files, literals, or any arbitrary data. The contents of the file are added to the current ELF section, byte for byte, without being interpreted in any way. Assembly continues at the line following the `INCBIN` directive.

By default, the assembler searches the current place for included files. The current place is the directory where the calling file is located. Use the `-i` assembler command-line option to add directories to the search path. File names and directory names containing spaces must not be enclosed in double quotes ("").

### Example

```
AREA Example, CODE, READONLY
INCBIN file1.dat           ; Includes file1 if it exists in the current
place
INCBIN c:\project\file2.txt ; Includes file2.
```

## 22.47 INFO

The `INFO` directive supports diagnostic generation on either pass of the assembly.

### Syntax

```
INFO numeric-expression, string-expression [, severity]
```

where:

#### ***numeric-expression***

is a numeric expression that is evaluated during assembly. If the expression evaluates to zero:

- No action is taken during pass one.
- *string-expression* is printed as a warning during pass two if *severity* is 1.
- *string-expression* is printed as a message during pass two if *severity* is 0 or not specified.

If the expression does not evaluate to zero:

*string-expression* is printed as an error message and the assembly fails irrespective of whether *severity* is specified or not (non-zero values for *severity* are reserved in this case).

#### ***string-expression***

is an expression that evaluates to a string.

#### ***severity***

is an optional number that controls the severity of the message. Its value can be either 0 or 1. All other values are reserved.

### Usage

`INFO` provides a flexible means of creating custom error messages.

`!` is very similar to `INFO`, but has less detailed reporting.

### Examples

```
INFO    0, "Version 1.0"
IF endofdata <= label1
    INFO    4, "Data overrun at label1"
ENDIF
```

### Related information

[ASSERT](#) on page 1433

[String expressions](#) on page 227

[Numeric expressions](#) on page 228

## 22.48 KEEP

The `KEEP` directive instructs the assembler to retain named local labels in the symbol table in the object file.

### Syntax

```
KEEP {label}
```

where:

**label**

is the name of the local label to keep. If `label` is not specified, all named local labels are kept except register-relative labels.

### Usage

By default, the only labels that the assembler describes in its output object file are:

- Exported labels.
- Labels that are relocated against.

Use `KEEP` to preserve local labels. This can help when debugging. Kept labels appear in the Arm® debuggers and in linker map files.

`KEEP` cannot preserve register-relative labels or numeric local labels.

### Example

```
label    ADC      r2,r3,r4
        KEEP     label      ; makes label available to debuggers
        ADD      r2,r2,r5
```

### Related information

[MAP](#) on page 1474

[Numeric local labels](#) on page 225

## 22.49 LCLA, LCLL, and LCLS

The `LCLA`, `LCLL`, and `LCLS` directives declare and initialize local variables.

### Syntax

```
lclx variable
```

where:

**lclx**

is one of `LCLA`, `LCLL`, or `LCLS`.

**variable**

is the name of the variable. *variable* must be unique within the macro that contains it.

**Usage**

The `LCIA` directive declares a local arithmetic variable, and initializes its value to 0.

The `LCIL` directive declares a local logical variable, and initializes its value to `{FALSE}`.

The `LCIS` directive declares a local string variable, and initializes its value to a null string, `" "`.

Using one of these directives for a variable that is already defined re-initializes the variable.

The scope of the variable is limited to a particular instantiation of the macro that contains it.

Set the value of the variable with a `SETA`, `SETL`, or `SETS` directive.

**Example**

```

$label MACRO
      message $a          ; Declare a macro
      LCIS    err           ; Macro prototype line
      ; Declare local string
      ; variable err.
err   SETS    "error no: " ; Set value of err
$label ; code
      INFO    0, "err":CC::STR:$a ; Use string
      MEND

```

**Related information**

[GBLA](#), [G BLL](#), and [GBLS](#) on page 1460

[SETA](#), [SETL](#), and [SETS](#) on page 1483

## 22.50 LTORG

The `LTORG` directive instructs the assembler to assemble the current literal pool immediately.

**Syntax**

`LTORG`

**Usage**

The assembler assembles the current literal pool at the end of every code section. The end of a code section is determined by the `AREA` directive at the beginning of the following section, or the end of the assembly.

These default literal pools can sometimes be out of range of some `LDR`, `VLDR`, and `WLDR` pseudo-instructions. Use `LTORG` to ensure that a literal pool is assembled within range.

Large programs can require several literal pools. Place `LTORG` directives after unconditional branches or subroutine return instructions so that the processor does not attempt to execute the constants as instructions.

The assembler word-aligns data in literal pools.

### Example

```

start  AREA   Example, CODE, READONLY
       BL     func1           ; function body
func1
       ; code
       LDR   r1,=0x55555555  ; => LDR R1, [pc, #offset to Literal Pool 1]
       ; code
       MOV   pc,lr            ; end function
       LTORG
       SPACE 4200            ; Literal Pool 1 contains literal &55555555.
data   ; Clears 4200 bytes of memory starting at current
       location.
       END                ; Default literal pool is empty.

```

### Related information

[LDR pseudo-instruction \(A32\)](#) on page 319

[VLDR pseudo-instruction \(A32\)](#) on page 554

## 22.51 MACRO and MEND

The `MACRO` directive marks the start of the definition of a macro. Macro expansion terminates at the `MEND` directive.

### Syntax

These two directives define a macro. The syntax is:

```

MACRO
{$label}  macroname{ $cond} { $parameter{ ,$parameter}...}
       ; code
MEND

```

where:

**\$label**

is a parameter that is substituted with a symbol given when the macro is invoked. The symbol is usually a label.

**macroname**

is the name of the macro. It must not begin with an instruction or directive name.

**\$cond**

is a special parameter designed to contain a condition code. Values other than valid condition codes are permitted.

**\$parameter**

is a parameter that is substituted when the macro is invoked. A default value for a parameter can be set using this format:

```
$ parameter="default value"
```

Double quotes must be used if there are any spaces within, or at either end of, the default value.

**Usage**

If you start any `WHILE...WEND` loops or `IF...ENDIF` conditions within a macro, they must be closed before the `MEND` directive is reached. You can use `MEXIT` to enable an early exit from a macro, for example, from within a loop.

Within the macro body, parameters such as `$label`, `$parameter` or `$cond` can be used in the same way as other variables. They are given new values each time the macro is invoked. Parameters must begin with `$` to distinguish them from ordinary symbols. Any number of parameters can be used.

`$label` is optional. It is useful if the macro defines internal labels. It is treated as a parameter to the macro. It does not necessarily represent the first instruction in the macro expansion. The macro defines the locations of any labels.

Use `|` as the argument to use the default value of a parameter. An empty string is used if the argument is omitted.

In a macro that uses several internal labels, it is useful to define each internal label as the base label with a different suffix.

Use a dot between a parameter and following text, or a following parameter, if a space is not required in the expansion. Do not use a dot between preceding text and a parameter.

You can use the `$cond` parameter for condition codes. Use the unary operator `:REVERSE_cc:` to find the inverse condition code, and `:cc_ENCODING:` to find the 4-bit encoding of the condition code.

Macros define the scope of local variables.

Macros can be nested.

**Examples**

A macro that uses internal labels to implement loops:

```
; macro definition
$label      MACRO          ; start macro definition
            xmac    $p1,$p2
$label.loop1   ; code
              ; code
              ; code
$label.loop2   BGE    $label.loop1
              ; code
              BL     $p1
              BGT    $label.loop2
```

```

; code
ADR      $p2
; code
MEND          ; end macro definition

; macro invocation
abc      xmac    subr1,de    ; invoke macro
; code
; code
; code
; code
BGE     abcloop1   ; this is what is
; is produced when
; the xmac macro is
; expanded
abcloop2
; code
BL      subr1
BGT     abcloop2
; code
ADR      de
; code

```

A macro that produces assembly-time diagnostics:

```

MACRO
diagnose $param1="default" ; Macro definition
INFO    0,"$param1"        ; This macro produces
MEND          ; assembly-time diagnostics
; (on second assembly pass)

; macro expansion
diagnose           ; Prints blank line at assembly-time
diagnose "hello"    ; Prints "hello" at assembly-time
diagnose |         ; Prints "default" at assembly-time

```

When variables are being passed in as arguments, use of `|` might leave some variables unsubstituted. To work around this, define the `|` in a `LCLS` or `GBLS` variable and pass this variable as an argument instead of `|`. For example:

```

MACRO
m2 $a,$b=r1,$c      ; Macro definition
add $a,$b,$c        ; The default value for $b is r1
; The macro adds $b and $c and puts result in $a.
MEND          ; Macro end

MACRO
m1 $a,$b      ; Macro definition
; This macro adds $b to r1 and puts result in $a.
LCLS def
; Declare a local string variable for |
def      SETS "|"
; Define |
m2 $a,$def,$b    ; Invoke macro m2 with $def instead of |
; to use the default value for the second argument.
; Macro end

```

A macro that uses a condition code parameter:

```

AREA    codx, CODE, READONLY
; macro definition
MACRO
Return$cond
[ {ARCHITECTURE} <> "4"
BX$cond lr
|
MOV$cond pc,lr
]
MEND

; macro invocation
fun      PROC
CMP      r0,#0
MOVEQ   r0,#1
ReturnEQ
MOV      r0,#0

```

```
Return
ENDP
END
```

## Related information

[MEXIT](#) on page 1475

[GBLA, GBLL, and GBLS](#) on page 1460

[LCLA, LCLL, and LCLS](#) on page 1469

[Use of macros](#) on page 124

[Assembly time substitution of variables](#) on page 221

## 22.52 MAP

The `MAP` directive sets the origin of a storage map to a specified address.

### Syntax

```
MAP expr { ,base-register}
```

where:

#### ***expr***

is a numeric or PC-relative expression:

- If *base-register* is not specified, *expr* evaluates to the address where the storage map starts. The storage map location counter is set to this address.
- If *expr* is PC-relative, you must have defined the label before you use it in the map. The map requires the definition of the label during the first pass of the assembler.

#### ***base-register***

specifies a register. If *base-register* is specified, the address where the storage map starts is the sum of *expr*, and the value in *base-register* at runtime.

### Usage

Use the `MAP` directive in combination with the `FIELD` directive to describe a storage map.

Specify *base-register* to define register-relative labels. The base register becomes implicit in all labels defined by following `FIELD` directives, until the next `MAP` directive. The register-relative labels can be used in load and store instructions.

The `MAP` directive can be used any number of times to define multiple storage maps.

The storage-map location counter, `{VAR}`, is set to the same address as that specified by the `MAP` directive. The `{VAR}` counter is set to zero before the first `MAP` directive is used.

`^` is a synonym for `MAP`.

## Examples

```
MAP      0,r9
MAP      0xff,r9
```

## Related information

[FIELD](#) on page 1451

[How the assembler works](#) on page 62

[Directives that can be omitted in pass 2 of the assembler](#) on page 63

## 22.53 MEXIT

The `MEXIT` directive exits a macro definition before the end.

### Usage

Use `MEXIT` when you require an exit from within the body of a macro. Any unclosed `WHILE...WEND` loops or `IF...ENDIF` conditions within the body of the macro are closed by the assembler before the macro is exited.

### Example

```
MACRO
$abc    example abc      $param1,$param2
; code
WHILE condition1
; code
  IF condition2
; code
    MEXIT
  ELSE
; code
  ENDIF
WEND
; code
MEND
```

## Related information

[MACRO](#) and [MEND](#) on page 1471

## 22.54 NOFP

The `NOFP` directive ensures that there are no floating-point instructions in an assembly language source file.

### Syntax

```
NOFP
```

## Usage

Use `NOFP` to ensure that no floating-point instructions are used in situations where there is no support for floating-point instructions either in software or in target hardware.

If a floating-point instruction occurs after the `NOFP` directive, an `Unknown opcode` error is generated and the assembly fails.

If a `NOFP` directive occurs after a floating-point instruction, the assembler generates the error:

```
Too late to ban floating point instructions
```

and the assembly fails.

## 22.55 OPT

The `OPT` directive sets listing options from within the source code.

### Syntax

`OPT n`

where:

`n`

is the `OPT` directive setting. The following table lists the valid settings:

**Table 22-2: OPT directive settings**

| OPT n | Effect                                                                                     |
|-------|--------------------------------------------------------------------------------------------|
| 1     | Turns on normal listing.                                                                   |
| 2     | Turns off normal listing.                                                                  |
| 4     | Page throw. Issues an immediate form feed and starts a new page.                           |
| 8     | Resets the line number counter to zero.                                                    |
| 16    | Turns on listing for <code>SET</code> , <code>GBL</code> and <code>LCL</code> directives.  |
| 32    | Turns off listing for <code>SET</code> , <code>GBL</code> and <code>LCL</code> directives. |
| 64    | Turns on listing of macro expansions.                                                      |
| 128   | Turns off listing of macro expansions.                                                     |
| 256   | Turns on listing of macro invocations.                                                     |
| 512   | Turns off listing of macro invocations.                                                    |
| 1024  | Turns on the first pass listing.                                                           |
| 2048  | Turns off the first pass listing.                                                          |
| 4096  | Turns on listing of conditional directives.                                                |
| 8192  | Turns off listing of conditional directives.                                               |
| 16384 | Turns on listing of <code>MEND</code> directives.                                          |
| 32768 | Turns off listing of <code>MEND</code> directives.                                         |

## Usage

Specify the `--list=` assembler option to turn on listing.

By default the `--list=` option produces a normal listing that includes variable declarations, macro expansions, call-conditioned directives, and `MEND` directives. The listing is produced on the second pass only. Use the `OPT` directive to modify the default listing options from within your code.

You can use `OPT` to format code listings. For example, you can specify a new page before functions and sections.

## Example

```
start      AREA    Example, CODE, READONLY
          ; code
          ; code
          BL      func1
          ; code
          OPT 4           ; places a page break before func1
func1      ; code
```

## Related information

[-list=file](#) on page 208

## 22.56 QN, DN, and SN

The `QN`, `DN`, and `SN` directives define names for Advanced SIMD and floating-point registers.

### Syntax

`name directive expr. {type} [{x}]`

where:

**directive**

is `QN`, `DN`, or `SN`.

**name**

is the name to be assigned to the extension register. `name` cannot be the same as any of the predefined names.

**expr**

Can be:

- An expression that evaluates to a number in the range:
  - 0-15 if you are using `QN` in A32/T32 Advanced SIMD code.
  - 0-31 otherwise.
- A predefined register name, or a register name that has already been defined in a previous directive.

***type***

is any Advanced SIMD or floating-point datatype.

**[*x*]**

is only available for Advanced SIMD code. [*x*] is a scalar index into a register.

*type* and [*x*] are Extended notation.

**Usage**

Use **QN**, **DN**, or **SN** to allocate convenient names to extension registers, to help you to remember what you use each one for.

The **QN** directive defines a name for a specified 128-bit extension register.

The **DN** directive defines a name for a specified 64-bit extension register.

The **SN** directive defines a name for a specified single-precision floating-point register.



Avoid conflicting uses of the same register under different names.

Note

You cannot specify a vector length in a **DN** or **SN** directive.

**Examples**

```
energy  DN  6    ; defines energy as a symbol for
                  ; floating-point double-precision register 6
mass     SN  16   ; defines mass as a symbol for
                  ; floating-point single-precision register 16
```

**Extended notation examples**

```
varA    DN      d1.U16
varB    DN      d2.U16
varC    DN      d3.U16
index   VADD   varA,varB,varC      ; VADD.U16 d1,d2,d3
result  QN      q5.I32
        VMULL  result,varA,index   ; VMULL.U16 q5,d1,d4[0]
```

**Related information**

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

[Advanced SIMD data types in A32/T32 instructions](#) on page 168

[Extended notation extension for Advanced SIMD in A32/T32 code](#) on page 172

## 22.57 RELOC

The `RELOC` directive explicitly encodes an ELF relocation in an object file.

### Syntax

```
RELOC n , symbol
```

```
RELOC n
```

where:

***n***

must be an integer in the range 0 to 255 or one of the relocation names defined in the Application Binary Interface for the Arm® Architecture.

***symbol***

can be any PC-relative label.

### Usage

Use `RELOC`, to create a relocation with respect to the address labeled by *symbol*.

If used immediately after an A32 or T32 instruction, `RELOC` results in a relocation at that instruction. If used immediately after a `DCB`, `DCW`, or `DCD`, or any other data generating directive, `RELOC` results in a relocation at the start of the data. Any addend to be applied must be encoded in the instruction or in the data.

If the assembler has already emitted a relocation at that place, the relocation is updated with the details in the `RELOC` directive, for example:

```
DCD      sym2 ; R_ARM_ABS32 to sym32
RELOC    55     ; ... makes it R_ARM_ABS32 NOI
```

`RELOC` is faulted in all other cases, for example, after any non-data generating directive, `LTOORG`, `ALIGN`, or as the first thing in an `AREA`.

Use `RELOC` to create a relocation with respect to the anonymous symbol, that is, symbol 0 of the symbol table. If you use `RELOC` without a preceding assembler generated relocation, the relocation is with respect to the anonymous symbol.

### Examples

```
IMPORT  impsym
LDR     r0,[pc,#-8]
RELOC   4, impsym
DCD     0
RELOC   2, sym
DCD     0,1,2,3,4      ; the final word is relocated
RELOC   38,sym2        ; R_ARM_TARGET1
DCD     impsym
RELOC   R_ARM_TARGET1 ; relocation code 38
```

## Related information

[Application Binary Interface \(ABI\)](#)

## 22.58 REQUIRE

The `REQUIRE` directive specifies a dependency between sections.

### Syntax

```
REQUIRE label
```

where:

***label***

is the name of the required label.

### Usage

Use `REQUIRE` to ensure that a related section is included, even if it is not directly called. If the section containing the `REQUIRE` directive is included in a link, the linker also includes the section containing the definition of the specified label.

## 22.59 REQUIRE8 and PRESERVE8

The `REQUIRE8` and `PRESERVE8` directives specify that the current file requires or preserves eight-byte alignment of the stack.



This directive is required to support non-ABI conforming toolchains. It has no effect on AArch64 assembly and is not required when targeting AArch64.

### Syntax

```
REQUIRE8 {bool}
```

```
PRESERVE8 {bool}
```

where:

***bool***

is an optional Boolean constant, either `{TRUE}` or `{FALSE}`.

### Usage

Where required, if your code preserves eight-byte alignment of the stack, use `PRESERVE8` to set the `PRES8` build attribute on your file. If your code does not preserve eight-byte alignment of the stack, use `PRESERVE8 {FALSE}` to ensure that the `PRES8` build attribute is not set. Use `REQUIRE8` to set the

`REQ8` build attribute. If there are multiple `REQUIRE8` or `PRESERVE8` directives in a file, the assembler uses the value of the last directive.

The linker checks that any code that requires eight-byte alignment of the stack is only called, directly or indirectly, by code that preserves eight-byte alignment of the stack.

---

If you omit both `PRESERVE8` and `PRESERVE8 {FALSE}`, the assembler decides whether to set the `PRES8` build attribute or not, by examining instructions that modify the SP. Arm recommends that you specify `PRESERVE8` explicitly.



You can enable a warning by using the `--diag_warning 1546` option when invoking `armasm`. This option gives you warnings like such as:

```
"test.s", line 37: Warning: A1546W: Stack pointer update potentially  
breaks 8 byte stack alignment
```

```
37 00000044      STMFD    sp!, {r2,r3,lr}
```

## Examples

```
REQUIRE8
REQUIRE8 {TRUE}      ; equivalent to REQUIRE8
REQUIRE8 {FALSE}     ; equivalent to absence of REQUIRE8
PRESERVE8 {TRUE}    ; equivalent to PRESERVE8
PRESERVE8 {FALSE}   ; NOT exactly equivalent to absence of PRESERVE8
```

## Related information

[--diag\\_warning=tag\[,tag,...\]](#) on page 200

## 22.60 RLIST

The `RLIST` (register list) directive gives a name to a set of general-purpose registers in A32/T32 code.

### Syntax

```
name RLIST {list-of-registers}
```

where:

#### **name**

is the name to be given to the set of registers. `name` cannot be the same as any of the predefined names.

#### **list-of-registers**

is a comma-delimited list of register names and register ranges. The register list must be enclosed in braces.

## Usage

Use `RLIST` to give a name to a set of registers to be transferred by the `LDM` or `STM` instructions.

`LDM` and `STM` always put the lowest physical register numbers at the lowest address in memory, regardless of the order they are supplied to the `LDM` or `STM` instruction. If you have defined your own symbolic register names it can be less apparent that a register list is not in increasing register order.

Use the `--diag_warning 1206` assembler option to ensure that the registers in a register list are supplied in increasing register order. If registers are not supplied in increasing register order, a warning is issued.

## Example

```
Context RLIST {r0-r6,r8,r10-r12,pc}
```

## Related information

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

## 22.61 RN

The `RN` directive defines a name for a specified register.

### Syntax

`name RNexpr`

where:

**name**

is the name to be assigned to the register. `name` cannot be the same as any of the predefined names.

**expr**

evaluates to a register number from 0 to 15.

## Usage

Use `RN` to allocate convenient names to registers, to help you to remember what you use each register for. Be careful to avoid conflicting uses of the same register under different names.

## Examples

```
regname    RN  11 ; defines regname for register 11
sqr4      RN  r6 ; defines sqr4 for register 6
```

## Related information

[Predeclared core register names in AArch32 state](#) on page 81

[Predeclared extension register names in AArch32 state](#) on page 81

## 22.62 ROUT

The `ROUT` directive marks the boundaries of the scope of numeric local labels.

### Syntax

```
{name} ROUT
```

where:

***name***

is the name to be assigned to the scope.

### Usage

Use the `ROUT` directive to limit the scope of numeric local labels. This makes it easier for you to avoid referring to a wrong label by accident. The scope of numeric local labels is the whole area if there are no `ROUT` directives in it.

Use the `name` option to ensure that each reference is to the correct numeric local label. If the name of a label or a reference to a label does not match the preceding `ROUT` directive, the assembler generates an error message and the assembly fails.

### Example

```

routineA    ; code
ROUT        ; ROUT is not necessarily a routine
            ; code
3routineA   ; code
            ; this label is checked
            ; code
BEQ         %4routineA ; this reference is checked
            ; code
BGE         %3      ; refers to 3 above, but not checked
            ; code
4routineA   ; code
            ; this label is checked
            ; code
otherstuff  ROUT    ; start of next scope

```

### Related information

[AREA](#) on page 1428

[Numeric local labels](#) on page 225

## 22.63 SETA, SETL, and SETS

The `SETA`, `SETL`, and `SETS` directives set the value of a local or global variable.

### Syntax

```
variable setx expr
```

where:

**variable**

is the name of a variable declared by a `GBLA`, `GBLL`, `GBLS`, `LCLA`, `LCLL`, or `LCLS` directive.

**setx**

is one of `SETA`, `SETL`, or `SETS`.

**expr**

is an expression that is:

- Numeric, for `SETA`.
- Logical, for `SETL`.
- String, for `SETS`.

## Usage

The `SETA` directive sets the value of a local or global arithmetic variable.

The `SETL` directive sets the value of a local or global logical variable.

The `SETS` directive sets the value of a local or global string variable.

You must declare `variable` using a global or local declaration directive before using one of these directives.

You can also predefined variable names on the command line.

## Restrictions

The value you can specify using a `SETA` directive is limited to 32 bits. If you exceed this limit, the assembler reports an error. A possible workaround in A64 code is to use an `EQU` directive instead of `SETA`, although `EQU` defines a constant, whereas `GBLA` and `SETA` define a variable.

For example, replace the following code:

|                        |                   |                                 |
|------------------------|-------------------|---------------------------------|
|                        | <code>GBLA</code> | <code>MyAddress</code>          |
| <code>MyAddress</code> | <code>SETA</code> | <code>0x0000008000000000</code> |

with:

|                        |                  |                                 |
|------------------------|------------------|---------------------------------|
| <code>MyAddress</code> | <code>EQU</code> | <code>0x0000008000000000</code> |
|------------------------|------------------|---------------------------------|

## Examples

|                            |                   |                            |
|----------------------------|-------------------|----------------------------|
| <code>VersionNumber</code> | <code>GBLA</code> | <code>VersionNumber</code> |
|                            | <code>SETA</code> | <code>21</code>            |
|                            | <code>GBLL</code> | <code>Debug</code>         |
| <code>Debug</code>         | <code>SETL</code> | <code>{TRUE}</code>        |
|                            | <code>GBLS</code> | <code>VersionString</code> |
| <code>VersionString</code> | <code>SETS</code> | <code>"Version 1.0"</code> |

## Related information

[GBLA, GBL, and GBLS](#) on page 1460  
[LCLA, LCLL, and LCLS](#) on page 1469  
[--predefine "directive"](#) on page 211  
[String expressions](#) on page 227  
[Numeric expressions](#) on page 228  
[Logical expressions](#) on page 231

## 22.64 SPACE or FILL

The `SPACE` directive reserves a zeroed block of memory. The `FILL` directive reserves a block of memory to fill with a given value.

### Syntax

```
{label} SPACE expr
{label} FILL expr[,value[,valuesize]}
```

where:

**label**

is an optional label.

**expr**

evaluates to the number of bytes to fill or zero.

**value**

evaluates to the value to fill the reserved bytes with. `value` is optional and if omitted, it is 0. `value` must be 0 in a `NOINIT` area.

**valuesize**

is the size, in bytes, of `value`. It can be any of 1, 2, or 4. `valuesize` is optional and if omitted, it is 1.

### Usage

Use the `ALIGN` directive to align any code following a `SPACE` or `FILL` directive.

% is a synonym for `SPACE`.

### Example

```
        AREA      MyData, DATA, READWRITE
data1   SPACE    255      ; defines 255 bytes of zeroed store
data2   FILL     50,0xAB,1 ; defines 50 bytes containing 0xAB
```

## Related information

[ALIGN](#) on page 1426  
[DCB](#) on page 1438

[DCD and DCDU](#) on page 1439  
[DCQ and DCQU](#) on page 1443  
[DCW and DCWU](#) on page 1444  
[Numeric expressions](#) on page 228

## 22.65 THUMB directive

The `THUMB` directive instructs the assembler to interpret subsequent instructions as T32 instructions, using the UAL syntax.



Not supported for AArch64 state.

Note

### Syntax

`THUMB`

### Usage

In files that contain code using different instruction sets, the `THUMB` directive must precede T32 code written in UAL syntax.

If necessary, this directive also inserts one byte of padding to align to the next halfword boundary.

This directive does not assemble to any instructions. It also does not change the state. It only instructs `armasm` to assemble T32 instructions as appropriate, and inserts padding if necessary.

### Example

This example shows how you can use `ARM` and `THUMB` directives to switch state and assemble both A32 and T32 instructions in a single area.

```

        AREA ToT32, CODE, READONLY      ; Name this block of code
        ENTRY                         ; Mark first instruction to execute
        ARM                            ; Subsequent instructions are A32
start
        ADR    r0, into_t32 + 1       ; Processor starts in A32 state
        BX     r0                     ; Inline switch to T32 state
        THUMB                         ; Subsequent instructions are T32
into_t32
        MOVS   r0, #10                ; New-style T32 instructions

```

### Related information

[ARM or CODE32 directive](#) on page 1432  
[CODE16 directive](#) on page 1436

## 22.66 TTL and SUBT

The `TTL` directive inserts a title at the start of each page of a listing file. The `SUBT` directive places a subtitle on the pages of a listing file.

### Syntax

```
TTL title
```

```
SUBT subtitle
```

where:

***title***

is the title.

***subtitle***

is the subtitle.

### Usage

Use the `TTL` directive to place a title at the top of each page of a listing file. If you want the title to appear on the first page, the `TTL` directive must be on the first line of the source file.

Use additional `TTL` directives to change the title. Each new `TTL` directive takes effect from the top of the next page.

Use `SUBT` to place a subtitle at the top of each page of a listing file. Subtitles appear in the line below the titles. If you want the subtitle to appear on the first page, the `SUBT` directive must be on the first line of the source file.

Use additional `SUBT` directives to change subtitles. Each new `SUBT` directive takes effect from the top of the next page.

### Examples

```
TTL First Title ; places title on first and subsequent pages of listing
file.
SUBT First Subtitle ; places subtitle on second and subsequent pages of listing
file.
```

## 22.67 WHILE and WEND

The `WHILE` directive starts a sequence of instructions or directives that are to be assembled repeatedly. The sequence is terminated with a `WEND` directive.

### Syntax

```
WHILE logical-expression
; code
WEND
```

where:

***logical-expression***

is an expression that can evaluate to either {TRUE} or {FALSE}.

## Usage

Use the `WHILE` directive, together with the `WEND` directive, to assemble a sequence of instructions many times. The number of repetitions can be zero.

You can use `IF...ENDIF` conditions within `WHILE...WEND` loops.

`WHILE...WEND` loops can be nested.

## Example

```

count    GBLA count          ; declare local variable
        SETA   1              ; you are not restricted to
        WHILE   count <= 4      ; such simple conditions
        SETA   count+1         ; In this case, this code is
        ; code                 ; executed four times
        ; code
        WEND

```

## Related information

[Logical expressions](#) on page 231

[About assembly control directives](#) on page 1423

## 22.68 WN and XN

The `WN`, and `XN` directives define names for registers in A64 code.

The `WN` directive defines a name for a specified 32-bit register.

The `XN` directive defines a name for a specified 64-bit register.

## Syntax

`namedirective expr`

where:

***name***

is the name to be assigned to the register. `name` cannot be the same as any of the predefined names.

***directive***

is `WN` or `XN`.

***expr***

evaluates to a register number from 0 to 30.

## Usage

Use `WN` and `XN` to allocate convenient names to registers in A64 code, to help you to remember what you use each register for. Be careful to avoid conflicting uses of the same register under different names.

## Examples

```
sqr4      WN w16 ; defines sqr4 for register w16
regname   XN 21  ; defines regname for register x21
```

## Related information

[Predeclared core register names in AArch64 state](#) on page 91

[Predeclared extension register names in AArch64 state](#) on page 92

# 23. Via File Syntax

Describes the syntax of via files accepted by the `armasm`, `armlink`, `fromelf`, and `armar` tools.

## 23.1 Overview of via files

Via files are plain text files that allow you to specify command-line arguments and options for the `armasm`, `armlink`, `fromelf`, and `armar` tools.

Typically, you use a via file to overcome the command-line length limitations. However, you might want to create multiple via files that:

- Group similar arguments and options together.
- Contain different sets of arguments and options to be used in different scenarios.



In general, you can use a via file to specify any command-line option to a tool, including `--via`. Therefore, you can call multiple nested via files from within a via file.

### Via file evaluation

When you invoke the `armasm`, `armlink`, `fromelf`, or `armar`, the tool:

1. Replaces the first specified `--via via_file` argument with the sequence of argument words that are extracted from the via file, including recursively processing any nested `--via` commands in the via file.
2. Processes any subsequent `--via via_file` arguments in the same way, in the order they are presented.

That is, via files are processed in the order that you specify them. Each via file is processed completely, including any nested via files contained in that file, before processing the next via file.

### Related information

[Via file syntax rules](#) on page 1490

[-via=filename](#) on page 216

## 23.2 Via file syntax rules

Via files must conform to some syntax rules.

- A via file is a text file containing a sequence of words. Each word in the text file is converted into an argument string and passed to the tool.

- Words are separated by whitespace, or the end of a line, except in delimited strings, for example:

```
--bigend --reduce_paths (two words)
```

```
--bigend--reduce_paths (one word)
```

- The end of a line is treated as whitespace, for example:

```
--bigend  
--reduce_paths
```

This is equivalent to:

```
--bigend --reduce_paths
```

- Strings enclosed in quotation marks ("), or apostrophes ('') are treated as a single word. Within a quoted word, an apostrophe is treated as an ordinary character. Within an apostrophe delimited word, a quotation mark is treated as an ordinary character.

Use quotation marks to delimit filenames or path names that contain spaces, for example:

```
--errors C:\\My Project\\errors.txt (three words)
```

```
--errors "C:\\My Project\\errors.txt" (two words)
```

Use apostrophes to delimit words that contain quotes, for example:

```
-DNAME='''Arm Compiler''' (one word)
```

- Characters enclosed in parentheses are treated as a single word, for example:

```
--option(x, y, z) (one word)
```

```
--option (x, y, z) (two words)
```

- Within quoted or apostrophe delimited strings, you can use a backslash (\) character to escape the quote, apostrophe, and backslash characters.
- A word that occurs immediately next to a delimited word is treated as a single word, for example:

```
--errors "C:\\Project\\errors.txt"
```

This is treated as the single word:

```
--errorsC:\\Project\\errors.txt
```

- Lines beginning with a semicolon (;) or a hash (#) character as the first nonwhitespace character are comment lines. A semicolon or hash character that appears anywhere else in a line is not treated as the start of a comment, for example:

```
-o objectname.axf ;this is not a comment
```

A comment ends at the end of a line, or at the end of the file. There are no multi-line comments, and there are no part-line comments.

## Related information

[Overview of via files](#) on page 1490

[--via=filename](#) on page 216

# 24. armasm User Guide Changes

Describes the technical changes that have been made to the *armasm User Guide*.

## 24.1 Changes for the armasm User Guide

Changes that have been made to the *armasm User Guide* are listed with the latest version first.

**Table 24-1: Changes between 6.6.5 (revision L) and 6.6.4 (revision K)**

| Change                                                                                                    | Topics affected                                       |
|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------|
| [SDCOMP-58428] Added notes about build attribute compatibility checking being supported only for AArch32. | <ul style="list-style-type: none"><li>ATTR.</li></ul> |