efficios/normand

Normand

Normand is a text-to-binary processor with its own language.

This package offers both a portable Python 3 module and a command-line tool.

Warning
This version of Normand is 0.23, meaning both the Normand language and the module/CLI interface aren’t stable.

Introduction

The purpose of Normand is to consume human-readable text representing bytes and to produce the corresponding binary data.

Example 1. Simple bytes input.

Consider the following Normand input:

4f 55 32 bb $167 fe %10100111 a9 $-32

The generated nine bytes are:

4f 55 32 bb a7 fe a7 a9  e0

As you can see in the previous example, the fundamental unit of the Normand language is the byte. The order in which you list bytes is the order of the generated data.
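
To make the mapping of Example 1 concrete, here's a minimal sketch in plain Python (not Normand's own code) which builds the same nine bytes. The to_byte() helper is a hypothetical name, and masking with 0xff is only one way to get the two's complement byte of a negative decimal constant such as $-32.

```python
def to_byte(value: int) -> int:
    """Return the single byte (0 to 255) which encodes `value`.

    Negative values use two's complement, so -32 becomes 0xe0.
    """
    if not -128 <= value <= 255:
        raise ValueError("{} doesn't fit in one byte".format(value))

    return value & 0xFF


data = bytes([
    0x4F, 0x55, 0x32, 0xBB,  # 4f 55 32 bb
    to_byte(167),            # $167
    0xFE,                    # fe
    to_byte(0b10100111),     # %10100111
    0xA9,                    # a9
    to_byte(-32),            # $-32
])

print(data.hex())
```

Both $167 and %10100111 yield the byte a7: the decimal and binary forms are only different spellings of the same value.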

The Normand language is more than simple lists of bytes, though. Its main features are:

Comments, including a bunch of insignificant symbols which may improve readability

Input:

ff bb %1101:0010 # This is a comment
78 29 af $192 # This too # 99 $-80
fe80::6257:18ff:fea3:4229
60:57:18:a3:42:29
10839636-5d65-4a68-8e6a-21608ddf7258

Output:

ff bb d2 78 29 af c0 99  b0 fe 80 62 57 18 ff fe
a3 42 29 60 57 18 a3 42  29 10 83 96 36 5d 65 4a
68 8e 6a 21 60 8d df 72  58

Hexadecimal, decimal, and binary byte constants

Input:

aa bb $247 $-89 %0011_0010 %11.01= 10/10

Output:

aa bb f7 a7 32 da

Strings

Input:

"hello world!" 00
u16le"stress\nverdict 🤣"
s:latin3{hex(ICITTE)}

Output:

68 65 6c 6c 6f 20 77 6f  72 6c 64 21 00 73 00 74  ┆ hello world!•s•t
00 72 00 65 00 73 00 73  00 0a 00 76 00 65 00 72  ┆ •r•e•s•s•••v•e•r
00 64 00 69 00 63 00 74  00 20 00 3e d8 23 dd 30  ┆ •d•i•c•t• •>•#•0
78 32 66                                          ┆ x2f

Labels: special variables holding the offset where they’re defined

<beg> b2 52 e3 bc 91 05
$100 $50 <chair> 33 9f fe
25 e9 89 8a <end>

Variables

5e 65 {tower = 47} c6 7f f2 c4
44 {hurl = tower - 14} b5 {tower = hurl} 26 2d

The value of a variable assignment is the evaluation of a valid Python 3 expression which may include label and variable names.

Fixed-length number with a given length (8 bits to 64 bits) and byte order

Input:

{strength = 4}
!be 67 <lbl> 44 $178 [(end - lbl) * 8 + strength : 16] $99 <end>
!le [-1993 : 32]
[-3.141593 : 64be]

Output:

67 44 b2 00 2c 63 37 f8  ff ff c0 09 21 fb 82 c2
bd 7f

The encoded number is the evaluation of a valid Python 3 expression which may include label and variable names.

LEB128 integer

Input:

aa bb cc [-1993 : sleb128] <meow> dd ee ff
[meow * 199 : uleb128]

Output:

aa bb cc b7 70 dd ee ff e3 07

The encoded integer is the evaluation of a valid Python 3 expression which may include label and variable names.

Conditional

Input:

aa bb cc

(
  "foo"

  !if {ICITTE > 15}
    "bar"
  !else
    "fight"
  !end
) * 4

Output:

aa bb cc 66 6f 6f 66 69  67 68 74 66 6f 6f 66 69  ┆ •••foofightfoofi
67 68 74 66 6f 6f 62 61  72 66 6f 6f 62 61 72     ┆ ghtfoobarfoobar

Repetition

Input:

aa bb * 5 cc <zoom> "yeah\0" * {zoom * 3}

!repeat 3
  ff ee "juice"
!end

Output:

aa bb bb bb bb bb cc 79  65 61 68 00 79 65 61 68  ┆ •••••••yeah•yeah
00 79 65 61 68 00 79 65  61 68 00 79 65 61 68 00  ┆ •yeah•yeah•yeah•
79 65 61 68 00 79 65 61  68 00 79 65 61 68 00 79  ┆ yeah•yeah•yeah•y
65 61 68 00 79 65 61 68  00 79 65 61 68 00 79 65  ┆ eah•yeah•yeah•ye
61 68 00 79 65 61 68 00  79 65 61 68 00 79 65 61  ┆ ah•yeah•yeah•yea
68 00 79 65 61 68 00 79  65 61 68 00 79 65 61 68  ┆ h•yeah•yeah•yeah
00 79 65 61 68 00 79 65  61 68 00 79 65 61 68 00  ┆ •yeah•yeah•yeah•
ff ee 6a 75 69 63 65 ff  ee 6a 75 69 63 65 ff ee  ┆ ••juice••juice••
6a 75 69 63 65                                    ┆ juice

Alignment

Input:

!be

        [199:32]
@64     [43:64]
@16     [-123:16]
@32~255 [5584:32]

Output:

00 00 00 c7 00 00 00 00  00 00 00 00 00 00 00 2b
ff 85 ff ff 00 00 15 d0

Filling

Input:

!le
[0xdeadbeef:32]
[-1993:16]
[9:16]
+0x40
[ICITTE:8]
"meow mix"
+200~FFh
[ICITTE:8]

Output:

ef be ad de 37 f8 09 00  00 00 00 00 00 00 00 00  ┆ ••••7•••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
40 6d 65 6f 77 20 6d 69  78 ff ff ff ff ff ff ff  ┆ @meow mix•••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  ┆ ••••••••••••••••
ff ff ff ff ff ff ff ff  c8                       ┆ •••••••••

Transformation

Input:

"end of file @ " [end:8]

!transform gzip
  "this part will be gzipped"
!end

<end>

Output:

65 6e 64 20 6f 66 20 66  69 6c 65 20 40 20 3c 1f  ┆ end of file @ <•
8b 08 00 7b 7b 26 65 02  ff 2b c9 c8 2c 56 28 48  ┆ •••{{&e••+••,V(H
2c 2a 51 28 cf cc c9 51  48 4a 55 48 af ca 2c 28  ┆ ,*Q(•••QHJUH••,(
48 4d 01 00 d4 cc 5b 8a  19 00 00 00              ┆ HM••••[•••••

Multilevel grouping

Input:

ff ((aa bb "zoom" cc) * 5) * 3 $-34 * 4

Output:

ff aa bb 7a 6f 6f 6d cc  aa bb 7a 6f 6f 6d cc aa  ┆ •••zoom•••zoom••
bb 7a 6f 6f 6d cc aa bb  7a 6f 6f 6d cc aa bb 7a  ┆ •zoom•••zoom•••z
6f 6f 6d cc aa bb 7a 6f  6f 6d cc aa bb 7a 6f 6f  ┆ oom•••zoom•••zoo
6d cc aa bb 7a 6f 6f 6d  cc aa bb 7a 6f 6f 6d cc  ┆ m•••zoom•••zoom•
aa bb 7a 6f 6f 6d cc aa  bb 7a 6f 6f 6d cc aa bb  ┆ ••zoom•••zoom•••
7a 6f 6f 6d cc aa bb 7a  6f 6f 6d cc aa bb 7a 6f  ┆ zoom•••zoom•••zo
6f 6d cc aa bb 7a 6f 6f  6d cc de de de de        ┆ om•••zoom•••••

Macros

Input:

!macro hello(world)
  "hello"
  !if world " world" !end
!end

!repeat 17
  ff ff ff ff
  m:hello({ICITTE > 15 and ICITTE < 60})
!end

Output:

ff ff ff ff 68 65 6c 6c  6f ff ff ff ff 68 65 6c  ┆ ••••hello••••hel
6c 6f ff ff ff ff 68 65  6c 6c 6f 20 77 6f 72 6c  ┆ lo••••hello worl
64 ff ff ff ff 68 65 6c  6c 6f 20 77 6f 72 6c 64  ┆ d••••hello world
ff ff ff ff 68 65 6c 6c  6f 20 77 6f 72 6c 64 ff  ┆ ••••hello world•
ff ff ff 68 65 6c 6c 6f  ff ff ff ff 68 65 6c 6c  ┆ •••hello••••hell
6f ff ff ff ff 68 65 6c  6c 6f ff ff ff ff 68 65  ┆ o••••hello••••he
6c 6c 6f ff ff ff ff 68  65 6c 6c 6f ff ff ff ff  ┆ llo••••hello••••
68 65 6c 6c 6f ff ff ff  ff 68 65 6c 6c 6f ff ff  ┆ hello••••hello••
ff ff 68 65 6c 6c 6f ff  ff ff ff 68 65 6c 6c 6f  ┆ ••hello••••hello
ff ff ff ff 68 65 6c 6c  6f ff ff ff ff 68 65 6c  ┆ ••••hello••••hel
6c 6f ff ff ff ff 68 65  6c 6c 6f                 ┆ lo••••hello

Precise error reporting

/tmp/meow.normand:10:24 - Expecting a bit (`0` or `1`).
/tmp/meow.normand:32:6 - Unexpected character `k`.
/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`ICITTE`, `mix`, `zoom`}.
/tmp/meow.normand:32:19 - While expanding the macro `meow`:
/tmp/meow.normand:35:5 - While expanding the macro `zzz`:
/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE`.

You can use Normand to track data source files in your favorite VCS instead of raw binary files. You can use the binary files that Normand generates to test file format decoders, including with malformed data, as well as for education.

See Learn Normand to explore all the Normand features.

Install Normand

Normand requires Python ≥ 3.4.

To install Normand:

$ python3 -m pip install --user normand

See Installing to the User Site to learn more about a user site installation.

Note

Normand has a single module file, normand.py, which you can copy as is to your project to use it (both the normand.parse() function and the command-line tool).

normand.py has no external dependencies, but if you’re using Python 3.4 or Python 3.5, you’ll need a local copy of the standard typing module.

Design goals

The design goals of Normand are:

Portability

We’re making sure normand.py works with Python ≥ 3.4 and doesn’t have any external dependencies so that you may just copy the module as is to your own project.

Ease of use

The most basic Normand input is a sequence of hexadecimal constants (for example, 4e6f726d616e64) which produce exactly what you’d expect.

Most Normand features map to programming language concepts you already know and understand: constant integers, literal strings, variables, conditionals, repetitions/loops, and the rest.

Concise and readable input

We could have chosen XML or YAML as the input format, but having a DSL here makes a Normand input compact and easy to read, two important traits when using Normand to write tests, for example.

Compare the following Normand input and some hypothetical XML equivalent, for example:

Actual Normand input.

ff dd 01 ab $192 $-128 %1101:0011

[end:8]

{iter = 1}

!if {not something}
  # five times because xyz
  !repeat 5
    "hello world " [iter:8]
    {iter = iter + 1}
  !end
!end

<end>

Hypothetical Normand XML input.

<?xml version="1.0" encoding="utf-8" ?>
<group>
  <byte base="x" val="ff" />
  <byte base="x" val="dd" />
  <byte base="x" val="1" />
  <byte base="x" val="ab" />
  <byte base="d" val="192" />
  <byte base="d" val="-128" />
  <byte base="b" val="11010011" />
  <fixed-len-num expr="end" len="8" />
  <var-assign name="iter" expr="1" />
  <cond expr="not something">
    <!-- five times because xyz -->
    <repeat expr="5">
      <str>hello world </str>
      <fixed-len-num expr="iter" len="8" />
      <var-assign name="iter" expr="iter + 1" />
    </repeat>
  </cond>
  <label name="end" />
</group>

Learn Normand

A Normand text input is a sequence of items which represent a sequence of raw bytes.

During the processing of items to data, Normand relies on a current state:

Current offset

  Description:

    The current offset has an effect on the value of labels and of the special ICITTE name in fixed-length number, LEB128 integer, string, filling, variable assignment, conditional block, repetition block, macro expansion, and post-item repetition expression evaluation.

    Each generated byte increments the current offset.

    A current offset setting may change the current offset without generating data.

    A current offset alignment generates padding bytes to make the current offset satisfy a given alignment.

  Initial value (Python 3 API): init_offset parameter of the parse() function.

  Initial value (CLI): --offset option.

Current byte order

  Description:

    The current byte order can have an effect on the encoding of fixed-length numbers.

    A current byte order setting may change the current byte order.

  Initial value (Python 3 API): init_byte_order parameter of the parse() function.

  Initial value (CLI): --byte-order option.

Labels

  Description: Mapping of label names to integral values.

  Initial value (Python 3 API): init_labels parameter of the parse() function.

  Initial value (CLI): One or more --label options.

Variables

  Description: Mapping of variable names to integral or floating point number values.

  Initial value (Python 3 API): init_variables parameter of the parse() function.

  Initial value (CLI): One or more --var or --var-str options.

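The available items are: byte constant, literal string, current byte order setting, fixed-length number, LEB128 integer, string, current offset setting, current offset alignment, filling, label, variable assignment, group, conditional block, repetition block, transformation block, macro definition block, and macro expansion.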

Moreover, you can repeat many items above a constant or variable number of times with the * operator after the item to repeat. This is called a post-item repetition.

A Normand comment may exist pretty much anywhere between tokens.

A comment is anything between two # characters on the same line, or from # until the end of the line. Whitespaces are also considered comments. The following symbols are also considered comments around and between items, as well as between hexadecimal nibbles and binary bits of byte constants:

& , - . / : ; = ? \ _ |

The latter serve to improve readability so that you may write, for example, a MAC address or a UUID as is.
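
As a rough illustration of why this works, here's a sketch in plain Python (an approximation, not Normand's parser) which drops those symbols and whitespace before decoding the remaining hexadecimal digit pairs. The hex_bytes() name is hypothetical, and the sketch deliberately ignores # comments.

```python
import uuid

# Symbols which Normand treats as comments between hexadecimal
# nibbles (copied from the list above); whitespace is ignored too.
IGNORED = set("&,-./:;=?\\_|")


def hex_bytes(text: str) -> bytes:
    """Drop the ignored symbols and whitespace, then decode hex pairs."""
    digits = "".join(
        ch for ch in text if ch not in IGNORED and not ch.isspace()
    )
    return bytes.fromhex(digits)


mac = hex_bytes("60:57:18:a3:42:29")
ident = hex_bytes("10839636-5d65-4a68-8e6a-21608ddf7258")

# The result is the same as the big-endian bytes of the UUID.
assert ident == uuid.UUID("10839636-5d65-4a68-8e6a-21608ddf7258").bytes

print(mac.hex())
print(ident.hex())
```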

Many items require a constant integer, possibly negative. A negative constant integer starts with -. A positive constant integer is any of:

Decimal

One or more digits (0 to 9).

Hexadecimal

One of:

  • The 0x or 0X prefix followed by one or more hexadecimal digits (0 to 9, a to f, or A to F).

  • One or more hexadecimal digits followed by the h or H suffix.

Octal

One of:

  • The 0o or 0O prefix followed by one or more octal digits (0 to 7).

  • One or more octal digits followed by the o, O, q, or Q suffix.

Binary

One of:

  • The 0b or 0B prefix followed by one or more bits (0 or 1).

  • One or more bits followed by the b or B suffix.
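
The forms above can be sketched with a few regular expressions in plain Python. This is only an illustration: the parse_const_int() name and the order in which the forms are tried are assumptions, and Normand's own parser may resolve ambiguous spellings differently.

```python
import re

# (pattern, base) pairs for the positive constant integer forms.
# Assumption: prefixed forms are tried before suffixed ones.
_FORMS = [
    (re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),
    (re.compile(r"([0-9a-fA-F]+)[hH]"), 16),
    (re.compile(r"0[oO]([0-7]+)"), 8),
    (re.compile(r"([0-7]+)[oOqQ]"), 8),
    (re.compile(r"0[bB]([01]+)"), 2),
    (re.compile(r"([01]+)[bB]"), 2),
    (re.compile(r"([0-9]+)"), 10),
]


def parse_const_int(text: str) -> int:
    """Parse a constant integer, possibly negative."""
    sign = 1

    if text.startswith("-"):
        sign, text = -1, text[1:]

    for pattern, base in _FORMS:
        match = pattern.fullmatch(text)

        if match:
            return sign * int(match.group(1), base)

    raise ValueError("not a constant integer: {!r}".format(text))


for text in ("0x40", "FFh", "0o400", "17q", "0b101", "101b", "200", "-77"):
    print(text, "->", parse_const_int(text))
```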

In general, anything between { and } is a Python 3 expression.

You can test the examples of this section with the normand command-line tool as such:

$ normand file | hexdump -C

where file is the name of a file containing the Normand input.

Byte constant

A byte constant represents one or more constant bytes.

A byte constant is:

Hexadecimal form

Two consecutive hexadecimal digits representing a single byte.

Decimal form

One or more digits after the $ prefix representing a single byte.

Binary form

  1. N % prefixes (at least one).

    The number of % characters is the number of subsequent expected bytes.

  2. N × 8 bits (0 or 1).

Input:

ab cd (3d 8F) CC

Output:

ab cd 3d 8f cc

Input:

$192 %1100/0011 $ -77

Output:

c0 c3 b3

Input:

58f64689-6316-4d55-8a1a-04cada366172
fe80::6257:18ff:fea3:4229

Output:

58 f6 46 89 63 16 4d 55  8a 1a 04 ca da 36 61 72  ┆ X•F•c•MU•••••6ar
fe 80 62 57 18 ff fe a3  42 29                    ┆ ••bW••••B)

Input:

%01110011 %01100001 %01101100 %01110101 %01110100
%%%1101:0010 11111111 #A#11 #B#00 #C#011 #D#1

Output:

73 61 6c 75 74 d2 ff c7  ┆ salut•••
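
The binary form can be sketched as follows in plain Python: count the % prefixes to get N, then require exactly N × 8 bits. The binary_constant() name is hypothetical, and the sketch only skips whitespace and the readability symbols, not # comments.

```python
# Readability symbols which may appear between bits.
IGNORED = set("&,-./:;=?\\_|")


def binary_constant(text: str) -> bytes:
    """Decode a binary form byte constant such as `%1101:0010`."""
    rest = text.lstrip("%")
    count = len(text) - len(rest)  # number of `%` prefixes

    if count == 0:
        raise ValueError("expecting at least one `%` prefix")

    bits = "".join(
        ch for ch in rest if ch not in IGNORED and not ch.isspace()
    )

    if len(bits) != count * 8 or set(bits) - {"0", "1"}:
        raise ValueError("expecting exactly {} bits".format(count * 8))

    return int(bits, 2).to_bytes(count, "big")


print(binary_constant("%1100/0011").hex())
print(binary_constant("%%%1101:0010 11111111 11 00 011 1").hex())
```

The second call mirrors the three-byte constant of the last example above, without its inline comments.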

Literal string

A literal string represents the encoded bytes of a literal string using the UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 encoding.

The string to encode isn’t implicitly null-terminated: use \0 at the end of the string to add a null character.

A literal string is:

  1. Optional: one of the following encodings instead of the default UTF-8:

    s:u8
    u8

    UTF-8.

    s:u16be
    u16be

    UTF-16BE.

    s:u16le
    u16le

    UTF-16LE.

    s:u32be
    u32be

    UTF-32BE.

    s:u32le
    u32le

    UTF-32LE.

    s:latin1

    ISO/IEC 8859-1.

    s:latin2

    ISO/IEC 8859-2.

    s:latin3

    ISO/IEC 8859-3.

    s:latin4

    ISO/IEC 8859-4.

    s:latin5

    ISO/IEC 8859-9.

    s:latin6

    ISO/IEC 8859-10.

    s:latin7

    ISO/IEC 8859-13.

    s:latin8

    ISO/IEC 8859-14.

    s:latin9

    ISO/IEC 8859-15.

    s:latin10

    ISO/IEC 8859-16.

  2. The " prefix.

  3. A sequence of zero or more characters, possibly containing escape sequences.

    An escape sequence is the \ character followed by one of:

    0

    Null (U+0000)

    a

    Alert (U+0007)

    b

    Backspace (U+0008)

    e

    Escape (U+001B)

    f

    Form feed (U+000C)

    n

    End of line (U+000A)

    r

    Carriage return (U+000D)

    t

    Character tabulation (U+0009)

    v

    Line tabulation (U+000B)

    \

    Reverse solidus (U+005C)

    "

    Quotation mark (U+0022)

  4. The " suffix.

Input:

"coucou tout le monde!"

Output:

63 6f 75 63 6f 75 20 74  6f 75 74 20 6c 65 20 6d  ┆ coucou tout le m
6f 6e 64 65 21                                    ┆ onde!

Input:

u16le"I am not young enough to know everything."

Output:

49 00 20 00 61 00 6d 00  20 00 6e 00 6f 00 74 00  ┆ I• •a•m• •n•o•t•
20 00 79 00 6f 00 75 00  6e 00 67 00 20 00 65 00  ┆  •y•o•u•n•g• •e•
6e 00 6f 00 75 00 67 00  68 00 20 00 74 00 6f 00  ┆ n•o•u•g•h• •t•o•
20 00 6b 00 6e 00 6f 00  77 00 20 00 65 00 76 00  ┆  •k•n•o•w• •e•v•
65 00 72 00 79 00 74 00  68 00 69 00 6e 00 67 00  ┆ e•r•y•t•h•i•n•g•
2e 00                                             ┆ .•

Input:

s:u32be "\"illusion is the first\nof all pleasures\" 🦉"

Output:

00 00 00 22 00 00 00 69  00 00 00 6c 00 00 00 6c  ┆ •••"•••i•••l•••l
00 00 00 75 00 00 00 73  00 00 00 69 00 00 00 6f  ┆ •••u•••s•••i•••o
00 00 00 6e 00 00 00 20  00 00 00 69 00 00 00 73  ┆ •••n••• •••i•••s
00 00 00 20 00 00 00 74  00 00 00 68 00 00 00 65  ┆ ••• •••t•••h•••e
00 00 00 20 00 00 00 66  00 00 00 69 00 00 00 72  ┆ ••• •••f•••i•••r
00 00 00 73 00 00 00 74  00 00 00 0a 00 00 00 6f  ┆ •••s•••t•••••••o
00 00 00 66 00 00 00 20  00 00 00 61 00 00 00 6c  ┆ •••f••• •••a•••l
00 00 00 6c 00 00 00 20  00 00 00 70 00 00 00 6c  ┆ •••l••• •••p•••l
00 00 00 65 00 00 00 61  00 00 00 73 00 00 00 75  ┆ •••e•••a•••s•••u
00 00 00 72 00 00 00 65  00 00 00 73 00 00 00 22  ┆ •••r•••e•••s•••"
00 00 00 20 00 01 f9 89                           ┆ ••• ••••

Input:

s:latin1 "Paul Piché"

Output:

50 61 75 6c 20 50 69 63  68 e9  ┆ Paul Pich•
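
If you want to cross-check such outputs, the encoding names above map naturally to standard Python codecs. The following sketch assumes that mapping (the CODECS table and the encode_literal() name are illustrative, not part of the Normand API):

```python
# Assumed mapping from Normand encoding names to Python codec names.
CODECS = {
    "u8": "utf_8",
    "u16be": "utf_16_be",
    "u16le": "utf_16_le",
    "u32be": "utf_32_be",
    "u32le": "utf_32_le",
    "latin1": "iso8859_1",
    "latin2": "iso8859_2",
    "latin3": "iso8859_3",
    "latin4": "iso8859_4",
    "latin5": "iso8859_9",
    "latin6": "iso8859_10",
    "latin7": "iso8859_13",
    "latin8": "iso8859_14",
    "latin9": "iso8859_15",
    "latin10": "iso8859_16",
}


def encode_literal(text: str, encoding: str = "u8") -> bytes:
    """Encode `text` without adding any null terminator."""
    return text.encode(CODECS[encoding])


print(encode_literal("Paul Pich\u00e9", "latin1").hex())
print(encode_literal("\U0001F989", "u32be").hex())  # owl emoji
print(encode_literal("hello world!\0").hex())       # explicit \0
```

Note that Latin-5 to Latin-10 don't map to consecutive ISO/IEC 8859 part numbers, which is why the table spells them out.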

Current byte order setting

This special item sets the current byte order.

The two accepted forms are:

!be

Set the current byte order to big endian.

!le

Set the current byte order to little endian.

Fixed-length number

A fixed-length number represents a fixed number of bytes encoding either:

  • An unsigned or signed integer (two’s complement).

    The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64.

  • A floating point number (IEEE 754-2008).

    The available lengths are 32 (binary32) and 64 (binary64).

The value is the result of evaluating a Python 3 expression.

The byte order to use to encode the value is either directly specified or is the current byte order.

A fixed-length number is:

  1. The [ prefix.

  2. A valid Python 3 expression.

    For a fixed-length number at some source location L, this expression may contain the name of any accessible label (not within a nested group), including the name of a label defined after L (except within a transformation block), as well as the name of any variable known at L.

    The value of the special name ICITTE (int type) in this expression is the current offset (before encoding the number).

  3. The : character.

  4. An encoding length in bits amongst:

    The expression evaluates to an int or bool value

    8, 16, 24, 32, 40, 48, 56, and 64.

    Note
    Normand automatically converts a bool value to int.

    The expression evaluates to a float value

    32 and 64.

  5. Optional: a suffix of the previous encoding length, without any whitespace, amongst:

    be

    Encode in big endian.

    le

    Encode in little endian.

    Without this suffix, the encoding byte order is the current byte order which must be defined if the encoding length is greater than eight.

  6. The ] suffix.

Input:

[345:16le]
[-0xabcd:32be]

Output:

59 01 ff ff 54 33

Input:

!be

# String length in bits
[8 * (str_end - str_beg) : 16]

# String
<str_beg>
  "hello world!"
<str_end>

Output:

00 60 68 65 6c 6c 6f 20  77 6f 72 6c 64 21  ┆ •`hello world!

Input:

[20 - ICITTE : 8] * 10

Output:

14 13 12 11 10 0f 0e 0d  0c 0b

Input:

[2 * 0.0529 : 32le]

Output:

ac ad d8 3d
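
The encodings above can be reproduced with the Python standard library. This is a sketch under the assumption that an integer is accepted when it fits either the signed or the unsigned range of the given length; encode_int() and encode_float() are hypothetical names, not Normand functions.

```python
import struct


def encode_int(value: int, bits: int, byte_order: str) -> bytes:
    """Encode an integer on `bits` bits ("big" or "little" order)."""
    if bits % 8 != 0 or not 8 <= bits <= 64:
        raise ValueError("unsupported length: {}".format(bits))

    # Accept the union of the signed and unsigned ranges.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(
            "value {} is outside the {}-bit range".format(value, bits)
        )

    # Masking yields the two's complement pattern of negative values.
    return (value & ((1 << bits) - 1)).to_bytes(bits // 8, byte_order)


def encode_float(value: float, bits: int, byte_order: str) -> bytes:
    """Encode an IEEE 754 binary32 or binary64 number."""
    code = {32: "f", 64: "d"}[bits]
    prefix = "<" if byte_order == "little" else ">"
    return struct.pack(prefix + code, value)


print(encode_int(345, 16, "little").hex())      # [345:16le]
print(encode_int(-0xABCD, 32, "big").hex())     # [-0xabcd:32be]
print(encode_float(-893.5, 32, "little").hex())
```

int.to_bytes() handles the 24, 40, 48, and 56-bit lengths, which struct has no format code for.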

LEB128 integer

An LEB128 integer represents a variable number of bytes encoding, following the LEB128 format, an unsigned or signed integer which is the result of evaluating a Python 3 expression.

An LEB128 integer is:

  1. The [ prefix.

  2. A valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int).

    For an LEB128 integer at some source location L, this expression may contain:

    • The name of any label defined before L which isn’t within a nested group.

    • The name of any variable known at L.

    The value of the special name ICITTE (int type) in this expression is the current offset (before encoding the integer).

  3. The : character.

  4. One of:

    uleb128

    Use the unsigned LEB128 format.

    sleb128

    Use the signed LEB128 format.

  5. The ] suffix.

Input:

[624485 : uleb128]

Output:

e5 8e 26

Input:

aa bb cc dd
<meow>
ee ff
[-981238311 + (meow * -23) : sleb128]
"hello"

Output:

aa bb cc dd ee ff fd fa  8d ac 7c 68 65 6c 6c 6f  ┆ ••••••••••|hello
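
For reference, here's a compact sketch of the usual LEB128 encoding algorithms in plain Python; it's written from the general format description, not taken from normand.py, and the function names are illustrative.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a value >= 0")

    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7

        if value == 0:
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)  # more groups follow


def sleb128(value: int) -> bytes:
    """Encode an integer as signed LEB128."""
    out = bytearray()

    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: keeps the sign

        # Done when the remaining value is pure sign extension of
        # the sign bit (0x40) of the group just produced.
        sign_bit = bool(byte & 0x40)
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)

        if done:
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)


print(uleb128(624485).hex())
print(sleb128(-981238311 + 4 * -23).hex())  # `meow` is 4 above
```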

String

A string represents a variable number of bytes encoding, with the UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 encoding, a string which is the result of evaluating a Python 3 expression.

A string has two possible forms:

Encoding prefix form

  1. An encoding amongst:

    s:u8
    u8

    UTF-8.

    s:u16be
    u16be

    UTF-16BE.

    s:u16le
    u16le

    UTF-16LE.

    s:u32be
    u32be

    UTF-32BE.

    s:u32le
    u32le

    UTF-32LE.

    s:latin1

    ISO/IEC 8859-1.

    s:latin2

    ISO/IEC 8859-2.

    s:latin3

    ISO/IEC 8859-3.

    s:latin4

    ISO/IEC 8859-4.

    s:latin5

    ISO/IEC 8859-9.

    s:latin6

    ISO/IEC 8859-10.

    s:latin7

    ISO/IEC 8859-13.

    s:latin8

    ISO/IEC 8859-14.

    s:latin9

    ISO/IEC 8859-15.

    s:latin10

    ISO/IEC 8859-16.

  2. The { prefix.

  3. A valid Python 3 expression of which the evaluation result type is bool, int, float, or str (the first three automatically converted to str).

    For a string at some source location L, this expression may contain:

    • The name of any label defined before L which isn’t within a nested group.

    • The name of any variable known at L.

    The value of the special name ICITTE (int type) in this expression is the current offset (before encoding the string).

  4. The } suffix.

Encoding suffix form

  1. The [ prefix.

  2. A valid Python 3 expression of which the evaluation result type is bool, int, float, or str (the first three automatically converted to str).

    For a string at some source location L, this expression may contain:

    • The name of any label defined before L which isn’t within a nested group.

    • The name of any variable known at L.

    The value of the special name ICITTE (int type) in this expression is the current offset (before encoding the string).

  3. The : character.

  4. A string encoding amongst:

    s:u8

    UTF-8.

    s:u16be

    UTF-16BE.

    s:u16le

    UTF-16LE.

    s:u32be

    UTF-32BE.

    s:u32le

    UTF-32LE.

    s:latin1

    ISO/IEC 8859-1.

    s:latin2

    ISO/IEC 8859-2.

    s:latin3

    ISO/IEC 8859-3.

    s:latin4

    ISO/IEC 8859-4.

    s:latin5

    ISO/IEC 8859-9.

    s:latin6

    ISO/IEC 8859-10.

    s:latin7

    ISO/IEC 8859-13.

    s:latin8

    ISO/IEC 8859-14.

    s:latin9

    ISO/IEC 8859-15.

    s:latin10

    ISO/IEC 8859-16.

  5. The ] suffix.

Input:

{iter = 1}

!repeat 10
  u8{iter} " "
  {iter = iter + 1}
!end

Output:

31 20 32 20 33 20 34 20  35 20 36 20 37 20 38 20  ┆ 1 2 3 4 5 6 7 8
39 20 31 30 20                                    ┆ 9 10

Input:

{meow = 'salut jérémie'}
[meow.upper() : s:latin1]

Output:

53 41 4c 55 54 20 4a c9  52 c9 4d 49 45  ┆ SALUT J•R•MIE
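
In plain Python terms, a string item behaves roughly like str() followed by str.encode(). The sketch below assumes that the automatic conversion of bool, int, and float values is equivalent to str(); encode_string() is a hypothetical name and the codec names are Python's, not Normand's.

```python
def encode_string(value, codec: str = "utf_8") -> bytes:
    """Convert `value` to `str` if needed, then encode it."""
    if not isinstance(value, (bool, int, float, str)):
        raise TypeError("unsupported type: {}".format(type(value).__name__))

    return str(value).encode(codec)


# Equivalent of the `!repeat 10` example: u8{iter} " "
numbers = b"".join(encode_string(i) + b" " for i in range(1, 11))
print(numbers)

# Equivalent of [meow.upper() : s:latin1]
meow = "salut j\u00e9r\u00e9mie"
print(encode_string(meow.upper(), "iso8859_1").hex())
```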

Current offset setting

This special item sets the current offset.

A current offset setting is:

  1. The < prefix.

  2. A positive constant integer which is the new current offset.

  3. The > suffix.

Input:

       [ICITTE : 8] * 8
<0x61> [ICITTE : 8] * 8

Output:

00 01 02 03 04 05 06 07  61 62 63 64 65 66 67 68  ┆ ••••••••abcdefgh

Input:

aa bb cc dd <meow> ee ff
<12> 11 22 33 <mix> 44 55
[meow : 8] [mix : 8]

Output:

aa bb cc dd ee ff 11 22  33 44 55 04 0f  ┆ •••••••"3DU••
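
The key point of the last example is that the current offset and the length of the generated data are two separate things. Here's a small model of that in plain Python; the State class is purely illustrative and assumes an initial offset of 0.

```python
class State:
    """Tracks generated data separately from the current offset."""

    def __init__(self, init_offset: int = 0) -> None:
        self.data = bytearray()
        self.offset = init_offset
        self.labels = {}

    def emit(self, chunk: bytes) -> None:
        # Each generated byte increments the current offset.
        self.data += chunk
        self.offset += len(chunk)

    def set_offset(self, offset: int) -> None:
        # Changes the current offset without generating data.
        self.offset = offset

    def label(self, name: str) -> None:
        self.labels[name] = self.offset


state = State()
state.emit(bytes.fromhex("aabbccdd"))
state.label("meow")                  # <meow> is 4
state.emit(bytes.fromhex("eeff"))
state.set_offset(12)                 # <12>
state.emit(bytes.fromhex("112233"))
state.label("mix")                   # <mix> is 15
state.emit(bytes.fromhex("4455"))
state.emit(bytes([state.labels["meow"], state.labels["mix"]]))

print(state.data.hex())
```

After <12>, the label mix is 15 even though only nine bytes exist before it.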

Current offset alignment

A current offset alignment represents zero or more padding bytes to make the current offset meet a given alignment value.

More specifically, for an alignment value of N bits, a current offset alignment represents the required padding bytes until the current offset is a multiple of N / 8.

A current offset alignment is:

  1. The @ prefix.

  2. A positive constant integer which is the alignment value in bits.

    This value must be greater than zero and a multiple of 8.

  3. Optional:

    1. The ~ prefix.

    2. A positive constant integer which is the value of the byte to use as padding to align the current offset.

    Without this section, the padding byte value is zero.

Input:

11 22 (@32 aa bb cc) * 3

Output:

11 22 00 00 aa bb cc 00  aa bb cc 00 aa bb cc

Input:

!le
77 88
@32~0xcc [-893.5:32]
@128~0x55 "meow"

Output:

77 88 cc cc 00 60 5f c4  55 55 55 55 55 55 55 55  ┆ w••••`_•UUUUUUUU
6d 65 6f 77                                       ┆ meow

Input:

aa bb cc <29> @64~255 "zoom"

Output:

aa bb cc ff ff ff 7a 6f  6f 6d  ┆ ••••••zoom
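
The number of padding bytes follows directly from the rule above: it's the distance from the current offset to the next multiple of N / 8. A minimal sketch in plain Python (alignment_padding() is a hypothetical helper name):

```python
def alignment_padding(offset: int, align_bits: int, pad_byte: int = 0) -> bytes:
    """Return the padding needed to align `offset` on `align_bits` bits."""
    if align_bits <= 0 or align_bits % 8 != 0:
        raise ValueError("alignment must be a positive multiple of 8")

    align_bytes = align_bits // 8

    # -offset % align_bytes is 0 when `offset` is already aligned.
    return bytes([pad_byte]) * (-offset % align_bytes)


print(alignment_padding(2, 32).hex())        # after `11 22`
print(alignment_padding(29, 64, 255).hex())  # after `<29>`
print(alignment_padding(16, 16).hex())       # already aligned
```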

Filling

A filling represents zero or more padding bytes to make the current offset reach a given value.

A filling is:

  1. The + prefix.

  2. One of:

    • A positive constant integer which is the current offset target.

    • The { prefix, a valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int), and the } suffix.

      For a filling at some source location L, this expression may contain:

      • The name of any label defined before L which isn’t within a nested group.

      • The name of any variable known at L.

      The value of the special name ICITTE (int type) in this expression is the current offset (before generating the padding bytes).

    • A valid Python 3 name.

      For the name NAME, this is equivalent to the {NAME} form above.

    This value must be greater than or equal to the current offset where it’s used.

  3. Optional:

    1. The ~ prefix.

    2. A positive constant integer which is the value of the byte to use as padding to reach the current offset target.

    Without this section, the padding byte value is zero.

Input:

aa bb cc dd
+0x40
"hello world"

Output:

aa bb cc dd 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ┆ ••••••••••••••••
68 65 6c 6c 6f 20 77 6f  72 6c 64                 ┆ hello world

Input:

!macro part(iter, fill)
  <0> "particular security " [ord('0') + iter : 8] +fill~0x80
!end

{iter = 1}

!repeat 5
  m:part(iter, {32 + 4 * iter})
  {iter = iter + 1}
!end

Output:

70 61 72 74 69 63 75 6c  61 72 20 73 65 63 75 72  ┆ particular secur
69 74 79 20 31 80 80 80  80 80 80 80 80 80 80 80  ┆ ity 1•••••••••••
80 80 80 80 70 61 72 74  69 63 75 6c 61 72 20 73  ┆ ••••particular s
65 63 75 72 69 74 79 20  32 80 80 80 80 80 80 80  ┆ ecurity 2•••••••
80 80 80 80 80 80 80 80  80 80 80 80 70 61 72 74  ┆ ••••••••••••part
69 63 75 6c 61 72 20 73  65 63 75 72 69 74 79 20  ┆ icular security
33 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  ┆ 3•••••••••••••••
80 80 80 80 80 80 80 80  70 61 72 74 69 63 75 6c  ┆ ••••••••particul
61 72 20 73 65 63 75 72  69 74 79 20 34 80 80 80  ┆ ar security 4•••
80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  ┆ ••••••••••••••••
80 80 80 80 80 80 80 80  70 61 72 74 69 63 75 6c  ┆ ••••••••particul
61 72 20 73 65 63 75 72  69 74 79 20 35 80 80 80  ┆ ar security 5•••
80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  ┆ ••••••••••••••••
80 80 80 80 80 80 80 80  80 80 80 80              ┆ ••••••••••••
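
Unlike an alignment, a filling targets an absolute offset. The following sketch in plain Python shows the arithmetic; filling() is a hypothetical helper, and it assumes the current offset starts at 0 so that it equals the length of the data.

```python
def filling(offset: int, target: int, pad_byte: int = 0) -> bytes:
    """Return the padding needed for `offset` to reach `target`."""
    if target < offset:
        raise ValueError(
            "target {} is before the current offset {}".format(target, offset)
        )

    return bytes([pad_byte]) * (target - offset)


# Equivalent of: aa bb cc dd +0x40 "hello world"
data = bytearray.fromhex("aabbccdd")
data += filling(len(data), 0x40)
data += b"hello world"

print(len(data))
print(bytes(data[0x40:]))
```

In the macro example above, each expansion starts with <0>, so the string and digit always end at offset 21 and +fill~0x80 adds fill - 21 bytes of value 0x80.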

Label

A label associates a name to the current offset.

All the labels of a whole Normand input must have unique names.

A label must not share its name with a variable.

A label is:

  1. The < prefix.

  2. A valid Python 3 name which is not ICITTE.

  3. The > suffix.

Variable assignment

A variable assignment associates a name to the result of an evaluated Python 3 expression.

A variable assignment is:

  1. The { prefix.

  2. A valid Python 3 name which is not ICITTE.

  3. The = character.

  4. A valid Python 3 expression of which the evaluation result type is int, float, bool (automatically converted to int), or str.

    For a variable assignment at some source location L, this expression may contain:

    • The name of any label defined before L which isn’t within a nested group.

    • The name of any variable known at L.

    The value of the special name ICITTE (int type) in this expression is the current offset.

  5. The } suffix.

Input:

{mix = 101} !le
{meow = 42} 11 22 [meow:8] 33 {meow = ICITTE + 17}
"yooo" [meow + mix : 16]

Output:

11 22 2a 33 79 6f 6f 6f  7a 00  ┆ •"*3yoooz•

Group

A group is a scoped sequence of items.

The labels within a group aren’t visible outside of it.

The main purposes of a group are to repeat more than a single item and to isolate labels.

A group is:

  1. The (, !group, or !g opening.

  2. Zero or more items except, recursively, a macro definition block.

  3. Depending on the group opening:

    (

    The ) closing.

    !group
    !g

    The !end closing.

Input:

((aa bb cc) dd () ee) "leclerc"

Output:

aa bb cc dd ee 6c 65 63  6c 65 72 63  ┆ •••••leclerc

Input:

!group
  (aa bb cc) * 3 dd ee
!end * 5

Output:

aa bb cc aa bb cc aa bb  cc dd ee aa bb cc aa bb
cc aa bb cc dd ee aa bb  cc aa bb cc aa bb cc dd
ee aa bb cc aa bb cc aa  bb cc dd ee aa bb cc aa
bb cc aa bb cc dd ee

Input:

!be
(
  <str_beg> u16le"sébastien diaz" <str_end>
  [ICITTE - str_beg : 8]
  [(end - str_beg) * 5 : 24]
) * 3
<end>

Output:

73 00 e9 00 62 00 61 00  73 00 74 00 69 00 65 00  ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00  61 00 7a 00 1c 00 01 e0  ┆ n• •d•i•a•z•••••
73 00 e9 00 62 00 61 00  73 00 74 00 69 00 65 00  ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00  61 00 7a 00 1c 00 01 40  ┆ n• •d•i•a•z••••@
73 00 e9 00 62 00 61 00  73 00 74 00 69 00 65 00  ┆ s•••b•a•s•t•i•e•
6e 00 20 00 64 00 69 00  61 00 7a 00 1c 00 00 a0  ┆ n• •d•i•a•z•••••
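
To see where the values 1c, 01 e0, 01 40, and 00 a0 of the last example come from, here's the same computation in plain Python. It's a hand-written model of this one input, not a general implementation: the value of end is computed up front because the size of each group repetition is known.

```python
GROUP_COUNT = 3

text = "s\u00e9bastien diaz".encode("utf_16_le")  # 28 bytes
group_size = len(text) + 1 + 3                    # string + 8 bits + 24 bits
end = GROUP_COUNT * group_size                    # <end> is 96

out = bytearray()

for _ in range(GROUP_COUNT):
    # <str_beg> gets a new value for each repetition of the group.
    str_beg = len(out)
    out += text

    # [ICITTE - str_beg : 8]
    out += (len(out) - str_beg).to_bytes(1, "big")

    # [(end - str_beg) * 5 : 24] with the big-endian byte order
    out += ((end - str_beg) * 5).to_bytes(3, "big")

print(len(out))
print(out[28:32].hex(), out[60:64].hex(), out[92:96].hex())
```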

Conditional block

A conditional block represents either the bytes of zero or more items if some expression is true, or the bytes of zero or more other items if it’s false.

A conditional block is:

  1. The !if opening.

  2. One of:

    • The { prefix, a valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int), and the } suffix.

      For a conditional block at some source location L, this expression may contain:

      • The name of any label defined before L which isn’t within a nested group.

      • The name of any variable known at L.

      The value of the special name ICITTE (int type) in this expression is the current offset (before handling the contained items).

    • A valid Python 3 name.

      For the name NAME, this is equivalent to the {NAME} form above.

  3. Zero or more items to be handled when the condition is true except, recursively, a macro definition block.

  4. Optional:

    1. The !else opening.

    2. Zero or more items to be handled when the condition is false except, recursively, a macro definition block.

  5. The !end closing.

Input:

{at = 1}
{rep_count = 9}

!repeat rep_count
  "meow "

  !if {ICITTE > 25}
    "mix"
  !else
    "zoom"
  !end

  !if {at < rep_count} 20 !end

  {at = at + 1}
!end

Output:

6d 65 6f 77 20 7a 6f 6f  6d 20 6d 65 6f 77 20 7a  ┆ meow zoom meow z
6f 6f 6d 20 6d 65 6f 77  20 7a 6f 6f 6d 20 6d 65  ┆ oom meow zoom me
6f 77 20 6d 69 78 20 6d  65 6f 77 20 6d 69 78 20  ┆ ow mix meow mix
6d 65 6f 77 20 6d 69 78  20 6d 65 6f 77 20 6d 69  ┆ meow mix meow mi
78 20 6d 65 6f 77 20 6d  69 78 20 6d 65 6f 77 20  ┆ x meow mix meow
6d 69 78                                          ┆ mix

Input:

<str_beg>
u16le"meow mix!"
<str_end>

!if {str_end - str_beg > 10}
  " BIG"
!end

Output:

6d 00 65 00 6f 00 77 00  20 00 6d 00 69 00 78 00  ┆ m•e•o•w• •m•i•x•
21 00 20 42 49 47                                 ┆ !• BIG
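
The first example above is a good way to check your understanding of ICITTE: its value is the current offset at the point where the !if expression is evaluated, that is, after "meow " of the current iteration. Here's an equivalent loop in plain Python, assuming an initial offset of 0 (build() is only an illustrative name):

```python
def build(rep_count: int = 9) -> bytes:
    out = bytearray()
    at = 1

    for _ in range(rep_count):
        out += b"meow "

        # ICITTE: current offset before handling the contained items.
        icitte = len(out)
        out += b"mix" if icitte > 25 else b"zoom"

        # !if {at < rep_count} 20 !end
        if at < rep_count:
            out.append(0x20)

        at += 1

    return bytes(out)


print(build().decode())
```

The third iteration evaluates the condition at offset 25, which isn't greater than 25, hence three "zoom" before the first "mix".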

Repetition block

A repetition block represents the bytes of zero or more items repeated a given number of times.

A repetition block is:

  1. The !repeat or !r opening.

  2. One of:

    • A positive constant integer which is the number of times to repeat the contained items.

    • The { prefix, a valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int), and the } suffix.

      For a repetition block at some source location L, this expression may contain:

      • The name of any label defined before L which isn’t within a nested group.

      • The name of any variable known at L.

      The value of the special name ICITTE (int type) in this expression is the current offset (before handling the items to repeat).

    • A valid Python 3 name.

      For the name NAME, this is equivalent to the {NAME} form above.

  3. Zero or more items except, recursively, a macro definition block.

  4. The !end closing.

You may also use a post-item repetition after some items. The form !repeat X ITEMS !end is equivalent to (ITEMS) * X.

Input:

!repeat 0o400
  [end - ICITTE - 1 : 8]
!end

<end>

Output:

ff fe fd fc fb fa f9 f8  f7 f6 f5 f4 f3 f2 f1 f0  ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8  e7 e6 e5 e4 e3 e2 e1 e0  ┆ ••••••••••••••••
df de dd dc db da d9 d8  d7 d6 d5 d4 d3 d2 d1 d0  ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8  c7 c6 c5 c4 c3 c2 c1 c0  ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8  b7 b6 b5 b4 b3 b2 b1 b0  ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8  a7 a6 a5 a4 a3 a2 a1 a0  ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98  97 96 95 94 93 92 91 90  ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88  87 86 85 84 83 82 81 80  ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78  77 76 75 74 73 72 71 70  ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68  67 66 65 64 63 62 61 60  ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58  57 56 55 54 53 52 51 50  ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48  47 46 45 44 43 42 41 40  ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38  37 36 35 34 33 32 31 30  ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28  27 26 25 24 23 22 21 20  ┆ /.-,+*)('&%$#"!
1f 1e 1d 1c 1b 1a 19 18  17 16 15 14 13 12 11 10  ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08  07 06 05 04 03 02 01 00  ┆ ••••••••••••••••

Input:

{times = 1}

aa bb cc dd

!repeat 3
  <here>

  !repeat {here + 1}
    ee ff
  !end

  11 22 !repeat times 33 !end

  {times = times + 1}
!end

"coucou!"

Output:

aa bb cc dd ee ff ee ff  ee ff ee ff ee ff 11 22  ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33  33 ee ff ee ff ee ff ee  ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff 11 22 33  ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75  21                       ┆ 33coucou!
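
As a rough illustration of how ICITTE evolves within a repetition block, the first example above can be emulated in Python 3 as follows. This is a sketch under assumptions, not Normand's implementation: the initial offset is taken to be 0, and the value of the `end` label is computed up front, which only works here because each iteration produces exactly one byte.

```python
# Emulates `!repeat 0o400 [end - ICITTE - 1 : 8] !end <end>`.
count = 0o400  # 256, written in octal as in the example
end = count    # value of the <end> label, assuming an initial offset of 0

data = bytearray()

for _ in range(count):
    icitte = len(data)             # current offset before handling the item
    data.append(end - icitte - 1)  # [end - ICITTE - 1 : 8]

print(data[:4].hex(), data[-4:].hex())
```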

Transformation block

A transformation block represents the bytes of one or more items transformed into other bytes by a function.

As of this version, Normand only offers a predetermined set of transformation functions.

A transformation block is:

  1. The !transform or !t opening.

  2. A transformation function name amongst:

    base64
    b64

    Standard Base64.

    base64u
    b64u

    URL-safe Base64, using - instead of + and _ instead of /.

    base32
    b32

    Standard Base32.

    base16
    b16

    Standard Base16.

    ascii85
    a85

    Ascii85 without padding.

    ascii85p
    a85p

    Ascii85 with padding.

    base85
    b85

    Base85 (like Git-style binary diffs) without padding.

    base85p
    b85p

    Base85 with padding.

    quopri
    qp

    MIME quoted-printable without quoted whitespaces.

    quoprit
    qpt

    MIME quoted-printable with quoted whitespaces.

    gzip
    gz

    gzip.

    bzip2
    bz2

    bzip2.

  3. Zero or more items except, recursively, a macro definition block.

    Any Python 3 expression within any of those items may not refer to a future label.

    The value of the special name ICITTE in any Python 3 expression within any of those items is the current offset before Normand applies the transformation function. Therefore, labels defined within those items also have the current offset value before Normand applies the transformation function.

  4. The !end closing.

The current offset after having handled the last item of a transformation block is the value of the current offset before handling the first item plus the size of the generated (transformed) bytes. In other words, current offset settings within the items of the block have no impact outside said block.

Input:

aa bb cc dd

"size of compressed section: " [end - start : 8]

<start>

!transform bzip2
  "this will be compressed!"
  89*100 00*5000
!end

<end>

"yes!"

Output:

aa bb cc dd 73 69 7a 65  20 6f 66 20 63 6f 6d 70  ┆ ••••size of comp
72 65 73 73 65 64 20 73  65 63 74 69 6f 6e 3a 20  ┆ ressed section:
52 42 5a 68 39 31 41 59  26 53 59 68 e1 8c fc 00  ┆ RBZh91AY&SYh••••
00 33 d1 e0 c0 00 60 00  5e 66 dc 80 00 20 00 80  ┆ •3••••`•^f••• ••
00 08 20 00 31 40 d3 43  23 26 20 ca 87 a9 a1 e8  ┆ •• •1@•C#& •••••
18 29 44 80 9c 80 49 bf  cc b3 e8 45 ed e2 76 ad  ┆ •)D•••I••••E••v•
0f 12 8b 8a d6 cd 40 04  7e 2e e4 8a 70 a1 20 d1  ┆ ••••••@•~.••p• •
c3 19 f8 79 65 73 21                              ┆ •••yes!

Input:

88*16

!t a85
  "I am determined to be cheerful and happy in whatever situation "
  "I may find myself. For I have learned that the greater part of "
  "our misery or unhappiness is determined not by our circumstance "
  "but by our disposition."
!end

@128~99h

!t qp <beg> [ICITTE - beg : 8] * 50 !end

Output:

88 88 88 88 88 88 88 88  88 88 88 88 88 88 88 88  ┆ ••••••••••••••••
38 4b 5f 47 59 2b 43 6f  26 2a 41 54 44 58 25 44  ┆ 8K_GY+Co&*ATDX%D
49 6d 3f 24 46 44 69 3a  32 41 4b 59 4a 72 41 53  ┆ Im?$FDi:2AKYJrAS
23 6d 6f 46 5f 69 31 2f  44 49 61 6c 27 40 3b 70  ┆ #moF_i1/DIal'@;p
31 32 2b 44 47 5e 39 47  41 28 45 2c 41 54 68 58  ┆ 12+DG^9GA(E,AThX
2a 2b 45 4d 37 3d 46 5e  5d 42 2b 44 66 2d 5b 68  ┆ *+EM7=F^]B+Df-[h
2b 44 6b 50 34 2b 44 2c  3e 2a 41 30 3e 60 37 46  ┆ +DkP4+D,>*A0>`7F
28 4b 30 22 2f 67 2a 57  25 45 5a 64 70 72 42 4f  ┆ (K0"/g*W%EZdprBO
51 27 71 2b 44 62 55 74  45 63 2c 48 21 2b 45 56  ┆ Q'q+DbUtEc,H!+EV
3a 2a 46 3c 47 5b 3d 41  4b 59 57 2b 41 52 54 5b  ┆ :*F<G[=AKYW+ART[
6c 45 5a 66 3d 30 45 63  60 46 42 41 66 75 23 37  ┆ lEZf=0Ec`FBAfu#7
45 5a 66 34 35 46 28 4b  42 3b 2b 45 29 39 43 46  ┆ EZf45F(KB;+E)9CF
60 28 6c 24 45 2c 5d 4e  2f 41 54 4d 6f 38 42 6c  ┆ `(l$E,]N/ATMo8Bl
62 44 2d 41 54 56 4c 28  44 2f 21 6d 21 41 30 3e  ┆ bD-ATVL(D/!m!A0>
63 2e 46 3c 47 25 3c 2b  45 29 43 43 2b 43 66 2c  ┆ c.F<G%<+E)CC+Cf,
2b 40 73 29 58 30 46 43  42 26 73 41 4b 59 48 29  ┆ +@s)X0FCB&sAKYH)
46 3c 47 25 3c 2b 45 29  43 43 2b 43 6f 32 2d 45  ┆ F<G%<+E)CC+Co2-E
2c 54 66 33 46 44 35 5a  32 2f 63 99 99 99 99 99  ┆ ,Tf3FD5Z2/c•••••
3d 30 30 3d 30 31 3d 30  32 3d 30 33 3d 30 34 3d  ┆ =00=01=02=03=04=
30 35 3d 30 36 3d 30 37  3d 30 38 3d 30 39 0a 3d  ┆ 05=06=07=08=09•=
30 42 3d 30 43 0d 3d 30  45 3d 30 46 3d 31 30 3d  ┆ 0B=0C•=0E=0F=10=
31 31 3d 31 32 3d 31 33  3d 31 34 3d 31 35 3d 31  ┆ 11=12=13=14=15=1
36 3d 31 37 3d 31 38 3d  31 39 3d 31 41 3d 31 42  ┆ 6=17=18=19=1A=1B
3d 31 43 3d 31 44 3d 31  45 3d 31 46 20 21 22 23  ┆ =1C=1D=1E=1F !"#
24 25 26 27 28 29 2a 2b  2c 2d 3d 0a 2e 2f 30 31  ┆ $%&'()*+,-=•./01
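
The transformation function names above have close counterparts in the Python 3 standard library. The table below is a hypothetical mapping for illustration only: this document doesn't say which functions Normand calls, so the exact output (padding, line wrapping, compression level) may differ. The `transform_block()` helper, also hypothetical, shows the offset rule: the offset after the block is the offset before it plus the size of the transformed bytes.

```python
import base64
import bz2
import gzip
import quopri

# Hypothetical name-to-function table (an assumption, not Normand's code).
TRANSFORMS = {
    "base64": base64.b64encode,
    "base64u": base64.urlsafe_b64encode,
    "base32": base64.b32encode,
    "base16": base64.b16encode,
    "ascii85": base64.a85encode,
    "ascii85p": lambda d: base64.a85encode(d, pad=True),
    "base85": base64.b85encode,
    "base85p": lambda d: base64.b85encode(d, pad=True),
    "quopri": quopri.encodestring,
    "quoprit": lambda d: quopri.encodestring(d, quotetabs=True),
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
}


def transform_block(name, raw, offset_before):
    # Returns the transformed bytes and the current offset after the block.
    out = TRANSFORMS[name](raw)
    return out, offset_before + len(out)


# Same content as the first example: the block starts at offset 33
# (4 bytes, a 28-byte string, and one size byte come before it).
raw = b"this will be compressed!" + b"\x89" * 100 + b"\x00" * 5000
out, offset_after = transform_block("bzip2", raw, 33)
print(len(raw), len(out), offset_after)
```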

Macro definition block

A macro definition block associates a name and parameter names with a group of items.

A macro definition block doesn’t lead to generated bytes itself: a macro expansion does so.

A macro definition may only exist at the root level, that is, not within a group, a repetition block, a conditional block, or another macro definition block.

All macro definitions must have unique names.

A macro definition is:

  1. The !macro or !m opening.

  2. A valid Python 3 name (the macro name).

  3. The ( parameter name list prefix.

  4. A comma-separated list of zero or more unique parameter names, each one being a valid Python 3 name.

  5. The ) parameter name list suffix.

  6. Zero or more items except, recursively, a macro definition block.

  7. The !end closing.

!macro bake()
  !le [ICITTE * 8 : 16]
  u16le"predict explode"
!end

!macro nail(rep, with_extra, val)
  {iter = 1}

  !repeat rep
    [val + iter : uleb128]
    [0xdeadbeef : 32]
    {iter = iter + 1}
  !end

  !if with_extra
    "meow mix\0"
  !end
!end

Macro expansion

A macro expansion expands the items of a defined macro.

The macro to expand must be defined before the expansion.

The state before handling the first item of the chosen macro is:

Current offset

Unchanged.

Current byte order

Unchanged.

Variables

The only available variables initially are the macro parameters.

Labels

None.

The state after having handled the last item of the chosen macro is:

Current offset

The one before handling the first item of the macro plus the size of the generated data of the macro expansion.

Important
This means current offset setting items within the expanded macro don’t impact the final current offset.

Current byte order

The one before handling the first item of the macro.

Variables

The ones before handling the first item of the macro.

Labels

The ones before handling the first item of the macro.

A macro expansion is:

  1. The m: prefix.

  2. A valid Python 3 name (the name of the macro to expand).

  3. The ( parameter value list prefix.

  4. A comma-separated list of zero or more parameter values.

    The number of parameter values must match the number of parameter names of the definition of the chosen macro.

    A parameter value is one of:

    • A constant integer, possibly negative.

    • A constant floating point number.

    • The { prefix, a valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int), and the } suffix.

      For a macro expansion at some source location L, this expression may contain:

      • The name of any label defined before L which isn’t within a nested group.

      • The name of any variable known at L.

      The value of the special name ICITTE (int type) in this expression is the current offset (before handling the items of the chosen macro).

    • A valid Python 3 name.

      For the name NAME, this is equivalent to the {NAME} form above.

  5. The ) parameter value list suffix.

Input:

!macro bake()
  !le [ICITTE * 8 : 16]
  u16le"predict explode"
!end

"hello [" m:bake() "] world"

m:bake() * 5

Output:

68 65 6c 6c 6f 20 5b 38  00 70 00 72 00 65 00 64  ┆ hello [8•p•r•e•d
00 69 00 63 00 74 00 20  00 65 00 78 00 70 00 6c  ┆ •i•c•t• •e•x•p•l
00 6f 00 64 00 65 00 5d  20 77 6f 72 6c 64 70 01  ┆ •o•d•e•] worldp•
70 00 72 00 65 00 64 00  69 00 63 00 74 00 20 00  ┆ p•r•e•d•i•c•t• •
65 00 78 00 70 00 6c 00  6f 00 64 00 65 00 70 02  ┆ e•x•p•l•o•d•e•p•
70 00 72 00 65 00 64 00  69 00 63 00 74 00 20 00  ┆ p•r•e•d•i•c•t• •
65 00 78 00 70 00 6c 00  6f 00 64 00 65 00 70 03  ┆ e•x•p•l•o•d•e•p•
70 00 72 00 65 00 64 00  69 00 63 00 74 00 20 00  ┆ p•r•e•d•i•c•t• •
65 00 78 00 70 00 6c 00  6f 00 64 00 65 00 70 04  ┆ e•x•p•l•o•d•e•p•
70 00 72 00 65 00 64 00  69 00 63 00 74 00 20 00  ┆ p•r•e•d•i•c•t• •
65 00 78 00 70 00 6c 00  6f 00 64 00 65 00 70 05  ┆ e•x•p•l•o•d•e•p•
70 00 72 00 65 00 64 00  69 00 63 00 74 00 20 00  ┆ p•r•e•d•i•c•t• •
65 00 78 00 70 00 6c 00  6f 00 64 00 65 00        ┆ e•x•p•l•o•d•e•
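
One way to picture the expansion above is as a Python 3 function which receives the current offset and returns bytes. The `bake()` function below is a hypothetical stand-in for the macro, not Normand's implementation; it assumes an initial offset of 0 so that `len(data)` can play the role of ICITTE.

```python
import struct


def bake(offset):
    # !le [ICITTE * 8 : 16]
    head = struct.pack("<H", offset * 8)

    # u16le"predict explode"
    return head + "predict explode".encode("utf-16-le")


data = bytearray(b"hello [")
data += bake(len(data))  # m:bake()
data += b"] world"

for _ in range(5):       # m:bake() * 5
    data += bake(len(data))

print(len(data), data[7:9].hex(), data[46:48].hex())
```

Each expansion sees the offset at which it starts, which is why the 16-bit value at the beginning of each expansion differs.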

Input:

!macro A(val, is_be)
  !le

  !if is_be
    !be
  !end

  [val : 16]
!end

!macro B(rep, is_be)
  {iter = 1}

  !repeat rep
  m:A({iter * 3}, is_be)
  {iter = iter + 1}
  !end
!end

m:B(5, 1)
m:B(3, 0)

Output:

00 03 00 06 00 09 00 0c  00 0f 03 00 06 00 09 00

Input:

!macro flt32be(val) !be [val : 32] !end

"CHEETOS"
m:flt32be(-42.17)
m:flt32be(56.23e-4)

Output:

43 48 45 45 54 4f 53 c2  28 ae 14 3b b8 41 25     ┆ CHEETOS•(••;•A%
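
The bytes of the last example can be cross-checked with the Python 3 `struct` module, assuming that a 32-bit fixed-length number with a floating point value is encoded as an IEEE 754 binary32 number. The `flt32be()` function below is a hypothetical Python counterpart of the macro.

```python
import struct


def flt32be(val):
    # !be [val : 32] with a floating point value
    return struct.pack(">f", val)


data = b"CHEETOS" + flt32be(-42.17) + flt32be(56.23e-4)
print(data.hex())
```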

Post-item repetition

A post-item repetition represents the bytes of an item repeated a given number of times.

A post-item repetition is:

  1. One of those items:

  2. The * character.

  3. One of:

    • A positive integer (hexadecimal starting with 0x or 0X accepted) which is the number of times to repeat the previous item.

    • The { prefix, a valid Python 3 expression of which the evaluation result type is int or bool (automatically converted to int), and the } suffix.

      For a post-item repetition at some source location L, this expression may contain:

      • The name of any label defined before L which isn’t within a nested group and which isn’t part of the repeated item.

      • The name of any variable known at L which isn’t part of the repeated item.

      The value of the special name ICITTE (int type) in this expression is the current offset (before handling the items to repeat).

    • A valid Python 3 name.

      For the name NAME, this is equivalent to the {NAME} form above.

You may also use a repetition block. The form ITEM * X is equivalent to !repeat X ITEM !end.

Input:

[end - ICITTE - 1 : 8] * 0x100 <end>

Output:

ff fe fd fc fb fa f9 f8  f7 f6 f5 f4 f3 f2 f1 f0  ┆ ••••••••••••••••
ef ee ed ec eb ea e9 e8  e7 e6 e5 e4 e3 e2 e1 e0  ┆ ••••••••••••••••
df de dd dc db da d9 d8  d7 d6 d5 d4 d3 d2 d1 d0  ┆ ••••••••••••••••
cf ce cd cc cb ca c9 c8  c7 c6 c5 c4 c3 c2 c1 c0  ┆ ••••••••••••••••
bf be bd bc bb ba b9 b8  b7 b6 b5 b4 b3 b2 b1 b0  ┆ ••••••••••••••••
af ae ad ac ab aa a9 a8  a7 a6 a5 a4 a3 a2 a1 a0  ┆ ••••••••••••••••
9f 9e 9d 9c 9b 9a 99 98  97 96 95 94 93 92 91 90  ┆ ••••••••••••••••
8f 8e 8d 8c 8b 8a 89 88  87 86 85 84 83 82 81 80  ┆ ••••••••••••••••
7f 7e 7d 7c 7b 7a 79 78  77 76 75 74 73 72 71 70  ┆ •~}|{zyxwvutsrqp
6f 6e 6d 6c 6b 6a 69 68  67 66 65 64 63 62 61 60  ┆ onmlkjihgfedcba`
5f 5e 5d 5c 5b 5a 59 58  57 56 55 54 53 52 51 50  ┆ _^]\[ZYXWVUTSRQP
4f 4e 4d 4c 4b 4a 49 48  47 46 45 44 43 42 41 40  ┆ ONMLKJIHGFEDCBA@
3f 3e 3d 3c 3b 3a 39 38  37 36 35 34 33 32 31 30  ┆ ?>=<;:9876543210
2f 2e 2d 2c 2b 2a 29 28  27 26 25 24 23 22 21 20  ┆ /.-,+*)('&%$#"!
1f 1e 1d 1c 1b 1a 19 18  17 16 15 14 13 12 11 10  ┆ ••••••••••••••••
0f 0e 0d 0c 0b 0a 09 08  07 06 05 04 03 02 01 00  ┆ ••••••••••••••••

Input:

{times = 1}
aa bb cc dd
(
  <here>
  (ee ff) * {here + 1}
  11 22 33 * {times}
  {times = times + 1}
) * 3
"coucou!"

Output:

aa bb cc dd ee ff ee ff  ee ff ee ff ee ff 11 22  ┆ •••••••••••••••"
33 ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ 3•••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff 11 22 33  33 ee ff ee ff ee ff ee  ┆ ••••••"33•••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff ee ff ee  ┆ ••••••••••••••••
ff ee ff ee ff ee ff ee  ff ee ff ee ff 11 22 33  ┆ ••••••••••••••"3
33 33 63 6f 75 63 6f 75  21                       ┆ 33coucou!
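
The last example maps fairly directly to Python 3, where multiplying a bytes object also repeats it. The sketch below is an emulation for illustration only, assuming an initial offset of 0 so that `len(data)` gives the value of the `here` label.

```python
data = bytearray.fromhex("aabbccdd")
times = 1

for _ in range(3):                    # ( ... ) * 3
    here = len(data)                  # <here>
    data += b"\xee\xff" * (here + 1)  # (ee ff) * {here + 1}
    data += b"\x11\x22"               # 11 22
    data += b"\x33" * times           # 33 * {times}
    times += 1                        # {times = times + 1}

data += b"coucou!"
print(len(data))
```

Plain multiplication is enough here because the repeated items are constant. An item which depends on ICITTE, like the one of the first example, needs a loop instead because it's evaluated again for each repetition.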

Command-line tool

If you installed the normand package, then you can use the normand command-line tool:

$ normand <<< '"ma gang de malades"' | hexdump -C
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|

If you copy the normand.py module to your own project, then you can run the module itself:

$ python3 -m normand <<< '"ma gang de malades"' | hexdump -C
00000000  6d 61 20 67 61 6e 67 20  64 65 20 6d 61 6c 61 64  |ma gang de malad|
00000010  65 73                                             |es|

Without a path argument, the normand tool reads from the standard input.

The normand tool prints the generated binary data to the standard output.

Various options control the initial state of the processor: use the --help option to learn more.

Python 3 API

The whole normand package/module public API is:

# Byte order.
class ByteOrder(enum.Enum):
    # Big endian.
    BE = ...

    # Little endian.
    LE = ...


# Text location.
class TextLocation:
    # Line number.
    @property
    def line_no(self) -> int:
        ...

    # Column number.
    @property
    def col_no(self) -> int:
        ...


# Parsing error message.
class ParseErrorMessage:
    # Message text.
    @property
    def text(self):
        ...

    # Source text location.
    @property
    def text_location(self):
        ...


# Parsing error.
class ParseError(RuntimeError):
    # Parsing error messages.
    #
    # The first message is the most _specific_ one.
    @property
    def messages(self):
        ...


# Variables dictionary type (for type hints).
VariablesT = typing.Dict[str, typing.Union[int, float]]


# Labels dictionary type (for type hints).
LabelsT = typing.Dict[str, int]


# Parsing result.
class ParseResult:
    # Generated data.
    @property
    def data(self) -> bytearray:
        ...

    # Updated variable values.
    @property
    def variables(self) -> VariablesT:
        ...

    # Updated main group label values.
    @property
    def labels(self) -> LabelsT:
        ...

    # Final offset.
    @property
    def offset(self) -> int:
        ...

    # Final byte order.
    @property
    def byte_order(self) -> typing.Optional[ByteOrder]:
        ...


# Parses the `normand` input using the initial state defined by
# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`,
# and returns the corresponding parsing result.
def parse(normand: str,
          init_variables: typing.Optional[VariablesT] = None,
          init_labels: typing.Optional[LabelsT] = None,
          init_offset: int = 0,
          init_byte_order: typing.Optional[ByteOrder] = None) -> ParseResult:
    ...

The normand parameter is the actual Normand input while the other parameters control the initial state.

The parse() function raises a ParseError instance should it fail to parse the normand string for any reason.
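
Here's a minimal usage sketch of the API above. The `format_parse_error()` helper is hypothetical (it's not part of the module): it only relies on the `messages`, `text`, `text_location`, `line_no`, and `col_no` properties listed above. The import is guarded so that the snippet also runs where the module isn't available.

```python
try:
    import normand  # the real module, if available
except ImportError:
    normand = None


def format_parse_error(exc):
    # Builds one `LINE:COL - MESSAGE` string per message of a
    # `ParseError`-like object, keeping the order of `exc.messages`.
    lines = []

    for msg in exc.messages:
        loc = msg.text_location
        lines.append("{}:{} - {}".format(loc.line_no, loc.col_no, msg.text))

    return lines


if normand is not None:
    try:
        # Start at offset 16 instead of the default 0.
        res = normand.parse('"hello" 00 <end>', init_offset=16)
        print(res.data, res.labels, res.offset)
    except normand.ParseError as exc:
        print("\n".join(format_parse_error(exc)))
```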

Development

Normand is a Poetry project.

To develop it, install it through Poetry and enter the virtual environment:

$ poetry install
$ poetry shell
$ normand <<< '"lol" * 10 0a'

normand.py is processed by:

Licensing and copyright follow the REUSE specification and are checked with the reuse tool.

Testing

Use pytest to test Normand once the package is part of your virtual environment, for example:

$ poetry install
$ poetry run pip3 install pytest
$ poetry run pytest

The pytest project is currently not a development dependency in pyproject.toml due to backward compatibility issues with Python 3.4.

In the tests directory, each *.nt file is a test. The file name prefix indicates what it’s meant to test:

pass-

Everything above the --- line is the valid Normand input to test.

Everything below the --- line is the expected data (whitespace-separated hexadecimal bytes).

fail-

Everything above the --- line is the invalid Normand input to test.

Everything below the --- line is the expected error message having this form:

LINE:COL - MESSAGE
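
The layout of those files can be read with a few lines of Python 3. The sketch below isn't the project's test runner: the function names are hypothetical, and it assumes that the separator is a line containing exactly `---`.

```python
def split_nt(text):
    # Splits the content of an `.nt` file at its `---` line.
    lines = text.splitlines()
    sep = lines.index("---")  # raises ValueError without a `---` line
    normand_input = "\n".join(lines[:sep])
    expectation = "\n".join(lines[sep + 1:])
    return normand_input, expectation


def expected_data(expectation):
    # `pass-` files: whitespace-separated hexadecimal bytes.
    return bytes(int(tok, 16) for tok in expectation.split())


def expected_error(expectation):
    # `fail-` files: `LINE:COL - MESSAGE`.
    loc, _, msg = expectation.strip().partition(" - ")
    line_no, _, col_no = loc.partition(":")
    return int(line_no), int(col_no), msg


normand_input, expectation = split_nt('"lol" * 2\n---\n6c 6f 6c\n6c 6f 6c\n')
print(normand_input, expected_data(expectation))
```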

Contributing

Normand uses Gerrit for code review.

To report a bug, create a GitHub issue.
