
WGML Device Functions

Jiří Malák edited this page Feb 19, 2021 · 3 revisions

Introduction

This page is concerned with device functions in several contexts:

  • how they are encoded by gendev
  • how they are read by wgml
  • how they are used by wgml
  • how they are documented

It will be quite some time before all of the issues involved are resolved.

The FunctionsBlock

Common Features

FunctionsBlocks encode those sub-blocks of both :DEVICE blocks and :DRIVER blocks which contain device functions and so produce CodeBlocks. These sections of the binary file actually have two structures, both of which must be considered when a binary file is read or written.

The first is common to all FunctionsBlocks and is the physical structure. This consists of a series of buffers containing 80 (0x50) bytes each. Each buffer is preceded by the count (0x50), which appears as the character "P" when the file is viewed in ASCII; each group of 81 bytes will therefore be referred to as a P-buffer. These "P" bytes are in the FunctionsBlock but are not part of it: when reading the block, they must be ignored. They can occur anywhere, interrupting any part of the FunctionsBlock. Note that the combined P-buffers which together contain a FunctionsBlock will always be a whole multiple of 81 bytes in length (count byte plus 80 data bytes), and will usually have invalid data following the end of the logical structure.
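As a rough illustration of the physical structure, the following C sketch copies the data bytes out of a run of P-buffers and drops the count bytes. The name pbuf_strip_counts and its error convention (returning 0) are assumptions made for this example and are not taken from wgml or gendev. It also assumes that the caller already knows how many P-buffers the block occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PBUF_DATA_LEN 80u                    /* data bytes per P-buffer (0x50) */
#define PBUF_PHYS_LEN (PBUF_DATA_LEN + 1u)   /* count byte plus data bytes */

/* Copy the data bytes of n_pbufs consecutive P-buffers from raw into out,
 * dropping each leading count byte. Returns the number of bytes written,
 * or 0 if a count byte is not 0x50 or out is too small (hypothetical
 * error convention chosen for this sketch). */
size_t pbuf_strip_counts(const uint8_t *raw, size_t n_pbufs,
                         uint8_t *out, size_t out_cap)
{
    assert(raw != NULL && out != NULL);
    if (out_cap < n_pbufs * PBUF_DATA_LEN)
        return 0;
    for (size_t i = 0; i < n_pbufs; i++) {
        const uint8_t *p = raw + i * PBUF_PHYS_LEN;
        if (p[0] != PBUF_DATA_LEN)           /* every P-buffer starts with "P" */
            return 0;
        memcpy(out + i * PBUF_DATA_LEN, p + 1, PBUF_DATA_LEN);
    }
    return n_pbufs * PBUF_DATA_LEN;
}
```

The output still contains whatever invalid data follows the end of the logical structure, so the logical length has to be worked out from the fields of the block itself.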

The second is the logical structure. This is the meaning of the bytes contained in the block. From this viewpoint, the FunctionsBlock is shorter than the sum of the lengths of the P-buffers, since the count bytes and the invalid data at the end are ignored. The logical structure varies, depending on the type of FunctionsBlock. Three variants are known to exist.

In the binary device file, a single Variant A FunctionsBlock is used. These properties were observed:

  • The :PAUSE CodeBlocks occur before the :FONTPAUSE CodeBlocks.
  • The :FONTPAUSE CodeBlocks occur in the same order as the :FONTPAUSE blocks in the source file.
  • If two or more :FONTPAUSE blocks have identical :VALUE blocks, only one CodeBlock will result.

In the binary driver file, two cases exist where the P-buffers encoding different blocks follow each other without any intervening bytes:

  • init, finish, newline, unknown, newpage, htab and fontswitches
  • the last P-buffer in fontstyle, absoluteaddress, and hline.

Making this even more interesting, the fontstyle_block is, in effect, an array of structs, each of which begins with a length byte. That byte could, of course, be 0x50, thus simulating the start of a P-buffer without actually being one.

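This has clear implications for parsing the binary file: a parser cannot locate P-buffer boundaries by scanning for 0x50 bytes, and must instead track its position by counting bytes from a known starting point.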
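Since a 0x50 byte cannot be trusted to mark a P-buffer, position has to be tracked arithmetically. The sketch below converts between logical offsets (count bytes removed) and physical offsets (count bytes included), both measured from the start of the first P-buffer of a block. The function names are hypothetical and are not part of wgml.

```c
#include <assert.h>
#include <stddef.h>

enum { PBUF_DATA_LEN = 80, PBUF_PHYS_LEN = 81 };

/* Physical offset of the data byte found at a given logical offset:
 * one count byte precedes every group of 80 data bytes. */
size_t pbuf_physical_offset(size_t logical)
{
    return logical + logical / PBUF_DATA_LEN + 1;
}

/* True when a physical offset is the position of a P-buffer count byte. */
int pbuf_is_count_position(size_t physical)
{
    return physical % PBUF_PHYS_LEN == 0;
}

/* Logical offset of the data byte at a physical offset. Count bytes have
 * no logical offset, so they are rejected. */
size_t pbuf_logical_offset(size_t physical)
{
    assert(!pbuf_is_count_position(physical));
    return physical - physical / PBUF_PHYS_LEN - 1;
}
```

For example, logical offset 79 is the last data byte of the first P-buffer (physical offset 80), and logical offset 80 is the first data byte of the second (physical offset 82, because physical offset 81 holds the next count byte).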

FunctionsBlock Variant A

For the binary device file, this block encodes the :VALUE blocks in both :PAUSE and :FONTPAUSE blocks. For the binary driver file, it is used directly to encode these sub-blocks:

  • The :ABSOLUTEADDRESS block.
  • The :DBOX block.
  • The :HLINE block.
  • The :HTAB block.
  • The :NEWPAGE block.
  • The :VLINE block.

This is the structure of the Variant A FunctionsBlock:

FunctionsBlock {
    uint16_t       count;
    CodeBlock      code_blocks[count];
};

The field count contains the number of CodeBlocks in the set of P-buffers in which the Variant A FunctionsBlock is embedded.

The field code_blocks is an array of CodeBlocks.

When a P-buffer is interpreted as a Variant A FunctionsBlock and the field count is "0x0000", then that P-buffer contains an empty FunctionsBlock.

FunctionsBlock Variant B

This variant is used to encode :FONTSTYLE blocks. Because it is specific to :FONTSTYLE blocks, it will be named "FontstyleFuncs". It has this structure:

FontstyleFuncs {
    uint8_t        flags[21];
    uint16_t       count;
    CodeBlock      code_blocks[count];
};

The field flags is discussed here.

The field count contains the number of CodeBlocks in the set of P-buffers in which the FontstyleFuncs struct is embedded.

The field code_blocks is an array of CodeBlocks.

FunctionsBlock Variant C

This variant exists in several block-specific types, each of which will have its own name, and all having the same structure in general:

<name>Funcs {
    uint16_t   count;
    <struct>   <data>[count];
};

where <name> is chosen to indicate which block the specific FunctionsBlock is used with, <struct> is a struct specific to the block encoded, and <data> is an array of "count" <struct>s.

The first <struct> instance begins directly after the count field. Each subsequent <struct> instance begins at the start of the first P-buffer following the last P-buffer containing any part of the current <struct> instance.
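The placement rule for Variant C instances can be expressed as a small calculation on logical offsets (count bytes removed, measured from the start of the first P-buffer of the FunctionsBlock). This is only a sketch: the name variant_c_next_start is hypothetical, and the length of each instance has to come from the block-specific struct, which is not described here.

```c
#include <assert.h>
#include <stddef.h>

enum { PBUF_DATA_LEN = 80 };

/* Given the logical offset at which a <struct> instance starts and its
 * length in bytes, return the logical offset at which the next instance
 * starts: the first byte of the P-buffer after the one holding the last
 * byte of the current instance. */
size_t variant_c_next_start(size_t start, size_t length)
{
    assert(length > 0);
    size_t last = start + length - 1;        /* last byte of this instance */
    return (last / PBUF_DATA_LEN + 1) * PBUF_DATA_LEN;
}
```

The first instance starts at logical offset 2, directly after the count field. If it is 10 bytes long, the next one starts at logical offset 80; if it is 79 bytes long, its last byte spills into the second P-buffer and the next one starts at logical offset 160.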

These blocks use Variant C FunctionsBlock structs:

  • The :FINISH block.
  • The :FONTSWITCH block.
  • The :INIT block.
  • The :NEWLINE block.

The details are in the sections in Driver File Blocks where the blocks listed are discussed.

Empty FunctionsBlocks

Empty FunctionsBlocks occur in these contexts:

  • In binary device files, an empty FunctionsBlock appears when the source file contains neither :PAUSE nor :FONTPAUSE blocks.
  • In binary driver files, where, as noted above, the various FunctionsBlocks for entire groups of fields occur with no intervening bytes, empty FunctionsBlocks are used as "placeholders" so that each such block can be identified.
  • The field unknown in a binary driver file is always present as an empty FunctionsBlock.

Empty FunctionsBlocks have these characteristics:

  1. They occupy exactly one P-buffer.
  2. For Variant A and Variant C, the field FunctionsBlock.count contains "0x0000".
  3. For Variant B, the field CodeBlock.count will be "0x0000".

The rest of the block may or may not be blank: gendev regularly reuses buffers without clearing them.

There will always be at least one FontstyleFuncs block for style "plain". If there was no :FONTSTYLE block with the value "plain" for attribute type, then gendev will generate exactly the same FontstyleFuncs block it does for a :FONTSTYLE containing nothing more than a value for the attribute type. This FontstyleFuncs block will have the value "0x0001" for the field FontstyleFuncs.count but the field CodeBlock.count will contain the value "0x0000". This is a Variant B FunctionsBlock. Despite this, checking the field FontstyleFuncs.count may in some cases save some time: if its value is "0x0000", then the FunctionsBlock is empty.
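The emptiness tests described in this section might be sketched as follows. The sketch works on logical bytes (count bytes already removed) and assumes that uint16_t fields are stored little-endian, which is not stated on this page. The function names are hypothetical, and treating a FontstyleFuncs block with more than one CodeBlock as non-empty is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FONTSTYLE_FLAGS_LEN    = 21,  /* FontstyleFuncs.flags */
    CODEBLOCK_HEADER_LEN   = 7,   /* designator through count */
    CODEBLOCK_COUNT_OFFSET = 5    /* offset of CodeBlock.count in a CodeBlock */
};

/* Assumption: 16-bit fields are little-endian. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Variant A and Variant C: empty when the leading count field is zero. */
int funcs_block_is_empty(const uint8_t *logical, size_t len)
{
    assert(logical != NULL && len >= 2);
    return read_u16_le(logical) == 0;
}

/* Variant B: empty when FontstyleFuncs.count is zero, or when it is one
 * and the single CodeBlock has a zero count. */
int fontstyle_funcs_is_empty(const uint8_t *logical, size_t len)
{
    assert(logical != NULL);
    assert(len >= (size_t)(FONTSTYLE_FLAGS_LEN + 2 + CODEBLOCK_HEADER_LEN));
    uint16_t n = read_u16_le(logical + FONTSTYLE_FLAGS_LEN);
    if (n == 0)
        return 1;                 /* the quick check mentioned above */
    if (n > 1)
        return 0;                 /* assumption made for this sketch */
    const uint8_t *cb = logical + FONTSTYLE_FLAGS_LEN + 2;
    return read_u16_le(cb + CODEBLOCK_COUNT_OFFSET) == 0;
}
```

Because gendev reuses buffers without clearing them, neither function looks at anything beyond the count fields.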

The CodeBlock

CodeBlock Structure

This is the CodeBlock structure:

CodeBlock {
   uint8_t        designator;
   uint8_t        cb05_flag;
   uint8_t        lp_flag;
   uint16_t       line_pass;
   uint16_t       count;
   uint8_t        text[count];
};

The field designator will have one of the values shown in Meta Data for CodeBlock types.

The field cb05_flag contains the value "0x00" in all CodeBlocks except for those in :FONTSTYLE blocks. In a :FONTSTYLE block, this field will contain the value "0x01" if there is no :STARTVALUE block outside of any :LINEPROC block (that is, no CodeBlock with designator "0x05").

The field lp_flag contains the value "0x00" in all CodeBlocks except for those in :FONTSTYLE blocks. In a :FONTSTYLE block, this field will contain the value "0x01" in two distinct situations:

  1. The field cb05_flag contains the value "0x01". Note that this only happens in an empty CodeBlock, that is, one whose field count contains the value "0x0000" and whose field text contains a NULL pointer.
  2. The CodeBlock compiled from the block (contained within a :LINEPROC block) shown in the first column, contains exactly one device function, which is one of those shown in the second column:
Block        Device Functions
:STARTVALUE  %textpass(), %ulineon()
:FIRSTWORD   %ulineon(),  %ulineoff()
:STARTWORD   %ulineon(),  %ulineoff()
:ENDWORD     %ulineoff()
:ENDVALUE    %ulineoff()

The first condition is clear, but it is not clear why it happens. No method of producing a CodeBlock in which field cb05_flag contained "0x01" and field lp_flag contained "0x00" was found.

The second condition was tested with all of the Type I device functions and, depending on the block involved, with one or more of %textpass(), %ulineoff() and %ulineon() excluded per the restrictions enforced by gendev 4.1. It does not matter if the additional device functions precede or follow the device function: if any other device function is present, the value of field lp_flag will be "0x00". Note that the device function %textpass() can only be found in :STARTVALUE blocks within :LINEPROC blocks. Also note that, if multiple copies of %ulineon() or %ulineoff() are present, the field lp_flag will still have the value "0x01": it must be a different device function, not merely an additional copy of the same device function, for the value of the field lp_flag to become "0x00".
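The second condition can be written down as a predicate, which may be useful when checking a parser against gendev output. This is a sketch of the observations above and not of anything gendev is known to do internally. The names lp_block and expected_lp_flag are hypothetical, device function names are assumed to be lower-case strings ending in "()", and two cases not covered by the testing described here (repeated %textpass() and a mix of %ulineon() with %ulineoff()) are assumed to give "0x00".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    LP_STARTVALUE, LP_FIRSTWORD, LP_STARTWORD, LP_ENDWORD, LP_ENDVALUE
} lp_block;

/* Is fn one of the device functions listed for this block? */
static int lp_allowed(lp_block block, const char *fn)
{
    int on  = strcmp(fn, "%ulineon()") == 0;
    int off = strcmp(fn, "%ulineoff()") == 0;
    int tp  = strcmp(fn, "%textpass()") == 0;
    switch (block) {
    case LP_STARTVALUE: return tp || on;
    case LP_FIRSTWORD:
    case LP_STARTWORD:  return on || off;
    case LP_ENDWORD:
    case LP_ENDVALUE:   return off;
    }
    return 0;
}

/* Expected lp_flag for a block inside :LINEPROC, second condition only.
 * fns lists the n device functions of the block in source order. */
uint8_t expected_lp_flag(lp_block block, const char *const *fns, size_t n)
{
    assert(fns != NULL || n == 0);
    if (n == 0 || !lp_allowed(block, fns[0]))
        return 0;
    if (n > 1 && strcmp(fns[0], "%textpass()") == 0)
        return 0;                 /* assumption: repetition was not tested */
    for (size_t i = 1; i < n; i++)
        if (strcmp(fns[i], fns[0]) != 0)
            return 0;             /* a different device function is present */
    return 1;                     /* one function, possibly repeated */
}
```

The first condition (an empty CodeBlock whose cb05_flag is "0x01") is not modelled by this function.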

The name "lp_flag" is short for "lineproc_flag", since this flag can be set or not in each of the sub-blocks that can appear inside a :LINEPROC block. I hypothesized that this flag might be used to indicate that the behavior of the ULINE fontstyle keyword documented in the WGML 4 Reference is to be applied. This possibility led me to explore hypothetical device functions: %uscoreon(), %usboldon(), %ulboldon(). gendev rejected all of them, so the hypothesis is probably not valid.

The possibility that either or both of these flags do determine some aspect of the existing wgml's behavior should be kept in mind as a possible explanation for discrepancies observed when comparing what wgml 4.0 does with what our wgml does.

Formerly, the fields cb05_flag and lp_flag were treated as a single uint16_t field unknown. As discussed in Multiple CodeBlocks, it is quite likely that the fields designator and cb05_flag are, in fact, a single uint16_t field in gendev 4.1 (and presumably in gendev 3.33 as well). However, the parsing code skips fields cb05_flag and lp_flag, and, except for the :INIT blocks, uses field designator internally but does not report what it was to the rest of wgml. Thus, these fields are shown as uint8_t.

The field line_pass has the value "0x0000" in all CodeBlocks except those inside a :LINEPROC block within a :FONTSTYLE block, where it is used to record the line pass to which the CodeBlock belongs.

The field count contains the number of bytes in the encoded block.

The field text contains a pointer to the bytes themselves. It is these bytes that must be interpreted by wgml to provide the result intended by the author of the document being processed using the sequence of device functions selected by the author of the :DEVICE or :DRIVER block being used by wgml to guide its output.
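The field descriptions above could be turned into a header parser along the following lines. This is a minimal sketch and not the wgml parsing code: it assumes logical bytes (P-buffer count bytes already removed), little-endian uint16_t fields, and no junk byte inside the CodeBlock. The names code_block and code_block_parse are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CODEBLOCK_HEADER_LEN = 7 };   /* 3 single bytes plus 2 uint16_t fields */

typedef struct {
    uint8_t        designator;
    uint8_t        cb05_flag;
    uint8_t        lp_flag;
    uint16_t       line_pass;
    uint16_t       count;
    const uint8_t *text;             /* points into buf; NULL when count is 0 */
} code_block;

/* Parse one CodeBlock from buf. Returns the number of bytes consumed, or 0
 * if buf is too short to hold the header and count bytes of text. */
size_t code_block_parse(const uint8_t *buf, size_t len, code_block *out)
{
    assert(buf != NULL && out != NULL);
    if (len < CODEBLOCK_HEADER_LEN)
        return 0;
    out->designator = buf[0];
    out->cb05_flag  = buf[1];
    out->lp_flag    = buf[2];
    out->line_pass  = (uint16_t)(buf[3] | ((unsigned)buf[4] << 8));
    out->count      = (uint16_t)(buf[5] | ((unsigned)buf[6] << 8));
    if (len - CODEBLOCK_HEADER_LEN < out->count)
        return 0;
    out->text = out->count ? buf + CODEBLOCK_HEADER_LEN : NULL;
    return (size_t)CODEBLOCK_HEADER_LEN + out->count;
}
```

The return value gives the offset at which the next CodeBlock would normally start, which is the quantity needed when several CodeBlocks share one FunctionsBlock.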

A certain amount of space conservation occurs, at least in the :DEVICE block for :PAUSE and :FONTPAUSE blocks: if two :VALUE blocks are identical, they are only encoded once. The entire content of the :VALUE blocks must be identical, not just individual lines.

Multiple CodeBlocks

When multiple CodeBlocks are placed in the same FunctionsBlock, they usually follow each other without any intermediary bytes. This is why they are shown as arrays in the various FunctionsBlocks.

When :FONTSWITCH blocks were examined with really long values for field type, a curious phenomenon was observed: under certain conditions, a single junk byte can occur between CodeBlocks.

When the existing binary driver files were used to test cfparse.exe, one of them, PCGRDRV.COP, also turned out to have such a byte in two of its :FONTSTYLE block encodings.

This, it turns out, depends on the offset of the field CodeBlock.designator in the P-buffer:

Offset of Designator         Element Shifted by Junk Byte
       79 (see note)         CodeBlock.designator
       76                    CodeBlock.line_pass
       74                    CodeBlock.count

Note: if the designator would normally occupy offset 79, it is shifted to offset 00 of the following P-buffer.

In all cases, it is the byte at offset 79 which is incorrect. Because gendev re-uses P-buffers without first clearing them, this can be any value and will be whatever value was in offset 79 of the P-buffer being reused. So the parsers may have to be modified to skip the byte at offset 79 if the field CodeBlock.designator falls (or would fall) on one of the three values above. This can be determined quite simply: if there are more CodeBlocks present, then the next designator should be immediately after the last byte of the current CodeBlock. The offset of that last byte can be computed, and so can the offset of the designator.

This phenomenon can be conceptualized in this way: gendev 4.1 refuses to break three values across a P-buffer boundary: the value of the field CodeBlock.count, the value of the field CodeBlock.line_pass, and the value of the field CodeBlock.designator. Since the first two are uint16_t integers, this implies that the field CodeBlock.designator is, in fact, considered to be a uint16_t integer by gendev 4.1.
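Read this way, the three offsets in the table follow from the layout of the CodeBlock header: a two-byte value that starts at offset 79 would straddle the P-buffer boundary. The sketch below applies that rule to the designator (taken as a two-byte value at offset d), line_pass (at d + 3) and count (at d + 5). The names junk_shift and junk_byte_shift are hypothetical, and the offset is the position within the 80 data bytes of a P-buffer at which the designator would normally fall.

```c
#include <assert.h>

typedef enum {
    SHIFT_NONE,         /* no junk byte expected */
    SHIFT_DESIGNATOR,   /* designator moves to offset 0 of the next P-buffer */
    SHIFT_LINE_PASS,    /* junk byte precedes line_pass */
    SHIFT_COUNT         /* junk byte precedes count */
} junk_shift;

enum { PBUF_LAST_OFFSET = 79 };

/* Report which field, if any, is pushed past the junk byte at offset 79,
 * given the offset at which the designator would normally fall. */
junk_shift junk_byte_shift(unsigned designator_offset)
{
    assert(designator_offset <= PBUF_LAST_OFFSET);
    if (designator_offset == PBUF_LAST_OFFSET)
        return SHIFT_DESIGNATOR;
    if (designator_offset + 3 == PBUF_LAST_OFFSET)   /* line_pass at 79 */
        return SHIFT_LINE_PASS;
    if (designator_offset + 5 == PBUF_LAST_OFFSET)   /* count at 79 */
        return SHIFT_COUNT;
    return SHIFT_NONE;
}
```

This reproduces the offsets 79, 76 and 74 from the table. For every other designator offset, none of the three values starts at offset 79, so no junk byte is expected.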

Testing shows that:

  • This only affects the fields listed above, not the contents of the field CodeBlock.text.
  • It does not matter if the CodeBlock is first, last, or in the middle of a set of CodeBlocks. All that matters is the position of the designator in the P-buffer.
  • This identical pattern is found in: :DEVICE blocks, :FONTSTYLE blocks, :FONTSWITCH blocks, and :INIT blocks.
  • The only place the first CodeBlock can start in such a position is in a :FONTSWITCH (the type plus the flags can push the designator far enough into the P-buffer). The pattern is the same.
  • These blocks, which are restricted to one CodeBlock, can never manifest the problem: :ABSOLUTEADDRESS blocks, :DBOX blocks, :FINISH blocks, :HLINE blocks, :HTAB blocks, :NEWLINE blocks, :NEWPAGE blocks, and :VLINE blocks.

Since wgml 4.0 works with PCGRDRV, it would be natural to suppose that it has been programmed to compensate for this behavior. However, in PCGRDRV the problem affects the value of the field CodeBlock.line_pass, and the test framework used with the Wgml Sequencing section frequently produced binary driver files which wgml 4.0 could not use. Problems reported by wgml 4.0 included:

  • not being able to find one of the :FONTSTYLE blocks; and
  • producing this error message:
SY--001: Memory exhausted!!!!!!!!!!!

before halting.

When cfparse.exe was enhanced to identify and compensate for this problem, two facts quickly became apparent:

  • The test drivers were, indeed, producing a great many CodeBlocks with this problem affecting one or another of these fields.
  • The only field wgml 4.0 could not handle when affected by this problem was CodeBlock.designator.

Until our gendev exists, it will not be possible to determine whether a binary file in which our gendev does not reproduce this problem for the field CodeBlock.designator can be used successfully by wgml 4.0, or whether wgml 4.0 simply cannot deal with a designator in that position at all.

A note about :DEVICE blocks: the CodeBlocks produced from all of the :VALUE blocks inside the :PAUSE and :FONTPAUSE blocks are placed in one set of P-buffers, one directly following the other. The behavior seen in :DRIVER blocks, where the CodeBlocks belonging to each sub-block within the :DRIVER block begin at the start of the next P-buffer, does not occur in :DEVICE blocks.

Physical Limit

When I attempted to produce a binary file with a CodeBlock over 64K in size, this error message resulted:

SY--001: Memory exhausted!!!!!!!!!!!

However, this limit only applies to CodeBlocks: the FunctionsBlock as a whole can be larger than 64K, although for :PAUSE and :FONTPAUSE this may make one or more of the PauseBlock.startpause, PauseBlock.documentpause, PauseBlock.docpagepause, PauseBlock.devpagepause, and Devicefont.fontpause fields "roll over", as they are only 16 bits wide even when the value they would normally contain requires a wider field.
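The "roll over" can be illustrated with a short calculation. The sketch assumes that gendev simply keeps the low 16 bits of the value, which is the usual result of storing a wider value in a 16-bit field but has not been confirmed here. The function name rolled_over_u16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value that ends up in a 16-bit field when the value it should hold
 * needs more than 16 bits (assumption: the high bits are discarded). */
uint16_t rolled_over_u16(uint32_t true_value)
{
    uint16_t stored = (uint16_t)(true_value & 0xFFFFu);
    /* Keeping the low 16 bits is the same as reducing modulo 65536. */
    assert(stored == true_value % 65536u);
    return stored;
}
```

Under that assumption, a value of 70000 would be stored as 4464 (70000 - 65536), so a reader of the binary file cannot recover the true value from the field alone once the FunctionsBlock exceeds 64K.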
