Releases: zeek/spicy
v1.10.1
v1.9.1
- Drop `;` after `#pragma`.
- Update CI setups.
- Fix repeated evaluations of `&parse-at` expression.
- Fix stray Python escape sequence.
- Drop freebsd-12 from CI.
- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  With this patch we now handle these attributes, regardless of how the unit appears.
v1.8.4
- Drop `;` after `#pragma`.
- Update CI setups.
- Fix repeated evaluations of `&parse-at` expression.
- Fix stray Python escape sequence.
- Fix skipping of literal fields with condition.
- Fix type of generated code for `string::size`.

  While we defined `string`'s size operator to return a `uint64` and documented that it returns the length in codepoints, not bytes, we still generated C++ code which worked on the underlying bytes (i.e., it directly invoked `std::string::size` instead of using `hilti::rt::string::size`).
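The difference between a length in codepoints and a length in bytes only shows up for non-ASCII text. As a rough illustration in Python (not Spicy and not the generated C++; the sample string is an arbitrary choice), the same UTF-8 text reports different sizes depending on which unit is counted:

```python
# Illustration only: Python stands in for the codepoint-vs-byte distinction.
# This is neither Spicy code nor the HILTI runtime.
text = "Z\u00fcrich"  # arbitrary sample with one non-ASCII character (u-umlaut)

codepoints = len(text)                  # number of Unicode codepoints
utf8_bytes = len(text.encode("utf-8"))  # number of bytes in the UTF-8 encoding

print(codepoints, utf8_bytes)
```

A size operator that counts bytes would report the larger number here, which is the kind of mismatch the fix above addresses.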
v1.10.0
Changed Functionality
- Numerous improvements to the throughput of generated parsers.

  For this release we have revisited the code typically generated for parsers and the runtime libraries they use with the goal of improving throughput of parsers at runtime. Coarsely summarized, this work centered on

  - reduction of allocations during parsing
  - reduction of data copies during parsing
  - use of dedicated, hand-checked implementations for automatically generated code to avoid overhead from safety checks in the runtime libraries

  With these changes we see throughput improvements of some parsers in the range of 20-30%. This work consisted of numerous incremental changes, see `CHANGES` for the full list of changes.
- GH-1667: Always advance input before attempting resynchronization.

  When we enter resynchronization after hitting a parse error we previously would have left the input alone, even though we know it fails to parse. We then relied fully on resynchronization to advance the input.

  With this patch we always forcibly advance the input to the next non-gap position. This has no effect for synchronization on literals, but allows it to happen earlier for regular expressions.
- GH-1659: Lift requirement that `bytes` forwarded from filter be mutable.
- GH-1489: Deprecate `&bit-order` on bit ranges.

  This had no effect and allowing it may be confusing to users. Deprecate it with the idea of eventual removal.
- Extend location printing to include single-line ranges.

  For a location of, e.g., "line 1, column 5 to 10", we now print `1:5-1:10`, whereas we used to print it as only `1:5`, hence dropping information.
- GH-1500: Add `+=` operator for `string`.

  This allows appending to a `string` without having to allocate a new string. This might perform better most of the time.
- GH-1640: Implement skipping for any field with known size.

  This patch adds `skip` support for fields with `&size` attribute or of builtin type with known size. If a unit has a known size and it is specified in a `&size` attribute this also allows skipping over unit fields.
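Of the changes in this list, the extended location printing is the easiest to picture with a small sketch. The Python helper below is hypothetical (its name and signature are not part of Spicy) and only mimics the `line:column` rendering described above; how ranges other than the single-line case are rendered is an assumption.

```python
def render_location(line_from, col_from, line_to=None, col_to=None):
    """Render a source location as text.

    Hypothetical helper for illustration; not Spicy's implementation.
    """
    start = f"{line_from}:{col_from}"
    # Without an end position (or with an empty range) only the start is shown.
    if line_to is None or col_to is None:
        return start
    if (line_to, col_to) == (line_from, col_from):
        return start
    # Assumption: the end position is rendered in the same line:column form.
    return f"{start}-{line_to}:{col_to}"


print(render_location(1, 5, 1, 10))
print(render_location(1, 5))
```

The first call corresponds to "line 1, column 5 to 10" and keeps the end column, whereas the second shows the shorter form that was previously printed for both cases.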
Bug fixes
- GH-1605: Allow for unresolved types for set `in` operator.
- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  We now handle these attributes, regardless of how the unit appears.
- GH-1585: Put closing of unit sinks behind feature guard.

  This code gets emitted, regardless of whether a sink was actually connected or not. Put it behind a feature guard so it does not enable the feature on its own.
- GH-1652: Fix filters consuming too much data.

  We would previously assume that a filter would consume all available data. This only holds if the filter is attached to a top-level unit, but in general not if some sub-unit uses a filter. With this patch we explicitly compute how much data is consumed.
- GH-1668: Fix incorrect data consumption for `&max-size`.

  We would previously handle `&size` and `&max-size` almost identically, with the only difference that `&max-size` sets up a slightly larger view to accommodate a sentinel. In particular, we also used identical code to set up the position where parsing should resume after such a field.

  This was incorrect as it is in general impossible to tell where parsing continues after a field with `&max-size` since it does not signify a fixed view like `&size`. We now compute the next position for a `&max-size` field by inspecting the limited view to detect how much data was extracted.
- GH-1522: Drop overzealous validator.

  A validator was intended to reject a pattern of incorrect parsing of vectors, but instead ended up rejecting all vector parsing if the vector elements themselves produced vectors. We dropped this validation.
- GH-1632: Fix regex processing using `{n,m}` repeat syntax being off by one.
- GH-1648: Provide meaningful unit `__begin` value when parsing starts.

  We previously would not provide `__begin` when starting the initial parse. This meant that, e.g., `offset()` was not usable if nothing ever got parsed.

  We now provide a meaningful value.
- Fix skipping of literal fields with condition.
- GH-1645: Fix `&size` check.

  The current parsing offset could legitimately end up just beyond the `&size` amount.
- GH-1634: Fix infinite loop in regular expression parsing.
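The note for GH-1632 above does not say in which direction the repeat count was off. As a reference point, the conventional meaning of `{n,m}` is "at least n and at most m repetitions", inclusive on both ends. The sketch below uses Python's `re` module purely to show that convention; it is not Spicy's regular expression engine, and the pattern is an arbitrary example.

```python
import re

# Conventional {n,m} semantics: between 2 and 3 repetitions of "a", inclusive.
pattern = re.compile(r"a{2,3}")

# Check which inputs are matched in full by the pattern.
matches = {s: pattern.fullmatch(s) is not None for s in ("a", "aa", "aaa", "aaaa")}

print(matches)
```

An off-by-one error in either bound would show up as a wrong answer for one of the boundary inputs, i.e., one of the four strings tested here.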
Documentation
- Update documentation of `offset()`.
- Fix docs namespace for symbols from `filter` module.

  We previously would document these symbols to be in `spicy` even though they are in `filter`.
- Add bitfield examples.
v1.8.3
- GH-1645: Fix `&size` check.

  The current parsing offset could legitimately end up just beyond the `&size` amount.
- GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.

  We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.

  With this patch we now handle these attributes, regardless of how the unit appears.
v1.9.0
New Functionality
- GH-1468: Allow to directly access members of anonymous bitfields.

  We now automatically map fields of anonymous bitfields into their containing unit.

      type Foo = unit {
          : bitfield(8) {
              x: 0..3;
              y: 4..7;
          };

          on %done {
              print self.x, self.y;
          }
      };
- GH-1467: Support bitfield constants in Spicy for parsing.

  One can now define bitfield "constants" for parsing by providing integer expressions with fields:

      type Foo = unit {
          x: bitfield(8) {
              a: 0..3 = 2;
              b: 4..7;
              c: 7 = 1;
          };
      };

  This will first parse the bitfield as usual and then enforce that the two bit ranges that come with expressions (i.e., `a` and `c`) indeed contain the expected values. If they don't, that's a parse error.

  We also support using such bitfield constants for look-ahead parsing:

      type Foo = unit {
          x: uint8[];
          y: bitfield(8) {
              a: 0..3 = 4;
              b: 4..7;
          };
      };

  This will parse uint8s until a value is discovered that has its bits set as defined by the bitfield constant.

  (We use the term "constant" loosely here: only the bits with values are actually enforced to be constant, all others are parsed as usual.)
- GH-1089, GH-1421: Make `offset()` independent of random access functionality.

  We now store the value returned by `offset()` directly in the unit instead of computing it on the fly when requested from `cur - begin`. With that, `offset()` can be used without enabling random access functionality on the unit.
- Add support for passing arbitrary C++ compiler flags.

  This adds a magic environment variable `HILTI_CXX_FLAGS` which, if set, specifies compiler flags which should be passed during C++ compilation after implicit flags. This could be used to, e.g., set defines or low-level compiler flags.

  Even with this flag, for passing include directories one should still use `HILTI_CXX_INCLUDE_DIRS` since they are searched before any implicitly added paths.
- GH-1435: Add bitwise operators `&`, `|`, and `^` for booleans.
- GH-1465: Support skipping explicit `%done` in external hooks.

  Assuming `Foo::X` is a unit type, these two are now equivalent:

      on Foo::X::%done { }
      on Foo::X { }
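To make the bitfield constants from GH-1467 in this section more concrete, here is a minimal Python model of the first example (`a: 0..3 = 2; b: 4..7; c: 7 = 1;`). It is a sketch, not Spicy code: the helper names are hypothetical, and it assumes that bit 0 is the least significant bit.

```python
def bits(value, lower, upper):
    """Extract bits lower..upper (inclusive), counting bit 0 as least significant."""
    width = upper - lower + 1
    return (value >> lower) & ((1 << width) - 1)


def check_bitfield(byte):
    """Model of: x: bitfield(8) { a: 0..3 = 2; b: 4..7; c: 7 = 1; }"""
    # First extract all ranges as usual ...
    fields = {"a": bits(byte, 0, 3), "b": bits(byte, 4, 7), "c": bits(byte, 7, 7)}
    # ... then enforce only the ranges that carry an expected value.
    if fields["a"] != 2 or fields["c"] != 1:
        raise ValueError(f"parse error: unexpected bitfield value in {fields}")
    return fields


print(check_bitfield(0x92))
```

With the input `0x92` both constrained ranges hold their expected values while `b` is free to take any value. An input such as `0x12`, where bit 7 is unset, would be rejected, mirroring the parse error described above.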
Changed Functionality
- GH-1567: Speed up runtime calls to start profilers.
- GH-1565: Disable capturing backtraces with HILTI exceptions in non-debug builds.
- GH-1343: Include condition in `&requires` failure message.
- GH-1466: Reject uses of `self` in unit `&size` and `&max-size` attribute.

  Values in `self` are only available after parsing has started while `&size` and `&max-size` are consumed before that. This means that any use of `self` and its members in these contexts would only ever see unset members, so it should not be the intended use.
- GH-1485: Add validator rejecting unsupported multiple uses of attributes.
- GH-1465: Produce better error message when hooks are used on a unit field.
- GH-1503: Handle anonymous bitfields inside `switch` statements.

  We now map items of anonymous bitfields inside `switch` cases into the unit namespace, just like we already do for top-level fields. We also catch if two anonymous bitfields inside those cases carry the same name, which would make accesses ambiguous.

  So the following works now:

      switch (self.n) {
          0 -> : bitfield(8) {
              A: 0..7;
          };
          * -> : bitfield(8) {
              B: 0..7;
          };
      };

  Whereas this does not work:

      switch (self.n) {
          0 -> : bitfield(8) {
              A: 0..7;
          };
          * -> : bitfield(8) {
              A: 0..7;
          };
      };
- GH-1571: Remove trimming inside individual chunks.

  Trimming a `Chunk` (always from the left) causes a lot of internal work with only limited benefit since we manage visibility with a `stream::View` on top of a `Chunk` anyway.

  Trimming now only removes a `Chunk` from a `Chain`, but does not internally change the individual `Chunk` anymore. This should benefit performance but might lead to slightly increased memory use; callers usually have that data in memory anyway.
- Use `find_package(Python)` with version.

  Zeek's configure sets `Python_EXECUTABLE` as hint, but Spicy is using `find_package(Python3)` and would only use `Python3_EXECUTABLE` as hint. This results in Spicy finding a different (the default) Python executable when configuring Zeek with `--with-python=/opt/custom/bin/python3`.

  Switch Spicy over to use `find_package(Python)` and add the minimum version so it knows to look for `Python3`.
Bug fixes
- GH-1520: Fix handling of `spicy-dump --enable-print`.
- Fix spicy-build to correctly infer library directory.
- GH-1446: Initialize generated struct members in constructor body.
- GH-1464: Add special handling for potential `advance` failure in trial mode.
- GH-1275: Add missing lowering of Spicy unit ctor to HILTI struct ctor.
- Fix rendering in validation of `%byte-order` attribute.
- GH-1384: Fix stringification of `DecodeErrorStrategy`.
- Fix handling of `--show-backtraces` flag.
- GH-1032: Allow using bitfields with type declarations.
- GH-1484: Fix use of `&convert` on bitfields.
- GH-1508: Fix returned value for `<unit>.position()`.
- GH-1504: Use user-inaccessible chars for encoding `::` in feature variables.
- GH-1550: Replace recursive deletion with explicit loop to avoid stack overflow.
- GH-1549: Add feature guards to accesses of a unit's `__position`.
Documentation
- Move Zeek-specific documentation into Zeek documentation.
- Clarify error handling docs.
- Mention unit switch statements in conditional parsing docs.
v1.8.2
- GH-1571: Remove trimming inside individual chunks.

  Trimming `Chunk`s (always from the left) causes a lot of internal work with only limited benefit since we manage visibility with `stream::View`s on top of `Chunk`s anyway.

  This patch removes trimming inside `Chunk`s so now any trimming only removes `Chunk`s from `Chain`s, but does not internally change individual `Chunk`s anymore. This might lead to slightly increased memory use, but callers usually have that data in memory anyway.
- GH-1549: GH-1554: Fix potential infinite loop when trimming data before stream.

  Previously we would trigger an infinite loop if one tried to trim before the head chunk of a stream. In practice this seems to have been no issue due to #1549 and us emitting far fewer calls to trim than possible.

  This patch adds an explicit check whether we need to trim anything, and exits the low-level function early for such cases.
- GH-1550: Replace recursive deletion with explicit loop to avoid stack overflow.
- GH-1549: Add feature guards to accesses of a unit's `__position`.

  Access of `__position` triggers random access functionality. In order to distinguish our internal uses from accesses due to user code, most accesses in our generated code should be guarded with a feature constant (`if` or ternary).

  This patch adds proper guards for a couple of instances where we did not do that correctly. That mishap caused all units with containers to be random access (even the root unit), which in turn could have led to, e.g., unbounded memory growth, runtime overhead due to generation and execution of unneeded code, or expensive cleanup on very large untrimmed inputs.
- Artificially limit the number of open files.

  This works around a silent failure in reproc where it would refuse to run on systems with huge rlimits for the number of open files. We have seen this hit on huge production boxes.
- Add begin to parser state.

  This patch adds the current begin position to the parser state, and makes the corresponding changes to generated parser functions so it is passed down.

  We already modelled the semantic beginning of the input in the unit, but had no reliable way to keep this up-to-date across non-unit contexts like `&parse-from`. This would then for certain setups lead to generated code where `input` and `position` would point to different inputs, which in turn caused `offset` (modelled as `position - input`) to be incorrect.
- Expand validator error message.
- Disable a few newer clang-tidy categories.

  The options disabled here are triggered in newer versions of clang-tidy.
- Drop `-noall_load` linker option.

  We added this linker option on macOS. This option was already obsolete, e.g., in the `ld` manpage:

      -noall_load  This is the default. This option is obsolete.

  Newer versions of Xcode do not know this option anymore and instead generate a hard error.
- Declare Spicy pygments extension as parallel-safe.

  We previously would not declare that the Spicy pygments highlighter is safe to execute in parallel (reading or writing of sources). Sphinx then assumed that the extension was not safe to run in parallel and instead ran jobs sequentially.

  This patch declares the extension as able to execute in parallel. Since the extension does not manage any external state this is safe.
- Use `find_package(Python)` with version.

  Zeek's configure sets `Python_EXECUTABLE` as hint, but Spicy is using `find_package(Python3)` and would only use `Python3_EXECUTABLE` as hint. This results in Spicy finding a different (the default) Python executable when configuring Zeek with `--with-python=/opt/custom/bin/python3`.

  Switch Spicy over to use `find_package(Python)` and add the minimum version so it knows to look for `Python3`.
v1.8.1
v1.8.0
New Functionality
- Add new `skip` keyword to let unit items efficiently skip over uninteresting data.

  For cases where your parser just needs to skip over some data, without needing access to its content, Spicy provides a `skip` keyword to prefix corresponding fields with:

      module Test;

      public type Foo = unit {
          x: int8;
           : skip bytes &size=5;
          y: int8;
          on %done { print self; }
      };

  `skip` works for all kinds of fields but is particularly efficient with `bytes` fields, for which it will generate optimized code avoiding the overhead of storing any data.

  `skip` fields may have conditions and hooks attached, like any other fields. However, they do not support `$$` in expressions and hooks.

  For readability, a `skip` field may be named (e.g., `padding: skip bytes &size=3;`), but even with a name, its value cannot be accessed.

  `skip` fields extend the existing support for `void` fields with attributes, which are now deprecated.
- Add runtime profiling infrastructure.

  This adds an option `--enable-profiling` to the HILTI and Spicy compilers. Use of the option does two things: (1) it sets a flag enabling inserting additional profiling instrumentation into generated C++ code, and (2) it enables using instrumentation for recording profiling information during execution of the compiled code, including dumping out a profiling report at the end. The profiling information collected includes time spent in HILTI functions as well as for parsing Spicy units and unit fields.
Changed Functionality
- Optimizations for improved runtime performance.

  This release contains a number of changes to improve the runtime performance of generated parsers. This includes tweaks for generating more performant code for parsers, low-level optimizations of types in the runtime support library, as well as fine-tuning of parser execution at runtime.
- Do not force locale on users of libhilti.
- Avoid expensive checked iterator for internal `Bytes` iteration.
- GH-1089: Allow to use `offset()` without enabling full random-access support.
- GH-1394: Fix C++ normalization of generated enum values.
- Disallow using `$$` with anonymous containers.
Bug fixes
- GH-1386: Prevent internal error when passed invalid context.
- Fix potential use-after-move bug.
- GH-1390: Initialize `Bytes` internal control block for all constructors.
- GH-1396: Fix regex performance regression introduced by constant folding.
- GH-1399: Guard access to unit `_filters` member with feature flag.
- GH-1421: Store numerical offset in units instead of iterator for position.
- GH-1436: Make sure `Bytes::sub` only throws HILTI exceptions.
- GH-1447: Do not forcibly make `strong_ref` `in` function parameters immutable.
- GH-1452: Allow resolving of unit parameters before `self` is fully resolved.
- Make sure Spicy runtime config is initialized after `spicy::rt::init`.
- Adjustments for building with GCC-13.
Documentation
- Document how to check whether an `optional` value is set.
- Preserve indentation when extracting comments in doc generation.
- Fix docs for long-form of `-x` flag to spicyc.
v1.5.4
- GH-1436: Make sure `Bytes::sub` only throws HILTI exceptions.
- Allow building with gcc-13.
- Allow optimizer to remove unused filter functionality in units.
- Avoid expensive checked iterator for internal `Bytes` iteration.
- GH-1390: Initialize `Bytes` internal control block for all constructors.
- Do not force locale on users of libhilti.
- Fix potential use-after-move bug.
- GH-1310: Fix ASAN false positive with GCC.
- Skip clang-specific ASAN flags with other compilers.
- Don't instantiate a debug logger if we aren't going to debug log.
- Simplify extract methods.
- Shortcut some offset computations.
- GH-1345: Apply alternative fix for #1345.
- Make `printParserState` cheaper to call if debug logging is disabled.
- GH-1367: Use unique filename for all object files generated during JIT.
- Fix code generation for `-X flow` or `-X trace`.
- Remove potential race during JIT when using `HILTI_CXX_COMPILER_LAUNCHER`.