Releases: semgrep/semgrep
Release v1.52.0
1.52.0 - 2023-12-05
Added
- Java: Semgrep will now recognize `String.format(...)` expressions as constant strings when all their arguments are constant, although it still will not know the exact string value. For example, the code `String.format("Abc %s", "123")` will match the pattern `"..."` but will not match the pattern `"Abc 123"`. (pa-3284)
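The distinction above between "known to be constant" and "known exact value" can be sketched with a toy model. This is a minimal illustration only, not Semgrep's implementation; `StrValue`, `lit`, `string_format`, `matches_any_string`, and `matches_literal` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StrValue:
    """Toy abstract value for a string expression (hypothetical model)."""
    is_constant: bool
    literal: Optional[str] = None  # None means the exact text is unknown


def lit(text: str) -> StrValue:
    """A string literal: constant, with a known exact value."""
    return StrValue(True, text)


def string_format(*args: StrValue) -> StrValue:
    """Model of String.format(...): constant if every argument is constant,
    but the resulting text is deliberately not computed."""
    return StrValue(all(arg.is_constant for arg in args), None)


def matches_any_string(value: StrValue) -> bool:
    """Corresponds to the pattern "...": any constant string matches."""
    return value.is_constant


def matches_literal(value: StrValue, expected: str) -> bool:
    """Corresponds to a pattern such as "Abc 123": the exact text is required."""
    return value.is_constant and value.literal == expected
```

In this model, `string_format(lit("Abc %s"), lit("123"))` satisfies `matches_any_string` but not `matches_literal(..., "Abc 123")`, which mirrors the behavior described in the entry.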
Changed
- Inter-file diff scan will be gradually introduced to a small percentage of users through a slow rollout process. Users who enable the pro engine and run differential PR scans on GitHub or GitLab may experience the impact of this update. (ea-268)
- secrets: now performs more aggressive deduplication for instances where an invalid and a valid match are reported at the same range. Instead of reporting both, we now report only the valid match when they are otherwise visually identical. (scrt-271)
Fixed
- In expression-based languages, definitions are also expressions. This change allows dataflow to properly handle definition expressions. For example, the pattern `0 == 0` will match `x == 0` in

      def f(c) do
        x = (y = 0)
        x == 0
      end

  because dataflow is now able to handle the expression `y = 0`. (pa-3262)
- In version 1.14.0 (pa-2477) we made sink-matching more precise when the sink specification was like:

      pattern-sinks:
        - patterns:
            - pattern: sink($X, ...)
            - focus-metavariable: $X

  Here the sink specification most likely intends to specify the first argument of `sink` as a sink, and `sink(ok1 if tainted else ok2)` should NOT produce a finding, because `tainted` is not really what is being passed to the `sink` function.

  But we only intercepted the simplest pattern above, and more complex sink specifications that had the same intent were not properly recognized. Now we have generalized that pattern to cover more complex cases like:

      patterns:
        - pattern-either:
            - patterns:
                - pattern-inside: |
                    def foo(...):
                      ...
                - pattern: sink1($X)
            - patterns:
                - pattern: sink2($X)
                - pattern-not: bar(...)
        - focus-metavariable: $X

  (pa-3284)
- Updated the parser used for Rust. (rust)
Release v1.51.0
1.51.0 - 2023-11-29
Added
- taint-mode: Added experimental rule option `taint_match_on: source` that makes Semgrep report taint findings on the taint source rather than on the sink. (pa-3272)
Changed
- Elixir got moved to Pro. (elixir_pro)
- The 'fix_regex' field has been removed from the semgrep JSON output. Instead, the 'fix' field contains the result of applying the fix_regex. (fix_regex)
- taint-mode: Tweaked experimental option `taint_only_propagate_through_assignments` so that when it is enabled, `tainted.field` and `tainted(args)` will no longer propagate taint. (pa-2193)
Fixed
- Fixed Kotlin parse error. Previously, code like this would throw a parse error

      fun f1(context : Context) {
          Foo(context).elem = var1
      }

  due to not recognizing `Foo(context).elem = ...` as valid. Now calls are recognized as valid on the left-hand side of assignments. (ea-104)
- Python: `async` statements are now translated into the Dataflow IL so Semgrep will be able to report findings e.g. inside `async with ...` statements. (gh-9182)
- In GitLab output, use the correct URL attached to the rule instead of generating it. This fixes the URL for supply chain findings. (gitlab)
- The language server will no longer crash on startup for IntelliJ. (language-server)
- The language server no longer crashes when installed through pip on Mac platforms. (language-server-macos)
- taint-mode: When we encountered an assignment `lval := expr` where `expr` returned no taints, we automatically cleaned `lval`. This was correct in the early days of taint-mode, before we introduced taint by side-effect, but it is wrong now. The LHS `lval` may be tainted by side-effect, in which case we cannot clean it just because `expr` returns no taint. Now that we introduced `by-side-effect: only` it is also possible for `expr` to taint `lval` by side-effect and return no immediate taint.

  This kind of source should now work as expected:

      - by-side-effect: true
        patterns:
          - pattern: |
              $X = source()
          - focus-metavariable: $X

  (pa-3164)
- taint-mode: Fixed a bug in the recently added `by-side-effect: only` option that, when matching l-values of the form `l.x` and `l[i]`, caused the `l` occurrence to unexpectedly become tainted too. This led to FPs in some typestate rules like those checking for double-lock or double-free.

  Now a source such as:

      - by-side-effect: only
        patterns:
          - pattern: lock($L)
          - focus-metavariable: $L

  will not produce FPs on code such as:

      lock(obj.l)
      unlock(obj.l)
      lock(obj.l)

  (pa-3282)
- taint-mode: Removed a hack that made `lval = new ...` assignments not clean the `lval` even though the RHS was not tainted. This caused FPs in double-free rules. For example, given this source:

      pattern-sources:
        - by-side-effect: only
          patterns:
            - pattern: delete $VAR;
            - focus-metavariable: $VAR

  And the code below:

      while (nondet) {
        int *v = new int;
        delete v; // FP
      }

  The `delete v` statement was reported as a double-free, because Semgrep did not consider that `v = new int` would clean the taint in `v`. (pa-3283)
Release v1.50.0
1.50.0 - 2023-11-17
No significant changes.
Release v1.49.0
1.49.0 - 2023-11-15
Added
- Added support in Ruby, Julia, and Rust for matching implicit return statements inside functions. For example, `return 0` can now match `0` in

      function f()
        0
      end

  This matching is enabled by default and can be disabled with the rule option `implicit_return`. (gh-8408)
- Pro engine supports constant propagation of numbers defined via macro in C++. (gh-9221)
- taint-mode: The `by-side-effect` option for taint sources (only) now accepts a third value `only` (besides `true` and `false`). Setting `by-side-effect: only` will define a taint source that only propagates by side effect. This option should allow (ab)using taint-mode for writing some typestate rules.

  For example, this taint rule:

      pattern-sources:
        - by-side-effect: only
          patterns:
            - pattern: lock($L)
            - focus-metavariable: $L
      pattern-sanitizers:
        - by-side-effect: true
          patterns:
            - pattern: unlock($L)
            - focus-metavariable: $L
      pattern-sinks:
        - pattern: lock($L)

  will match the second `lock(x)` in this code:

      lock(x) # no finding
      lock(x) # finding

  The first `lock(x)` will not result in any finding, because the occurrence of `x` in itself will not be tainted. Only after the function call will we record that `x` is tainted (as a side-effect of `lock`). The second `lock(x)` will result in a finding because the `x` has been tainted by the previous `lock(x)`. (pa-2980)
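The typestate behavior described for `by-side-effect: only` can be approximated with a small simulation. This is a toy sketch under simplifying assumptions (straight-line code, variables compared by name), not how Semgrep's taint engine works internally; `double_lock_findings` is a hypothetical helper.

```python
def double_lock_findings(calls):
    """Return the indexes of lock(...) calls on an already-locked variable.

    `calls` is a list of (function_name, variable) pairs in program order.
    """
    locked = set()   # variables "tainted" as a side effect of lock(...)
    findings = []
    for index, (func, var) in enumerate(calls):
        if func == "lock":
            # Sink check first: the occurrence of `var` inside this call is
            # only tainted if an earlier lock(...) already marked it.
            if var in locked:
                findings.append(index)
            # Source by side effect: taint is recorded only after the call.
            locked.add(var)
        elif func == "unlock":
            # Sanitizer by side effect: the variable becomes clean again.
            locked.discard(var)
    return findings
```

Calling it on `[("lock", "x"), ("lock", "x")]` reports only the second call, while a lock/unlock/lock sequence reports nothing.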
Changed
- In the metrics sent we now record the languages for which we invoked the interfile engine. This will enable us to measure the performance impact and error rates of new interfile languages. (For scans which don't send metrics, there is no change.) See PRIVACY.md for more information. (ea-251)
- Removed support for named snippets (`org_name:rule_id`) from `semgrep scan`, which were removed from semgrep.dev a few months ago. (gh-9203)
- Added support for `--config <code|secrets>` to semgrep scan. When using code or secrets, the environment variable `SEMGREP_REPO_NAME` must be set. For example,

      $ SEMGREP_REPO_NAME=test_repo semgrep --config secrets

  Internally, `semgrep scan --config <product>` now uses the same endpoint as `semgrep ci` to fetch the scan configuration. (gh-9205)
- Improved handling of unused lambdas to reduce false positives.

  Previously, we used to insert the CFGs of unused lambdas at the declaration site. However, this approach triggered some false positives. For example, consider the following code:

      void incorrect(int *p) {
        auto f1 = [&p]() {
          source(p);
        };
        auto f2 = [&p]() {
          sink(p);
        };
      }

  In this code, there's no actual control flow between the source and sink, and the lambdas are never even called. But when we inserted their CFGs at the declaration site, it incorrectly indicated a taint finding. To prevent these types of false positives while still scanning the body of unused lambdas, we now insert their CFGs in parallel at the end of their parent function, right after all other statements and just before the end node. (pa-3089)
- Bumped timeout (per-rule and per-file) from 2s to 5s. Recently we lowered it from 30s down to 2s, but based on what we have observed so far, we believe 5s is a better timeout for the time being. (timeout)
Fixed
- Fixed a bug where enabling the Secrets beta caused the default scan mode to be set to OSS, even when the Pro flag is turned on in the web UI. (ea-248)
- Semgrep does not stop a scan anymore for parsing errors due to unconventional exceptions (e.g., Failure "not a program") in some parsers. Instead, such errors are reported as "Other syntax error". (lang-13)
- Fixed a regression from the unused lambda change in the react-nextjs-router-push test. A lambda expression defined in a return expression is also treated as used at the location of the return expression. (pa-3089)
- Updated the Rust parser with miscellaneous improvements. In particular, Semgrep can now parse `yield` expressions in Rust. (rust)
- taint-mode: If an expression is tainted by multiple labels A and B, with B requiring A, the expression will now get both labels A and B. (taint-labels)
Release v1.48.0
1.48.0 - 2023-11-06
Note
Starting from version 1.46.0, Semgrep is first released in the following ecosystems:
- pypi
- brew
- returntocorp/semgrep:canary (Docker)

If no issues are detected after a few days, the Semgrep team promotes the `:canary` Docker tag to `:latest`.
Added
- Matching: Matches with the same range but bindings in different locations will no longer be deduplicated.

  For instance, the pattern `$FUNC(..., $A, ...)` would previously produce only one match on the target file:

      foo(true, true)

  because there are two matches with the range of the call, and both bindings of `$A` are to `true`.

  Now, the deduplication logic sees that the bindings of `$A` are in different places, and thus should not be considered the same, and produces two matches. (pa-3230)
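The change in deduplication can be illustrated by comparing two deduplication keys. This is a simplified sketch, not Semgrep's matching code; the match dictionaries, the character offsets, and both function names are assumptions made for this example.

```python
def dedup_by_range(matches):
    """Old behavior (toy model): matches with the same range collapse into one."""
    seen, kept = set(), []
    for match in matches:
        key = match["range"]
        if key not in seen:
            seen.add(key)
            kept.append(match)
    return kept


def dedup_by_range_and_bindings(matches):
    """New behavior (toy model): the locations of the bindings are part of the key."""
    seen, kept = set(), []
    for match in matches:
        key = (match["range"], tuple(sorted(match["bindings"].items())))
        if key not in seen:
            seen.add(key)
            kept.append(match)
    return kept


# Two matches of $FUNC(..., $A, ...) on `foo(true, true)`: the range of the
# call is the same, but $A is bound to a different `true` each time.
# Binding values are (start, end) character offsets within the line.
MATCHES = [
    {"range": (0, 15), "bindings": {"$A": (4, 8)}},
    {"range": (0, 15), "bindings": {"$A": (10, 14)}},
]
```

With these inputs, `dedup_by_range(MATCHES)` keeps one match and `dedup_by_range_and_bindings(MATCHES)` keeps both.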
Fixed
- Fixed out of bounds list access error in Cargo.lock parser (sc-1072)
- Secrets: metadata overrides specified in validators were incorrectly applied on
top of one another (on a per-rule basis), so that only the last was applied.
Each update is now correctly applied independently to each finding based on the
rule's validators. (scrt-231)
Release v1.47.0
1.47.0 - 2023-11-01
Note
Starting from version 1.46.0, Semgrep is first released in the following ecosystems:
- pypi
- brew
- returntocorp/semgrep:canary (Docker)

If no issues are detected after a few days, the Semgrep team promotes the `:canary` Docker tag to `:latest`.
Added
- taint-mode: Added a Boolean `exact` option to sources and sanitizers to make matching stricter (default is `false`).

  If you specify a source such as `foo(...)`, and Semgrep encounters `foo(x)`, by default `foo(x)`, `foo`, and `x` will all be considered tainted. If you add `exact: true` to the source specification, then only `foo(x)` will be regarded as tainted, that is, the "exact" match for the specification. The same applies to "exact" sanitizers. (gh-5897)
- Added `sg` alias for the semgrep binary, which is functionally equivalent to `alias sg="/opt/homebrew/bin/semgrep"` with one fewer step. (gh-9117)
- secrets: Added independent targeting from other semgrep products. This change allows Secrets to scan all tracked files. In particular, those ignored by semgrepignore will now get scanned. There will be additional changes in the future to allow configuring the files that are scanned by Secrets. (gh-9125)
- Adds an optional `--no-secrets-validation` flag to skip secrets validation. (no-secrets-validation)
- Secrets rules (i.e., with metadata product: secrets) now mask the ending component of the matched content by replacing it with *s. (pa-2333)
- Commutativity Support for Comparison Operators EQ and NOT_EQ

  We've introduced the `commutative_compop` rule option, enabling commutativity for comparison operators EQ and NOT_EQ. With this option, `a == b` will also match `b == a`, and `a != b` will also match `b != a`. (pa-3140)
- Validation errors are separated from unvalidated findings in the terminal output. (validation-error)
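The effect of the `commutative_compop` option described above can be sketched as a matcher that also tries the swapped operands for `==` and `!=`. This is an illustrative model only; representing comparisons as `(left, operator, right)` triples and the `matches_comparison` function are assumptions of this sketch, not Semgrep internals.

```python
def matches_comparison(pattern, expr, commutative_compop=False):
    """Return True if `expr` matches `pattern`.

    Both arguments are (left, operator, right) triples of strings.
    """
    if pattern == expr:
        return True
    left, op, right = pattern
    # Only EQ and NOT_EQ are treated as commutative; `<`, `>=`, etc. are not.
    if commutative_compop and op in ("==", "!="):
        return (right, op, left) == expr
    return False
```

For example, the pattern `("a", "==", "b")` matches `("b", "==", "a")` only when `commutative_compop=True`, and an ordering comparison such as `<` is never swapped.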
Changed
- For taint rules using labels (experimental) Semgrep now preferably picks a source without `requires` for the taint trace.

  Semgrep now prioritizes taint sources without a `requires` condition when choosing a representative taint trace from multiple source traces. This helps users to more clearly identify the initial taint source when multiple traces are involved. (pa-3122)
- Unreachable supply chain findings now report only on the line the dependency was found in (no longer incorrectly including the next line). This change could affect the syntactic_id generated by said findings. (sc-727)
- When running `semgrep ci --supply-chain`, Semgrep defaults to using the OSS engine even if the PRO engine would otherwise be used (turned on in semgrep.dev, or with the `--pro` flag). (supply-chain-oss)
Fixed
- Semgrep no longer supports python 3.7 (gh-8698)
- Semgrep will now refuse to run incompatible versions of the Pro Engine, rather than crashing with a confusing error message. (gh-8873)
- Fixed an issue that prevented the use of `semgrep install-semgrep-pro --custom-binary ...` when logged out. (gh-9051)
- The --severity=XXX scan flag is working again. (gh-9062)
- The --sarif output no longer crashes when semgrep itself encountered errors while processing targets. (gh-9091)
- Fixed how the end positions assigned to metavariable bindings are computed, in order to handle trailing newlines. This affected Semgrep's JSON output. If a metavariable `$X` was bound to a piece of text containing a trailing newline, such as "a\n", where the starting position was e.g. at line 1, Semgrep reported that the end position was at line 2, when in fact the text is entirely within line 1. If the text happened to be at the end of a file, Semgrep could report an end position that was outside the bounds of the file. (lang-18)
- Semgrep Language Server now only scans open files on startup
- Semgrep Language Server no longer scans with pro engine rules (ls)
- Rust: `unsafe` blocks are now translated into the Dataflow IL so e.g. it becomes possible for taint analysis to track taint from/to an `unsafe` block. (pa-3218)
- Correctly handle parsing toolchain directive in go.mod files (parsegomode)
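The trailing-newline issue fixed in lang-18 above can be shown with two ways of computing the end line of a bound text. This is a minimal sketch of the idea, not the actual fix; `naive_end_line` and `end_line` are hypothetical helpers, and only line numbers (not columns) are modeled.

```python
def naive_end_line(start_line, text):
    """Counts every newline, so "a\\n" starting on line 1 appears to end on line 2."""
    return start_line + text.count("\n")


def end_line(start_line, text):
    """Line of the last character of `text`.

    A trailing newline belongs to the line it terminates, so it does not
    move the end position onto the following line.
    """
    if not text:
        return start_line
    # Only newlines before the last character push that character down a line.
    return start_line + text[:-1].count("\n")
```

For the text `"a\n"` starting at line 1, `naive_end_line` gives 2 while `end_line` gives 1, which is the behavior the entry describes as correct.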
Release v1.46.0
1.46.0 - 2023-10-24
Note
Starting with this release (1.46.0), Semgrep is first released in the following ecosystems:
- pypi
- brew
- returntocorp/semgrep:canary (Docker)

If no issues are detected after a few days, the Semgrep team promotes the `:canary` Docker tag to `:latest`.
Added
- `semgrep install-semgrep-pro` now takes an optional `--custom-binary` flag to install the specified `semgrep-core-proprietary` binary rather than downloading it. (custom-pro-binary)
Fixed
- pyproject.toml parser now handles optional newlines right after section headers. (gh-10879)
- Updated the parsers for poetry.lock, pipfile.lock, and requirements.txt to treat package names as case-insensitive. This matches their respective specifications. Test cases were added to account for this change. (gh-8984)
- Reduced the limits for the prefilter optimization so that rules that cause computing the prefilter to blow up will abort more quickly. This improves performance by 2-3 seconds for each of the slowest rules. May cause a slowdown if a rule that previously could be filtered out no longer will be, but based on testing this is unlikely. (gh-9040)
- Fixed an issue where conditional expressions weren't handled properly in expression-based languages. Rust example:

      fn expr_stmt_if(c) {
        y = 0;
        x = if c { y = 1 };
        // Before: this matches when it shouldn't because y is not always 1.
        // After: this does not match, which is the correct behavior.
        y == 1;
      }

  (pa-3205)
- Fixed type error in creation of DependencyParserError object in the pnpm-lock.yaml parser (sc-1115)
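The case-insensitive handling of package names mentioned in gh-8984 above amounts to comparing names after normalizing their case. The following is a minimal sketch of that idea only; `normalize_package_name` and `same_package` are hypothetical helpers, and the real lockfile parsers may apply further normalization not shown here.

```python
def normalize_package_name(name):
    """Lower-case a package name so that comparisons ignore case."""
    return name.strip().lower()


def same_package(left, right):
    """True if two package names refer to the same package, ignoring case."""
    return normalize_package_name(left) == normalize_package_name(right)
```

With this helper, `same_package("Django", "django")` is true, so a dependency written with different capitalization in a lockfile and in an advisory would still be matched.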
Release v1.45.0
1.45.0 - 2023-10-18
Changed
- Previously, to ignore a finding from a rule `foo.bar.my-rule`, `nosemgrep` ignored a finding only if its fully qualified name was used: `nosemgrep: foo.bar.my-rule`. Now, `nosemgrep` can also accept just the rule ID: `nosemgrep: my-rule`. (#8979)
- [Breaking Change] Improved Matching of C++ Constructors (pa-3114)
  - In this update, the Semgrep team has enhanced Semgrep's ability to match C++ constructors more accurately.
  - C++ introduces a syntactic ambiguity between function and variable definitions, particularly with constructors. The C++ compiler determines how to interpret an expression based on contextual information, such as whether the immediate parent scope is a function or a class, and whether the identifiers within the parentheses represent variables or types.
  - Due to this complexity, static analyzers face challenges in precisely parsing these expressions without additional information.
  - This commit introduces several workarounds to provide a better solution for handling this ambiguity:
    - By default, when parsing a target file, Semgrep will consider an expression like `foo bar(x, y, z);` defined within the body of a function as a variable definition with a constructor. This is because variable initialization is a more common use case within the body of a function.
    - Users can specify rule options that annotate, in patterns where the expression can be interpreted in both ways, which interpretation should take precedence. For instance, `foo bar(x, y, z);` will be parsed as a function definition when the `as_fundef` option is used and as a variable definition with a constructor when the `as_vardef_with_ctor` option is applied. It's worth noting that an expression like `foo bar(1, y, z);` will be parsed as a variable definition without any additional annotation since `1` cannot be a type.
  - Here's an example rule and its corresponding target file to illustrate these changes:

        rules:
          - id: cpp-match-func-def
            message: Semgrep found a match
            options:
              cpp_parsing_pref: as_fundef
            languages:
              - cpp
            severity: WARNING
            pattern-either:
              - pattern: foo $X($Y);
              - pattern: foo $X($Y, $Z);
          - id: cpp-match-ctor
            message: Semgrep found a match
            options:
              cpp_parsing_pref: as_vardef_with_ctor
            languages:
              - cpp
            severity: WARNING
            patterns:
              - pattern: foo $X(...);
              - pattern-not: foo $X(3, ...);
          - id: cpp-match-ctor-3
            message: Semgrep found a match
            languages:
              - cpp
            severity: WARNING
            pattern: foo $X(3, ...);

        class Test {
          // ruleid: cpp-match-func-def
          foo bar(x);
          // ruleid: cpp-match-func-def
          foo bar(x, y);
          void test() {
            // ruleid: cpp-match-ctor
            foo bar(1);
            // ruleid: cpp-match-ctor
            foo bar(1, 2);
            // ruleid: cpp-match-ctor
            foo bar(x);
            // ruleid: cpp-match-ctor
            foo bar(x, y);
            // ruleid: cpp-match-ctor
            foo bar(x, 2);
            // ruleid: cpp-match-ctor
            foo bar(1, y);
            // ruleid: cpp-match-ctor-3
            foo bar(3);
            // ruleid: cpp-match-ctor-3
            foo bar(3, 4);
            // ruleid: cpp-match-ctor-3
            foo bar(3, y);
          }
        };
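The relaxed `nosemgrep` matching described in #8979 above can be sketched as a comparison that accepts either the fully qualified name or the short rule ID. This is an illustration under an explicit assumption: the short ID is taken to be the last dot-separated component of the qualified name. `nosemgrep_applies` is a hypothetical function, not part of Semgrep.

```python
def nosemgrep_applies(comment_rule_id, finding_rule_id):
    """Decide whether `nosemgrep: <comment_rule_id>` suppresses a finding.

    `finding_rule_id` is the fully qualified rule name, e.g. "foo.bar.my-rule".
    """
    if comment_rule_id == finding_rule_id:
        return True  # fully qualified name, accepted before and after the change
    # Assumption of this sketch: the short ID is the final dotted component.
    short_id = finding_rule_id.rsplit(".", 1)[-1]
    return comment_rule_id == short_id
```

Under this model, both `nosemgrep: foo.bar.my-rule` and `nosemgrep: my-rule` suppress a finding from `foo.bar.my-rule`, while an unrelated ID does not.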
Fixed
- Semgrep Docker image: Reduced the Docker image size by using `--no-cache` when apk upgrading. Thanks to Peter Dave Hello for the contribution.
- Fixed a bug with pre-filtering introduced in 1.42.0 that caused significant slowdowns, particularly for Kotlin repos. Kotlin repos running default pro rules may see a 30 minute speedup. (ea-208)
- Taint analysis: track `ptr->field` l-values in C++

  In C++, we now track tainted field access via pointer dereference. For instance, consider the following code snippet:

      void test_intra_001() {
        TestObject *obj = new TestObject();
        obj->a = taint_source();
        obj->b = SAFE_STR;
        // ok: cpp-tainted-field-ptr
        sink(obj->b, __LINE__);
        // ruleid: cpp-tainted-field-ptr
        sink(obj->a, __LINE__);
      }

  This can be matched by the rule (gh-1058):

      rules:
        - id: cpp-tainted-field-ptr
          languages:
            - cpp
          message: testing flows though C++ ptrs
          severity: INFO
          mode: taint
          pattern-sources:
            - pattern: taint_source()
          pattern-sinks:
            - patterns:
                - pattern: sink($X, ...)
                - focus-metavariable:
                    - $X
- Do not crash anymore with an `Invalid_arg` exception when the terminal has very few columns (e.g., in some precommit context). (#8792)
- Add `--supply-chain` flag to `semgrep ci --help` documentation (#8975)
- Avoid catastrophic `Invalid_argument: index out of bounds` errors when reporting the location of findings (#9011)
- IntelliJ and VSCode extensions: The Semgrep Language Server (LSP) no longer freezes while scanning long files.
- Pre-filtering is now less aggressive and tries not to skip files that could be matched by a rule due to constant-propagation. Previously, a rule searching for the string `"foobar"` would skip a file that did not contain exactly `"foobar"`, but that contained e.g. `"foo" + "bar"`. (#8767)
- `semgrep ci` does not crash anymore when run from git repositories coming from Azure projects with whitespaces in the name. (#8971)
- The `--test` flag now processes test target files even if they do not match the `paths:` directive of a rule. This is especially useful for rules using `include:`, which is now disabled in a test context. (#8192)
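The pre-filtering problem from #8767 above can be reproduced with a toy pre-filter. This sketch is not Semgrep's pre-filter: it only handles Python target files, folds only `"a" + "b"` of two string literals, and the function names are invented for the example.

```python
import ast


def naive_prefilter(source, needle):
    """Keep a file only if the needle appears verbatim in its text."""
    return needle in source


def folded_strings(source):
    """Yield string constants in Python `source`, including the result of
    concatenating two adjacent string literals with `+`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            yield node.value
        elif (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, str)
            and isinstance(node.right.value, str)
        ):
            yield node.left.value + node.right.value


def constant_aware_prefilter(source, needle):
    """Keep a file if the needle occurs in any (possibly folded) string constant."""
    return any(needle in value for value in folded_strings(source))
```

For a file containing `x = "foo" + "bar"`, the naive filter would skip the file when searching for `foobar`, while the constant-aware variant keeps it for the full matching phase.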
Release v1.44.0
1.44.0 - 2023-10-11
Added
- A new --matching-explanations CLI flag has been added, to get matching explanations. This was internally used by the Semgrep Playground to help debug rules, but is now also available directly from the CLI. (explanations)
- Using C++ tree-sitter as a failsafe pattern parser for C (gh-8905)
- Allowing multiple type fields in metavariable-type rule syntax

  Users have the flexibility to use multiple type fields to match the type of metavariables. For instance:

      metavariable-type:
        metavariable: $X
        types:
          - typeA
          - typeB

  This approach is also supported in rule 2.0. (gh-8913)
-
Support for parsing pubspec (Dart/Flutter) lockfiles (gh-8925)
- Added support for matching template type arguments using metavariables in C++. Users can now successfully match code snippets like:

      #include <memory>
      using namespace std;

      void foo() {
        int *i = 0;
        // ruleid: match-with-template
        shared_ptr<int> p;
      }

  with the pattern:

      shared_ptr<$TY> $LOCAL_VAR;

  (pa-3102)
Fixed
- Avoid fatal "missing plugin" exceptions when scanning some Apex rules for which no Apex pattern is used by the rule, such as a rule with a `pattern-regex:` and nothing else. (gh-8945)
- Semgrep can now parse optional assignments in Swift (e.g. `a.b? = 1`). (lang-1)
- Sequential tainting is now supported in Elixir.

      def f() do
        x = "tainted"
        y = x
        # This now matches.
        sink(y)
      end

  (pa-3130)
- Target files that disappeared before the scan or that have special byte characters in their filename do not cause the whole scan to crash anymore. The file is skipped instead. (pa-3144)
- go.mod parsing now correctly allows arbitrary newlines and whitespace between dependencies (sc-1076)
-
fix: Improve typed metavariable matching against expressions consisting of names only. (type-inference)
Release v1.43.0
1.43.0 - 2023-10-03
Added
- Dart: Full Semgrep support for Dart has been added, whereas previously
most Semgrep constructs (and Semgrep itself) would not work correctly. (pa-2968)
Changed
- We have reduced the default timeout (per-rule and per-file) to 2s (down from 30s). Typically, running a rule on a file should take a fraction of a second. When a rule takes more than a couple of seconds, it is often because the rule is not optimally written, or because the file is unusually large (a minified file or machine-generated code), so waiting 30s for it does not tend to bring any value. Plus, by cutting it off earlier, we may prevent a potential OOM crash when running the rule is very memory intensive. (pa-3155)
Fixed
- The language server will no longer surface committed findings when a user types but does not save (pdx-ls-git)