Releases · semgrep/semgrep

05 Dec 22:08

github-actions

v1.52.0

d250452

Release v1.52.0

1.52.0 - 2023-12-05

Added

Java: Semgrep will now recognize String.format(...) expressions as constant
strings when all their arguments are constant, but it will still not know
what exact string it is. For example, code String.format("Abc %s", "123")
will match pattern "..." but it will not match pattern "Abc 123". (pa-3284)
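
The difference between "known to be a constant string" and "known to be this exact string" can be illustrated with a small model. This is only a sketch of the idea, not Semgrep's implementation; `ConstString`, `eval_format_call`, and `pattern_matches` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConstString:
    """A string known to be constant; value is None when the exact text is unknown."""
    value: Optional[str]


def eval_format_call(args: Sequence[Optional[ConstString]]) -> Optional[ConstString]:
    """Model String.format(...): constant if every argument is constant, value unknown."""
    if all(arg is not None for arg in args):
        return ConstString(value=None)
    return None  # at least one argument is not a constant


def pattern_matches(pattern: str, expr: Optional[ConstString]) -> bool:
    """The pattern "..." accepts any constant string; other patterns need the exact value."""
    if expr is None:
        return False
    if pattern == "...":
        return True
    return expr.value == pattern


formatted = eval_format_call([ConstString("Abc %s"), ConstString("123")])
print(pattern_matches("...", formatted))      # True
print(pattern_matches("Abc 123", formatted))  # False
```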

Changed

Inter-file diff scans will be introduced gradually to a small percentage of
users through a slow rollout. Users who enable the Pro engine and
run differential PR scans on GitHub or GitLab may be affected
by this update. (ea-268)
secrets: now performs more aggressive deduplication for instances where an
invalid and valid match are reported at the same range. Instead of reporting
both, we now report only the valid match when they are otherwise visually
identical. (scrt-271)

Fixed

In expression-based languages, definitions are also expressions.

This change allows dataflow to properly handle definition expressions.

For example, the pattern 0 == 0 will match x == 0 in
```
def f(c) do
  x = (y = 0)
  x == 0
end
```
because now dataflow is able to handle the expression y = 0. (pa-3262)
In version 1.14.0 (pa-2477) we made sink-matching more precise when the sink
specification was like:
```
pattern-sinks:
  - patterns:
     - pattern: sink($X, ...)
     - focus-metavariable: $X
```
Here the sink specification most likely intends to mark the first
argument of sink as the sink, so sink(ok1 if tainted else ok2) should NOT
produce a finding, because tainted is not really what is being passed to
the sink function.

But we only intercepted the most simple pattern above, and more complex sink
specifications that had the same intent were not properly recognized.

Now we have generalized that pattern to cover more complex cases like:
```
patterns:
 - pattern-either:
   - patterns:
     - pattern-inside: |
         def foo(...):
           ...
     - pattern: sink1($X)
   - patterns:
     - pattern: sink2($X)
     - pattern-not: bar(...)
 - focus-metavariable: $X
```
(pa-3284)
Updated the parser used for Rust (rust)

29 Nov 15:00

github-actions

v1.51.0

0dde27b

Release v1.51.0

1.51.0 - 2023-11-29

Added

taint-mode: Added experimental rule option taint_match_on: source that makes
Semgrep report taint findings on the taint source rather than on the sink. (pa-3272)

Changed

Elixir got moved to Pro. (elixir_pro)
The 'fix_regex' field has been removed from the semgrep JSON output. Instead,
the 'fix' field contains the result of applying the fix_regex. (fix_regex)
taint-mode: Tweaked experimental option taint_only_propagate_through_assignments
so that when it is enabled, tainted.field and tainted(args) will no longer
propagate taint. (pa-2193)

Fixed

Fixed Kotlin parse error.

Previously, code like this would throw a parse error
```
fun f1(context : Context) {
    Foo(context).elem = var1
}
```
because Foo(context).elem = ... was not recognized as valid.
Now calls are recognized as valid on the left-hand side of
assignments. (ea-104)
Python: async statements are now translated into the Dataflow IL so Semgrep
will be able to report findings e.g. inside async with ... statements. (gh-9182)
In GitLab output, use the correct URL attached to the rule instead of generating it.
This fixes the URL for supply chain findings. (gitlab)
The language server will no longer crash on startup for IntelliJ (language-server)
The language server no longer crashes when installed through pip on Mac platforms (language-server-macos)
taint-mode: When we encountered an assignment lval := expr where expr returned
no taints, we automatically cleaned lval. This was correct in the early days of
taint-mode, before we introduced taint by side-effect, but it is wrong now. The LHS
lval may be tainted by side-effect, in which case we cannot clean it just because
expr returns no taint. Now that we have introduced by-side-effect: only, it is also
possible for expr to taint lval by side-effect and return no immediate taint.

This kind of source should now work as expected:
```
- by-side-effect: true
  patterns:
    - pattern: |
        $X = source()
    - focus-metavariable: $X
```
(pa-3164)
taint-mode: Fixed a bug in the recently added by-side-effect: only option:
when matching l-values of the form l.x and l[i], the l
occurrence would unexpectedly become tainted too. This led to FPs in some
typestate rules like those checking for double-lock or double-free.

Now a source such as:
```
- by-side-effect: only
  patterns:
  - pattern: lock($L)
  - focus-metavariable: $L
```
will not produce FPs on code such as:
```
lock(obj.l)
unlock(obj.l)
lock(obj.l)
```
(pa-3282)
taint-mode: Removed a hack that prevented lval = new ... assignments from cleaning
the lval even when the RHS was not tainted. This caused FPs in double-free rules.
For example, given this source:
```
pattern-sources:
  - by-side-effect: only
    patterns:
      - pattern: delete $VAR;
      - focus-metavariable: $VAR
```
And the code below:
```
while (nondet) {
  int *v = new int;
  delete v; // FP
}
```
The delete v statement was reported as a double-free, because Semgrep did not
consider that v = new int would clean the taint in v. (pa-3283)

17 Nov 03:52

github-actions

v1.50.0

115a458

Release v1.50.0

1.50.0 - 2023-11-17

No significant changes.

15 Nov 13:34

github-actions

v1.49.0

51c0c6e

Release v1.49.0

1.49.0 - 2023-11-15

Added

Added support in Ruby, Julia, and Rust for matching implicit return statements inside functions.

For example:
```
return 0
```
can now match 0 in
```
function f()
  0
end
```
This matching is enabled by default and can be disabled with the rule option implicit_return. (gh-8408)
Pro engine supports constant propagation of numbers defined via macro in C++ (gh-9221)
taint-mode: The by-side-effect option (for taint sources only) now accepts a
third value, only, besides true and false. Setting by-side-effect: only
defines a taint source that propagates only by side effect. This option
should allow (ab)using taint-mode for writing some typestate rules.

For example, this taint rule:
```
pattern-sources:
  - by-side-effect: only
    patterns:
    - pattern: lock($L)
    - focus-metavariable: $L
pattern-sanitizers:
  - by-side-effect: true
    patterns:
    - pattern: unlock($L)
    - focus-metavariable: $L
pattern-sinks:
  - pattern: lock($L)
```
will match the second lock(x) in this code:
```
lock(x) # no finding
lock(x) # finding
```
The first lock(x) will not result in any finding, because the occurrence of x in
itself will not be tainted. Only after the function call we will record that x is
tainted (as a side-effect of lock). The second lock(x) will result in a finding
because the x has been tainted by the previous lock(x). (pa-2980)
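
The typestate behavior described above can be simulated in a few lines. This is a simplified model of the rule's effect on a straight-line sequence of calls, not Semgrep's dataflow engine; `find_double_locks` and its input format are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def find_double_locks(calls: List[Tuple[str, str]]) -> List[int]:
    """Return indexes of lock(...) calls whose argument is already marked as locked."""
    tainted: Set[str] = set()   # variables tainted as a side effect of lock(...)
    findings: List[int] = []
    for index, (func, var) in enumerate(calls):
        if func == "lock":
            if var in tainted:       # sink: lock($L) reached with $L already tainted
                findings.append(index)
            tainted.add(var)         # source (by-side-effect: only): taint after the call
        elif func == "unlock":
            tainted.discard(var)     # sanitizer (by-side-effect: true): clean the variable
    return findings


print(find_double_locks([("lock", "x"), ("lock", "x")]))                   # [1]
print(find_double_locks([("lock", "x"), ("unlock", "x"), ("lock", "x")]))  # []
```

Because the taint is recorded only after the call, the first lock(x) never reports on itself, which mirrors the "no finding" / "finding" comments in the example above.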

Changed

In the metrics sent we now record the languages for which we invoked the interfile engine.
This will enable us to measure the performance impact and error rates of new interfile
languages. (For scans which don't send metrics, there is no change.) See PRIVACY.md
for more information. (ea-251)
Removed support for named snippets (org_name:rule_id) from semgrep scan which were removed from semgrep.dev a few months ago. (gh-9203)
Added support for --config <code|secrets> to semgrep scan. When using
code or secrets, the environment variable SEMGREP_REPO_NAME must be set.

For example,
```
$ SEMGREP_REPO_NAME=test_repo semgrep --config secrets
```
Internally, semgrep scan --config <product> now uses the same endpoint as the
semgrep ci to fetch the scan configuration. (gh-9205)
Improved handling of unused lambdas to reduce false positives

Previously, we used to insert the CFGs of unused lambdas at the declaration
site. However, this approach triggered some false positives. For example,
consider the following code:
```
void incorrect(int *p) {
  auto f1 = [&p]() {
    source(p);
  };
  auto f2 = [&p]() {
    sink(p);
  };
}
```
In this code, there's no actual control flow between the source and sink, and
the lambdas are never even called. But when we inserted their CFGs at the
declaration site, it incorrectly indicated a taint finding. To prevent these
types of false positives while still scanning the body of unused lambdas, we
now insert their CFGs in parallel at the end of their parent function, right
after all other statements and just before the end node. (pa-3089)
Bumped timeout (per-rule and per-file) from 2s to 5s. Recently we lowered it
from 30s down to 2s, but based on what we have observed so far, we believe 5s
is a better timeout for the time being. (timeout)

Fixed

Fixed a bug where enabling the secret beta causes the default scan mode to be
set to OSS, even when the Pro flag is turned on in the web UI. (ea-248)
Semgrep does not stop a scan anymore for parsing errors due to
unconventional exceptions (e.g., Failure "not a program") in some
parsers. Instead, such errors are reported as "Other syntax error". (lang-13)
Fix regression for the unused lambda change in react-nextjs-router-push test

A lambda expression defined in a return expression is also treated as used at
the location of the return expression. (pa-3089)
Updated the Rust parser with miscellaneous improvements. In particular, Semgrep can now parse yield expressions in Rust. (rust)
taint-mode: If an expression is tainted by multiple labels A and B, with B
requiring A, the expression will now get both labels A and B. (taint-labels)

06 Nov 17:15

github-actions

v1.48.0

21cefb5

Release v1.48.0

1.48.0 - 2023-11-06

Note

Starting from version 1.46.0, Semgrep is first released in the following ecosystems:
- pypi
- brew
- returntocorp/semgrep:canary (Docker)
If no issues are detected after a few days, the Semgrep team promotes the :canary Docker tag to :latest.

Added

Matching: Matches with the same range but bindings in different locations
will now no longer deduplicate.

For instance, the pattern $FUNC(..., $A, ...) would produce only
one match on the target file:
```
foo(true, true)
```
because you would have two matches to the range of the call, and both
bindings of $A would be to true.

Now, the deduplication logic sees that the bindings of $A are in
different places, and thus should not be considered the same, and
produce two matches. (pa-3230)
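
The change in deduplication can be sketched as a change in the key used to compare matches. The model below is illustrative only; `Binding`, `Match`, and `dedup` are hypothetical names, and offsets are 0-based character positions in the target `foo(true, true)`.

```python
from typing import Dict, List, NamedTuple, Tuple


class Binding(NamedTuple):
    text: str     # the code bound to the metavariable
    offset: int   # where that code starts in the target (0-based)


class Match(NamedTuple):
    match_range: Tuple[int, int]      # start and end offsets of the whole match
    bindings: Dict[str, Binding]      # metavariable name -> binding


def dedup(matches: List[Match], use_binding_location: bool) -> List[Match]:
    """Keep the first match for each key; the key optionally includes binding offsets."""
    seen = set()
    kept = []
    for m in matches:
        key = (m.match_range, tuple(
            (name, b.text, b.offset if use_binding_location else None)
            for name, b in sorted(m.bindings.items())
        ))
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


# Target: foo(true, true); pattern: $FUNC(..., $A, ...)
first = Match((0, 15), {"$A": Binding("true", 4), "$FUNC": Binding("foo", 0)})
second = Match((0, 15), {"$A": Binding("true", 10), "$FUNC": Binding("foo", 0)})

print(len(dedup([first, second], use_binding_location=False)))  # 1 (old behavior)
print(len(dedup([first, second], use_binding_location=True)))   # 2 (new behavior)
```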

Fixed

Fixed out of bounds list access error in Cargo.lock parser (sc-1072)
Secrets: metadata overrides specified in validators were incorrectly applied on
top of one another (on a per-rule basis), so that only the last was applied.
Each update is now correctly applied independently to each finding based on the
rule's validators. (scrt-231)

02 Nov 00:26

github-actions

v1.47.0

2b41173

Release v1.47.0

1.47.0 - 2023-11-01

Added

taint-mode: Added a Boolean exact option to sources and sanitizers to make
matching stricter (default is false).

If you specify a source such as foo(...), and Semgrep encounters foo(x),
by default foo(x), foo, and x, will all be considered tainted. If you add
exact: true to the source specification, then only foo(x) will be regarded
as tainted, that is the "exact" match for the specification. The same applies
to "exact" sanitizers. (gh-5897)
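
As a rough model of the two modes, the set of expressions regarded as tainted for a matched call could be computed as follows. This is a sketch under simplifying assumptions (a single call with plain-name arguments); `tainted_by_source` is a hypothetical helper, not part of Semgrep.

```python
from typing import List, Set


def tainted_by_source(call_text: str, callee: str, args: List[str], exact: bool = False) -> Set[str]:
    """Return which expressions become tainted when a source pattern matches a call."""
    if exact:
        return {call_text}                      # only the matched expression itself
    return {call_text, callee, *args}           # the match and its subexpressions


print(sorted(tainted_by_source("foo(x)", "foo", ["x"])))              # ['foo', 'foo(x)', 'x']
print(sorted(tainted_by_source("foo(x)", "foo", ["x"], exact=True)))  # ['foo(x)']
```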
Added sg alias for semgrep binary which is functionally equivalent to
```
alias sg="/opt/homebrew/bin/semgrep"
```
with one fewer step. (gh-9117)
secrets: Added independent targeting from other semgrep products.

This change allows Secrets to scan all tracked files. In particular, those ignored
by semgrepignore will now get scanned. There will be additional changes
in the future to allow configuring the files that are scanned by Secrets. (gh-9125)
Adds an optional --no-secrets-validation flag to skip secrets validation. (no-secrets-validation)
Secrets rules (i.e., with metadata product: secrets) now mask the ending
component of the matched content by replacing it with *s. (pa-2333)
Commutativity Support for Comparison Operators EQ and NOT_EQ

We've introduced the commutative_compop rule option, enabling commutativity
for comparison operators EQ and NOT_EQ. With this option, a == b will also
match b == a, and a != b will also match b != a. (pa-3140)
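
The effect of the option can be modeled as trying the pattern a second time with its operands swapped. The sketch below is illustrative and does not reflect how Semgrep represents patterns internally; `Comparison` and `matches` are hypothetical names.

```python
from typing import NamedTuple

COMMUTATIVE_OPS = {"==", "!="}  # EQ and NOT_EQ


class Comparison(NamedTuple):
    left: str
    op: str
    right: str


def matches(pattern: Comparison, target: Comparison, commutative_compop: bool = False) -> bool:
    """Match a comparison pattern, optionally also trying swapped operands."""
    if pattern == target:
        return True
    if commutative_compop and pattern.op in COMMUTATIVE_OPS:
        swapped = Comparison(pattern.right, pattern.op, pattern.left)
        return swapped == target
    return False


pattern = Comparison("a", "==", "b")
print(matches(pattern, Comparison("b", "==", "a")))                           # False
print(matches(pattern, Comparison("b", "==", "a"), commutative_compop=True))  # True
```

Operators other than EQ and NOT_EQ, such as `<`, are left untouched in this model because swapping their operands changes the meaning.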
Validation errors are separated from unvalidated findings in the terminal output. (validation-error)

Changed

For taint rules using labels (experimental) Semgrep now preferably picks a
source without requires for the taint trace

Semgrep now prioritizes taint sources without requires condition when
choosing a representative taint trace from multiple source traces. This helps
users to more clearly identify the initial taint source when multiple traces
are involved. (pa-3122)
Unreachable supply chain findings now report only the line on which the dependency was found (no longer incorrectly including the next line).
This change could affect the syntactic_id generated for these findings. (sc-727)
semgrep ci --supply-chain now defaults to the OSS engine even if the
Pro engine would otherwise be used (turned on in semgrep.dev, or with the --pro flag). (supply-chain-oss)

Fixed

Semgrep no longer supports Python 3.7 (gh-8698)
Semgrep will now refuse to run incompatible versions of the Pro Engine, rather than crashing with a confusing error message. (gh-8873)
Fixed an issue that prevented the use of semgrep install-semgrep-pro --custom-binary ... when logged out. (gh-9051)
The --severity=XXX scan flag is working again. (gh-9062)
The --sarif flag no longer crashes when Semgrep itself encounters errors
while processing targets. (gh-9091)
Fixed how the end positions assigned to metavariable bindings are computed, in
order to handle trailing newlines. This affected Semgrep's JSON output. If a
metavariable $X was bound to a piece of text containing a trailing newline,
such as "a\n", where the starting position was e.g. at line 1, Semgrep reported
that the end position was at line 2, when in fact the text is entirely within
line 1. If the text happened to be at the end of a file, Semgrep could report
an end position that was outside the bounds of the file. (lang-18)
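
The off-by-one-line effect of a trailing newline can be reproduced with a small position calculator. This is only a sketch of the idea: positions here are 1-based (line, column) pairs pointing just past the last character, which is an assumption and may not match Semgrep's exact conventions; both function names are hypothetical.

```python
from typing import Tuple


def naive_end(line: int, col: int, text: str) -> Tuple[int, int]:
    """Advance past every character; a newline always moves to the next line."""
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
    return line, col


def end_ignoring_trailing_newline(line: int, col: int, text: str) -> Tuple[int, int]:
    """Like naive_end, but a final newline stays on the line it terminates."""
    if text.endswith("\n"):
        line, col = naive_end(line, col, text[:-1])
        return line, col + 1  # count the newline as one more column on that line
    return naive_end(line, col, text)


print(naive_end(1, 1, "a\n"))                      # (2, 1)
print(end_ignoring_trailing_newline(1, 1, "a\n"))  # (1, 3)
```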
- Semgrep Language Server now only scans open files on startup
- Semgrep Language Server no longer scans with pro engine rules (ls)
Rust: unsafe blocks are now translated into the Dataflow IL so e.g. it becomes
possible for taint analysis to track taint from/to an unsafe block. (pa-3218)
Correctly handle parsing toolchain directive in go.mod files (parsegomode)

24 Oct 17:22

github-actions

v1.46.0

8479f8f

Release v1.46.0

1.46.0 - 2023-10-24

Note

Starting with this release (1.46.0), Semgrep is first released in the following ecosystems:
- pypi
- brew
- returntocorp/semgrep:canary (Docker)
If no issues are detected after a few days, the Semgrep team promotes the :canary Docker tag to :latest.

Added

semgrep install-semgrep-pro now takes an optional --custom-binary flag to install the specified semgrep-core-proprietary binary rather than downloading it. (custom-pro-binary)

Fixed

pyproject.toml parser now handles optional newlines right after section headers. (gh-10879)
Updated the parsers for poetry.lock, pipfile.lock, and requirements.txt to treat package names as case-insensitive.
This matches their respective specifications. Test cases were added to account for this change. (gh-8984)
Reduced the limits for the prefilter optimization so that rules that cause
computing the prefilter to blow up will abort more quickly. This improves
performance by 2-3 seconds for each of the slowest rules. May cause a
slowdown if a rule that previously could be filtered out no longer will be,
but based on testing this is unlikely. (gh-9040)

Fixed an issue where conditional expressions weren't handled properly in expression-based languages.

Rust example:
```
fn expr_stmt_if(c) {
  y = 0;
  x = if c { y = 1 };

  // Before: this matches when it shouldn't because y is not always 1.
  // After: this does not match, which is the correct behavior.
  y == 1;
}
```
(pa-3205)

Fixed type error in creation of DependencyParserError object in the pnpm-lock.yaml parser (sc-1115)

18 Oct 20:08

github-actions

v1.45.0

ec48ff2

Release v1.45.0

1.45.0 - 2023-10-18

Changed

Previously, nosemgrep ignored a finding from a rule such as foo.bar.my-rule only if the comment used the fully qualified name: nosemgrep: foo.bar.my-rule. Now nosemgrep also accepts just the rule ID: nosemgrep: my-rule. (#8979)
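
The relaxed check can be sketched as comparing each ID listed in the comment against both forms of the rule name. This is an illustration only; `nosemgrep_applies` is a hypothetical helper, and treating the short ID as the last dot-separated component of the fully qualified name is an assumption.

```python
from typing import Iterable


def nosemgrep_applies(ignored_ids: Iterable[str], rule_id: str) -> bool:
    """True if a nosemgrep comment listing ignored_ids suppresses rule_id."""
    short_id = rule_id.rsplit(".", 1)[-1]  # "foo.bar.my-rule" -> "my-rule"
    return any(ignored in (rule_id, short_id) for ignored in ignored_ids)


print(nosemgrep_applies(["foo.bar.my-rule"], "foo.bar.my-rule"))  # True
print(nosemgrep_applies(["my-rule"], "foo.bar.my-rule"))          # True
print(nosemgrep_applies(["other-rule"], "foo.bar.my-rule"))       # False
```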
[Breaking Change] Improved Matching of C++ Constructors (pa-3114)
- In this update, the Semgrep team has enhanced Semgrep's ability to match C++ constructors more accurately.
- C++ introduces a syntactic ambiguity between function and variable definitions, particularly with constructors. The C++ compiler determines how to interpret an expression based on contextual information, such as whether the immediate parent scope is a function or a class, and whether the identifiers within the parentheses represent variables or types.
- Due to this complexity, static analyzers face challenges in precisely parsing these expressions without additional information.
- This commit introduces several workarounds to provide a better solution for handling this ambiguity:
  - By default, when parsing a target file, Semgrep will consider an expression like foo bar(x, y, z); defined within the body of a function as a variable definition with a constructor. This is because variable initialization is a more common use case within the body of a function.
  - Users can specify rule options that annotate, in patterns where the expression can be interpreted in both ways, which interpretation should take precedence. For instance, foo bar(x, y, z); will be parsed as a function definition when the as_fundef option is used and as a variable definition with a constructor when the as_vardef_with_ctor option is applied. It's worth noting that an expression like foo bar(1, y, z); will be parsed as a variable definition without any additional annotation since 1 cannot be a type.
- Here's an example rule and its corresponding target file to illustrate these changes:
```
rules:
  - id: cpp-match-func-def
    message: Semgrep found a match
    options:
      cpp_parsing_pref: as_fundef
    languages:
      - cpp
    severity: WARNING
    pattern-either:
      - pattern: foo $X($Y);
      - pattern: foo $X($Y, $Z);

  - id: cpp-match-ctor
    message: Semgrep found a match
    options:
      cpp_parsing_pref: as_vardef_with_ctor
    languages:
      - cpp
    severity: WARNING
    patterns:
      - pattern: foo $X(...);
      - pattern-not: foo $X(3, ...);

  - id: cpp-match-ctor-3
    message: Semgrep found a match
    languages:
      - cpp
    severity: WARNING
    pattern: foo $X(3, ...);
```
```
class Test {

  // ruleid: cpp-match-func-def
  foo bar(x);
  // ruleid: cpp-match-func-def
  foo bar(x, y);

  void test() {
    // ruleid: cpp-match-ctor
    foo bar(1);
    // ruleid: cpp-match-ctor
    foo bar(1, 2);

    // ruleid: cpp-match-ctor
    foo bar(x);
    // ruleid: cpp-match-ctor
    foo bar(x, y);

    // ruleid: cpp-match-ctor
    foo bar(x, 2);
    // ruleid: cpp-match-ctor
    foo bar(1, y);

    // ruleid: cpp-match-ctor-3
    foo bar(3);
    // ruleid: cpp-match-ctor-3
    foo bar(3, 4);
    // ruleid: cpp-match-ctor-3
    foo bar(3, y);
  }
};
```

Fixed

Semgrep Docker image: Reduced the Docker image size by using --no-cache when upgrading packages with apk. Thanks to Peter Dave Hello for the contribution.
Fixed a bug with pre-filtering introduced in 1.42.0 that caused significant slowdowns, particularly for Kotlin repos. Kotlin repos running default pro rules may see a 30 minute speedup. (ea-208)

Taint analysis: track ptr->field l-values in C++

In C++, we now track tainted field access via pointer dereference. For instance, consider the following code snippet:

```
void test_intra_001() {
  TestObject *obj = new TestObject();

  obj->a = taint_source();
  obj->b = SAFE_STR;

  // ok: cpp-tainted-field-ptr
  sink(obj->b, __LINE__);
  // ruleid: cpp-tainted-field-ptr
  sink(obj->a, __LINE__);
}
```

This can be matched by the rule (gh-1058):

```
rules:
  - id: cpp-tainted-field-ptr
    languages:
      - cpp
    message: testing flows though C++ ptrs
    severity: INFO
    mode: taint
    pattern-sources:
      - pattern: taint_source()
    pattern-sinks:
      - patterns:
          - pattern: sink($X, ...)
          - focus-metavariable:
              - $X
```

Do not crash anymore with an Invalid_arg exception when the terminal has very few columns (e.g., in some precommit context). (#8792)
Add --supply-chain flag to semgrep ci --help documentation (#8975)
Avoid catastrophic Invalid_argument: index out of bounds errors when reporting the location of findings (#9011)
IntelliJ and VSCode extensions: The Semgrep Language Server (LSP) no longer freezes while scanning long files.
Pre-filtering is now less aggressive and tries not to skip files that could be matched by a rule due to constant-propagation. Previously, a rule searching for the string "foobar" would skip a file that did not contain exactly "foobar", but that contained e.g. "foo" + "bar". (#8767)
semgrep ci no longer crashes when run from git repositories coming from Azure projects with whitespace in the name. (#8971)
The --test flag now processes test target files even if they do not match the paths: directive of a rule. This is especially useful for rules using include:, which is now disabled in a test context. (#8192)

11 Oct 14:55

github-actions

v1.44.0

6b5ae0d

Release v1.44.0

1.44.0 - 2023-10-11

Added

A new --matching-explanations CLI flag has been added, to get matching
explanations. This was internally used by the Semgrep Playground to
help debug rules, but is now available also directly from the CLI. (explanations)
Using C++ tree-sitter as a failsafe pattern parser for C (gh-8905)
Allowing multiple type fields in metavariable-type rule syntax

Users have the flexibility to utilize multiple type fields to match the type of
metavariables. For instance:

```
metavariable-type:
  metavariable: $X
  types:
    - typeA
    - typeB
```

This approach is also supported in rule 2.0. (gh-8913)
Support for parsing pubspec (Dart/Flutter) lockfiles (gh-8925)

Added support for matching template type arguments using metavariables in C++.
Users can now successfully match code snippets like:

```
#include <memory>
using namespace std;

void foo() {
    int *i = 0;

    // ruleid: match-with-template
    shared_ptr<int> p;
}
```
with the pattern:
```
shared_ptr<$TY> $LOCAL_VAR;
```
(pa-3102)

Fixed

Avoid fatal "missing plugin" exceptions when scanning some Apex rules
for which no Apex pattern is used by the rule such as a pattern-regex:
and nothing else. (gh-8945)
Semgrep can now parse optional assignments in Swift (e.g. a.b? = 1). (lang-1)

Sequential tainting is now supported in Elixir.

```
def f() do
  x = "tainted"
  y = x

  # This now matches.
  sink(y)
end
```
(pa-3130)

Target files that disappeared before the scan or that have special byte
characters in their filename do not cause the whole scan to crash anymore.
The file is skipped instead. (pa-3144)
go.mod parsing now correctly allows arbitrary newlines and whitespace between dependencies (sc-1076)
fix: Improve typed metavariable matching against expressions consisting of names only. (type-inference)

03 Oct 14:24

github-actions

v1.43.0

0363b97

Release v1.43.0

1.43.0 - 2023-10-03

Added

Dart: Full Semgrep support for Dart has been added, whereas previously
most Semgrep constructs (and Semgrep itself) would not work correctly. (pa-2968)

Changed

We have reduced the default timeout (per-rule and per-file) to 2s (down from 30s).
Typically, running a rule on a file should take a fraction of a second. When a rule
takes more than a couple of seconds, it is often because the rule is not optimally
written, or because the file is unusually large (a minified file or machine-generated
code), so waiting 30s for it tends not to bring any value. Also, by
cutting it off earlier, we may prevent a potential OOM crash when running the rule is
very memory intensive. (pa-3155)

Fixed

The language server will no longer surface committed findings when a user types but does not save (pdx-ls-git)
