Releases: semgrep/semgrep

Release v1.52.0

05 Dec 22:08

1.52.0 - 2023-12-05

Added

  • Java: Semgrep will now recognize String.format(...) expressions as constant
    strings when all their arguments are constant, but it will still not know
    what exact string it is. For example, code String.format("Abc %s", "123")
    will match pattern "..." but it will not match pattern "Abc 123". (pa-3284)

Changed

  • Inter-file diff scan will be gradually introduced to a small percentage of
    users through a slow rollout process. Users who enable the Pro engine and
    run differential PR scans on GitHub or GitLab may be affected by this
    update. (ea-268)
  • secrets: now performs more aggressive deduplication for instances where an
    invalid and valid match are reported at the same range. Instead of reporting
    both, we now report only the valid match when they are otherwise visually
    identical. (scrt-271)

Fixed

  • In expression-based languages, definitions are also expressions.

    This change allows dataflow to properly handle definition expressions.

    For example, the pattern 0 == 0 will match x == 0 in

    def f(c) do
      x = (y = 0)
      x == 0
    end

    because now dataflow is able to handle the expression y = 0. (pa-3262)

  • In version 1.14.0 (pa-2477) we made sink-matching more precise when the sink
    specification was like:

    pattern-sinks:
      - patterns:
         - pattern: sink($X, ...)
         - focus-metavariable: $X

    Where the sink specification most likely has the intent to specify the first
    argument of sink as a sink, and sink(ok1 if tainted else ok2) should NOT
    produce a finding, because tainted is not really what is being passed to
    the sink function.

    But we only intercepted the most simple pattern above, and more complex sink
    specifications that had the same intent were not properly recognized.

    Now we have generalized that pattern to cover more complex cases like:

    patterns:
     - pattern-either:
       - patterns:
         - pattern-inside: |
             def foo(...):
               ...
         - pattern: sink1($X)
       - patterns:
         - pattern: sink2($X)
         - pattern-not: bar(...)
     - focus-metavariable: $X

    (pa-3284)

  • Updated the parser used for Rust (rust)

Release v1.51.0

29 Nov 15:00

1.51.0 - 2023-11-29

Added

  • taint-mode: Added experimental rule option taint_match_on: source that makes
    Semgrep report taint findings on the taint source rather than on the sink. (pa-3272)

Changed

  • Elixir got moved to Pro. (elixir_pro)
  • The 'fix_regex' field has been removed from the semgrep JSON output. Instead,
    the 'fix' field contains the result of applying the fix_regex. (fix_regex)
  • taint-mode: Tweaked experimental option taint_only_propagate_through_assignments
    so that when it is enabled, tainted.field and tainted(args) will no longer
    propagate taint. (pa-2193)

Fixed

  • Fixed Kotlin parse error.

    Previously, code like this would throw a parse error

    fun f1(context : Context) {
        Foo(context).elem = var1
    }
    

    due to not recognizing Foo(context).elem = ... as valid.
    Now calls are recognized as valid in the left hand of
    assignments. (ea-104)

  • Python: async statements are now translated into the Dataflow IL so Semgrep
    will be able to report findings e.g. inside async with ... statements. (gh-9182)

  • In GitLab output, use the correct URL attached to the rule instead of generating it.
    This fixes the URL for supply chain findings. (gitlab)

  • The language server will no longer crash on startup for IntelliJ (language-server)
  • The language server no longer crashes when installed through pip on Mac platforms (language-server-macos)
  • taint-mode: When we encountered an assignment lval := expr where expr returned
    no taints, we automatically cleaned lval. This was correct in the early days of
    taint-mode, before we introduced taint by side-effect, but it is wrong now. The LHS
    lval may be tainted by side-effect, in which case we cannot clean it just because
    expr returns no taint. Now that we have introduced by-side-effect: only, it is
    also possible for expr to taint lval by side-effect and return no immediate taint.

    This kind of source should now work as expected:

    - by-side-effect: true
      patterns:
        - pattern: |
            $X = source()
        - focus-metavariable: $X

    (pa-3164)

  • taint-mode: Fixed a bug in the recently added by-side-effect: only option.
    When matching l-values of the form l.x and l[i], the l occurrence would
    unexpectedly become tainted too. This led to FPs in some typestate rules
    like those checking for double-lock or double-free.

    Now a source such as:

    - by-side-effect: only
      patterns:
      - pattern: lock($L)
      - focus-metavariable: $L

    will not produce FPs on code such as:

    lock(obj.l)
    unlock(obj.l)
    lock(obj.l)

    (pa-3282)

  • taint-mode: Removed a hack that prevented lval = new ... assignments from
    cleaning the lval even though the RHS was not tainted. This hack caused FPs
    in double-free rules. For example, given this source:

    pattern-sources:
      - by-side-effect: only
        patterns:
          - pattern: delete $VAR;
          - focus-metavariable: $VAR

    And the code below:

    while (nondet) {
      int *v = new int;
      delete v; // FP
    }

    The delete v statement was reported as a double-free, because Semgrep did not
    consider that v = new int would clean the taint in v. (pa-3283)

Release v1.50.0

17 Nov 03:52

1.50.0 - 2023-11-17

No significant changes.

Release v1.49.0

15 Nov 13:34

1.49.0 - 2023-11-15

Added

  • Added support in Ruby, Julia, and Rust for matching implicit return statements inside functions.

    For example:

    return 0

    can now match 0 in

    function f()
      0
    end

    This matching is enabled by default and can be disabled with the rule option implicit_return. (gh-8408)

  • Pro engine supports constant propagation of numbers defined via macro in C++ (gh-9221)

  • taint-mode: The by-side-effect option (available for taint sources only) now
    accepts a third value, only, besides true and false. Setting by-side-effect: only
    will define a taint source that only propagates by side effect. This option
    should allow (ab)using taint-mode for writing some typestate rules.

    For example, this taint rule:

    pattern-sources:
      - by-side-effect: only
        patterns:
        - pattern: lock($L)
        - focus-metavariable: $L
    pattern-sanitizers:
      - by-side-effect: true
        patterns:
        - pattern: unlock($L)
        - focus-metavariable: $L
    pattern-sinks:
      - pattern: lock($L)

    will match the second lock(x) in this code:

    lock(x) # no finding
    lock(x) # finding

    The first lock(x) will not result in any finding, because the occurrence of x in
    itself will not be tainted. Only after the function call we will record that x is
    tainted (as a side-effect of lock). The second lock(x) will result in a finding
    because the x has been tainted by the previous lock(x). (pa-2980)
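
    The evaluation order described for the lock/unlock rule above (check the
    sink first, record the side effect afterwards) can be imitated with a toy
    model. This is only an illustrative sketch, not how Semgrep's taint engine
    is implemented; the function double_lock_findings and the
    (function, argument) encoding of calls are hypothetical.

    ```python
    def double_lock_findings(calls):
        """Return the indexes of lock(...) calls made on an already locked argument.

        `calls` is a list of (function_name, argument) pairs in program order.
        """
        tainted = set()  # arguments currently "tainted", i.e. locked
        findings = []
        for index, (function, argument) in enumerate(calls):
            if function == "lock":
                # Sink check comes first: the argument is a finding only if
                # an earlier lock(...) already tainted it.
                if argument in tainted:
                    findings.append(index)
                # Source with by-side-effect: only. The taint is recorded
                # after the call, so it never affects the call that adds it.
                tainted.add(argument)
            elif function == "unlock":
                # Sanitizer by side effect: the argument is clean again.
                tainted.discard(argument)
        return findings


    print(double_lock_findings([("lock", "x"), ("lock", "x")]))  # [1]
    print(double_lock_findings([("lock", "x"), ("unlock", "x"), ("lock", "x")]))  # []
    ```

    Keeping the sink check ahead of the side effect is what makes the first
    lock(x) silent and the second one a finding.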

Changed

  • In the metrics sent we now record the languages for which we invoked the interfile engine.
    This will enable us to measure the performance impact and error rates of new interfile
    languages. (For scans which don't send metrics, there is no change.) See PRIVACY.md
    for more information. (ea-251)

  • Removed support for named snippets (org_name:rule_id) from semgrep scan which were removed from semgrep.dev a few months ago. (gh-9203)

  • Added support for --config <code|secrets> to semgrep scan. When using
    code or secrets, the environment variable SEMGREP_REPO_NAME must be set.

    For example,

    $ SEMGREP_REPO_NAME=test_repo semgrep --config secrets
    

    Internally, semgrep scan --config <product> now uses the same endpoint as the
    semgrep ci to fetch the scan configuration. (gh-9205)

  • Improved handling of unused lambdas to reduce false positives

    Previously, we used to insert the CFGs of unused lambdas at the declaration
    site. However, this approach triggered some false positives. For example,
    consider the following code:

    void incorrect(int *p) {
      auto f1 = [&p]() {
        source(p);
      };
      auto f2 = [&p]() {
        sink(p);
      };
    }
    

    In this code, there's no actual control flow between the source and sink, and
    the lambdas are never even called. But when we inserted their CFGs at the
    declaration site, it incorrectly indicated a taint finding. To prevent these
    types of false positives while still scanning the body of unused lambdas, we
    now insert their CFGs in parallel at the end of their parent function, right
    after all other statements and just before the end node. (pa-3089)

  • Bumped timeout (per-rule and per-file) from 2s to 5s. Recently we lowered it
    from 30s down to 2s, but based on what we have observed so far, we believe 5s
    is a better timeout for the time being. (timeout)

Fixed

  • Fixed a bug where enabling the secret beta causes the default scan mode to be
    set to OSS, even when the Pro flag is turned on in the web UI. (ea-248)

  • Semgrep does not stop a scan anymore for parsing errors due to
    unconventional exceptions (e.g., Failure "not a program") in some
    parsers. Instead, such errors are reported as "Other syntax error". (lang-13)

  • Fix regression for the unused lambda change in react-nextjs-router-push test

    A lambda expression defined in a return expression is also treated as used at
    the location of the return expression. (pa-3089)

  • Updated the Rust parser with miscellaneous improvements. In particular, Semgrep can now parse yield expressions in Rust. (rust)

  • taint-mode: If an expression is tainted by multiple labels A and B, with B
    requiring A, the expression will now get both labels A and B. (taint-labels)
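
    The taint-labels fix above can be pictured as a small fixpoint
    computation. The sketch below is an assumption-laden illustration, not
    Semgrep's implementation: resolve_labels is a hypothetical helper, and it
    models only the case where a label requires at most one other label,
    which is narrower than what a real requires condition can express.

    ```python
    def resolve_labels(sources):
        """Compute the set of labels an expression ends up with.

        `sources` maps each label to the single label it requires, or to
        None when the label has no requirement.
        """
        labels = set()
        changed = True
        while changed:
            changed = False
            for label, required in sources.items():
                if label not in labels and (required is None or required in labels):
                    labels.add(label)
                    changed = True
        return labels


    # B requires A, and both sources apply to the expression.
    print(sorted(resolve_labels({"A": None, "B": "A"})))  # ['A', 'B']
    ```

    Iterating until nothing changes makes the result independent of the order
    in which the sources are listed.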

Release v1.48.0

06 Nov 17:15

1.48.0 - 2023-11-06

Note

Starting from version 1.46.0, Semgrep is first released in the following ecosystems:
- PyPI
- brew
- returntocorp/semgrep:canary (Docker)
If no issues are detected after a few days, the Semgrep team then promotes the :canary Docker tag to :latest.

Added

  • Matching: Matches with the same range but with metavariable bindings in
    different locations are no longer deduplicated.

    For instance, the pattern $FUNC(..., $A, ...) previously produced only
    one match on the target file:

    foo(true, true)
    

    because both matches cover the same range (the whole call), and in both
    of them $A is bound to true.

    Now, the deduplication logic sees that the bindings of $A are in
    different places, so the matches are not considered the same and
    both are reported. (pa-3230)
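
    The difference between the old and new deduplication keys can be sketched
    as follows. This is a simplified model under stated assumptions: the
    deduplicate function, the dictionary layout of a match, and the use of
    character offsets are all hypothetical, and only the $A binding is
    modeled.

    ```python
    def deduplicate(matches, include_binding_locations):
        """Keep the first match for every distinct deduplication key.

        Each match is a dict with a "range" (start, end) and "bindings", a
        mapping from metavariable name to (text, (start, end)).
        """
        seen = set()
        kept = []
        for match in matches:
            if include_binding_locations:
                bindings = tuple(sorted(match["bindings"].items()))
            else:
                # Compare the bound text only, ignoring where it was found.
                bindings = tuple(sorted(
                    (name, text)
                    for name, (text, _location) in match["bindings"].items()
                ))
            key = (match["range"], bindings)
            if key not in seen:
                seen.add(key)
                kept.append(match)
        return kept


    # foo(true, true): $A can bind to either argument (character offsets).
    matches = [
        {"range": (0, 15), "bindings": {"$A": ("true", (4, 8))}},
        {"range": (0, 15), "bindings": {"$A": ("true", (10, 14))}},
    ]
    print(len(deduplicate(matches, include_binding_locations=False)))  # 1
    print(len(deduplicate(matches, include_binding_locations=True)))   # 2
    ```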

Fixed

  • Fixed out of bounds list access error in Cargo.lock parser (sc-1072)
  • Secrets: metadata overrides specified in validators were incorrectly applied on
    top of one another (on a per-rule basis), so that only the last was applied.
    Each update is now correctly applied independently to each finding based on the
    rule's validators. (scrt-231)

Release v1.47.0

02 Nov 00:26

1.47.0 - 2023-11-01

Note

Starting from version 1.46.0, Semgrep is first released in the following ecosystems:
- PyPI
- brew
- returntocorp/semgrep:canary (Docker)
If no issues are detected after a few days, the Semgrep team then promotes the :canary Docker tag to :latest.

Added

  • taint-mode: Added a Boolean exact option to sources and sanitizers to make
    matching stricter (default is false).

    If you specify a source such as foo(...), and Semgrep encounters foo(x),
    by default foo(x), foo, and x, will all be considered tainted. If you add
    exact: true to the source specification, then only foo(x) will be regarded
    as tainted, that is the "exact" match for the specification. The same applies
    to "exact" sanitizers. (gh-5897)

  • Added sg alias for semgrep binary which is functionally equivalent to

    alias sg="/opt/homebrew/bin/semgrep"

    with one fewer step. (gh-9117)

  • secrets: Added independent targeting from other semgrep products.

    This change allows Secrets to scan all tracked files. In particular, those ignored
    by semgrepignore will now get scanned. There will be additional changes
    in the future to allow configuring the files that are scanned by Secrets. (gh-9125)

  • Adds an optional --no-secrets-validation flag to skip secrets validation. (no-secrets-validation)

  • Secrets rules (i.e., with metadata product: secrets) now mask the ending
    component of the matched content by replacing it with *s. (pa-2333)

  • Commutativity Support for Comparison Operators EQ and NOT_EQ

    We've introduced the commutative_compop rule option, enabling commutativity
    for comparison operators EQ and NOT_EQ. With this option, a == b will also
    match b == a, and a != b will also match b != a. (pa-3140)

  • Validation errors are separated from unvalidated findings in the terminal output. (validation-error)
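
    The effect of the commutative_compop option described above can be
    illustrated with a toy matcher. This is a sketch under simplifying
    assumptions, not Semgrep's matching code: comparisons are encoded as
    (operator, left, right) tuples of plain strings, metavariables are not
    modeled, and comparison_matches is a hypothetical name.

    ```python
    COMMUTATIVE_OPERATORS = {"==", "!="}


    def comparison_matches(pattern, target, commutative_compop=False):
        """Match two comparisons encoded as (operator, left, right) tuples."""
        if pattern == target:
            return True
        operator, left, right = pattern
        if commutative_compop and operator in COMMUTATIVE_OPERATORS:
            # Retry with the operands swapped, for == and != only.
            return (operator, right, left) == target
        return False


    print(comparison_matches(("==", "a", "b"), ("==", "b", "a")))  # False
    print(comparison_matches(("==", "a", "b"), ("==", "b", "a"),
                             commutative_compop=True))  # True
    ```

    Ordering operators such as < are deliberately left out of the swap, since
    a < b and b < a do not mean the same thing.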

Changed

  • For taint rules using labels (experimental) Semgrep now preferably picks a
    source without requires for the taint trace

    Semgrep now prioritizes taint sources without requires condition when
    choosing a representative taint trace from multiple source traces. This helps
    users to more clearly identify the initial taint source when multiple traces
    are involved. (pa-3122)

  • Unreachable supply chain findings are now reported only on the line where the dependency was found (no longer incorrectly including the next line).
    This change could affect the syntactic_id generated for these findings. (sc-727)

  • semgrep ci --supply-chain now defaults to the OSS engine even if the
    Pro engine would otherwise be used (turned on in semgrep.dev, or with the --pro flag). (supply-chain-oss)

Fixed

  • Semgrep no longer supports Python 3.7 (gh-8698)
  • Semgrep will now refuse to run incompatible versions of the Pro Engine, rather than crashing with a confusing error message. (gh-8873)
  • Fixed an issue that prevented the use of semgrep install-semgrep-pro --custom-binary ... when logged out. (gh-9051)
  • The --severity=XXX scan flag is working again. (gh-9062)
  • The --sarif output no longer crashes when Semgrep itself encountered errors
    while processing targets. (gh-9091)
  • Fixed how the end positions assigned to metavariable bindings are computed, in
    order to handle trailing newlines. This affected Semgrep's JSON output. If a
    metavariable $X was bound to a piece of text containing a trailing newline,
    such as "a\n", where the starting position was e.g. at line 1, Semgrep reported
    that the end position was at line 2, when in fact the text is entirely within
    line 1. If the text happened to be at the end of a file, Semgrep could report
    an end position that was outside the bounds of the file. (lang-18)
  • Semgrep Language Server now only scans open files on startup
  • Semgrep Language Server no longer scans with pro engine rules (ls)
  • Rust: unsafe blocks are now translated into the Dataflow IL so e.g. it becomes
    possible for taint analysis to track taint from/to an unsafe block. (pa-3218)
  • Correctly handle parsing toolchain directive in go.mod files (parsegomode)
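
    The trailing-newline fix for metavariable end positions (lang-18) comes
    down to whether a final newline is counted as starting a new line. The
    sketch below shows only the line component and is not Semgrep's code;
    both function names are hypothetical, and line numbers are assumed to be
    1-based.

    ```python
    def naive_end_line(start_line, text):
        """Old behavior: every newline, even a trailing one, advances the line."""
        return start_line + text.count("\n")


    def end_line(start_line, text):
        """Fixed behavior: a trailing newline belongs to the line it terminates."""
        body = text[:-1] if text.endswith("\n") else text
        return start_line + body.count("\n")


    print(naive_end_line(1, "a\n"))  # 2
    print(end_line(1, "a\n"))        # 1
    ```

    With the naive version, text at the very end of a file gets an end line
    one past the last line of the file, which is the out-of-bounds case
    mentioned above.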

Release v1.46.0

24 Oct 17:22

1.46.0 - 2023-10-24

Note

Starting with this release (1.46.0), Semgrep is first released in the following ecosystems:
- PyPI
- brew
- returntocorp/semgrep:canary (Docker)
If no issues are detected after a few days, the Semgrep team then promotes the :canary Docker tag to :latest.

Added

  • semgrep install-semgrep-pro now takes an optional --custom-binary flag to install the specified semgrep-core-proprietary binary rather than downloading it. (custom-pro-binary)

Fixed

  • pyproject.toml parser now handles optional newlines right after section headers. (gh-10879)

  • Updated the parsers for poetry.lock, pipfile.lock, and requirements.txt to ignore case sensitivity from package names.
    This matches their respective specifications. Test cases were added to account for this change. (gh-8984)

  • Reduced the limits for the prefilter optimization so that rules that cause
    computing the prefilter to blow up will abort more quickly. This improves
    performance by 2-3 seconds for each of the slowest rules. May cause a
    slowdown if a rule that previously could be filtered out no longer will be,
    but based on testing this is unlikely. (gh-9040)

  • Fixed an issue where conditional expressions weren't handled properly in expression-based languages.

    Rust example:

    fn expr_stmt_if(c) {
      y = 0;
      x = if c { y = 1 };
    
      // Before: this matches when it shouldn't because y is not always 1.
      // After: this does not match, which is the correct behavior.
      y == 1;
    }

    (pa-3205)

  • Fixed type error in creation of DependencyParserError object in the pnpm-lock.yaml parser (sc-1115)
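
    The case-insensitive handling of package names in the poetry.lock,
    pipfile.lock, and requirements.txt parsers (gh-8984) can be pictured as
    normalizing names before comparing them. This is a minimal sketch, not
    the parsers' actual code: index_by_package is a hypothetical helper, the
    package versions are made-up sample data, and any normalization beyond
    letter case is out of scope here.

    ```python
    def index_by_package(dependencies):
        """Map a lower-cased package name to its pinned version."""
        return {name.lower(): version for name, version in dependencies}


    # Sample data for illustration only.
    locked = index_by_package([("Django", "4.2.7"), ("PyYAML", "6.0.1")])

    # A lookup now succeeds regardless of how the name was capitalized.
    print(locked.get("django"))
    print(locked.get("PYYAML".lower()))
    ```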

Release v1.45.0

18 Oct 20:08

1.45.0 - 2023-10-18

Changed

  • Previously, nosemgrep ignored a finding from a rule foo.bar.my-rule only if the rule's fully qualified name was used: nosemgrep: foo.bar.my-rule. Now, nosemgrep also accepts just the rule ID: nosemgrep: my-rule. (#8979)

  • [Breaking Change] Improved Matching of C++ Constructors (pa-3114)

    • In this update, the Semgrep team has enhanced Semgrep's ability to match C++ constructors more accurately.
    • C++ introduces a syntactic ambiguity between function and variable definitions, particularly with constructors. The C++ compiler determines how to interpret an expression based on contextual information, such as whether the immediate parent scope is a function or a class, and whether the identifiers within the parentheses represent variables or types.
    • Due to this complexity, static analyzers face challenges in precisely parsing these expressions without additional information.
    • This commit introduces several workarounds to provide a better solution for handling this ambiguity:
      - By default, when parsing a target file, Semgrep will consider an expression like foo bar(x, y, z); defined within the body of a function as a variable definition with a constructor. This is because variable initialization is a more common use case within the body of a function.
      - Users can specify rule options that annotate, in patterns where the expression can be interpreted in both ways, which interpretation should take precedence. For instance, foo bar(x, y, z); will be parsed as a function definition when the as_fundef option is used and as a variable definition with a constructor when the as_vardef_with_ctor option is applied. It's worth noting that an expression like foo bar(1, y, z); will be parsed as a variable definition without any additional annotation since 1 cannot be a type.
    • Here's an example rule and its corresponding target file to illustrate these changes:
    rules:
      - id: cpp-match-func-def
        message: Semgrep found a match
        options:
          cpp_parsing_pref: as_fundef
        languages:
          - cpp
        severity: WARNING
        pattern-either:
          - pattern: foo $X($Y);
          - pattern: foo $X($Y, $Z);
    
      - id: cpp-match-ctor
        message: Semgrep found a match
        options:
          cpp_parsing_pref: as_vardef_with_ctor
        languages:
          - cpp
        severity: WARNING
        patterns:
          - pattern: foo $X(...);
          - pattern-not: foo $X(3, ...);
    
      - id: cpp-match-ctor-3
        message: Semgrep found a match
        languages:
          - cpp
        severity: WARNING
        pattern: foo $X(3, ...);
    
    class Test {
    
      // ruleid: cpp-match-func-def
      foo bar(x);
      // ruleid: cpp-match-func-def
      foo bar(x, y);
    
      void test() {
        // ruleid: cpp-match-ctor
        foo bar(1);
        // ruleid: cpp-match-ctor
        foo bar(1, 2);
    
        // ruleid: cpp-match-ctor
        foo bar(x);
        // ruleid: cpp-match-ctor
        foo bar(x, y);
    
        // ruleid: cpp-match-ctor
        foo bar(x, 2);
        // ruleid: cpp-match-ctor
        foo bar(1, y);
    
        // ruleid: cpp-match-ctor-3
        foo bar(3);
        // ruleid: cpp-match-ctor-3
        foo bar(3, 4);
        // ruleid: cpp-match-ctor-3
        foo bar(3, y);
      }
    };
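
    Going back to the first item of this Changed section, the relaxed
    nosemgrep matching can be sketched as a comparison against both the
    fully qualified name and its last dot-separated component. This is an
    assumption about the observable behavior, not Semgrep's implementation;
    is_ignored is a hypothetical helper, and partial prefixes such as
    bar.my-rule are not modeled.

    ```python
    def is_ignored(rule_id, nosemgrep_ids):
        """Return True when a nosemgrep comment listing `nosemgrep_ids` covers `rule_id`."""
        short_id = rule_id.rsplit(".", 1)[-1]
        return any(given in (rule_id, short_id) for given in nosemgrep_ids)


    print(is_ignored("foo.bar.my-rule", ["foo.bar.my-rule"]))  # True
    print(is_ignored("foo.bar.my-rule", ["my-rule"]))          # True
    print(is_ignored("foo.bar.my-rule", ["other-rule"]))       # False
    ```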
    

Fixed

  • Semgrep Docker image: Reduction of the docker image size by using --no-cache when apk upgrading. Thanks to Peter Dave Hello for the contribution.

  • Fixed a bug with pre-filtering introduced in 1.42.0 that caused significant slowdowns, particularly for Kotlin repos. Kotlin repos running default pro rules may see a 30 minute speedup. (ea-208)

  • Taint analysis: track ptr->field l-values in C++

    • In C++, we now track tainted field access via pointer dereference. For instance, consider the following code snippet:
    void test_intra_001() {
      TestObject *obj = new TestObject();
    
      obj->a = taint_source();
      obj->b = SAFE_STR;
    
      // ok: cpp-tainted-field-ptr
      sink(obj->b, __LINE__);
      // ruleid: cpp-tainted-field-ptr
      sink(obj->a, __LINE__);
    }
    

    This can be matched by the rule (gh-1058):

    rules:
      - id: cpp-tainted-field-ptr
        languages:
          - cpp
        message: testing flows though C++ ptrs
        severity: INFO
        mode: taint
        pattern-sources:
          - pattern: taint_source()
        pattern-sinks:
          - patterns:
              - pattern: sink($X, ...)
              - focus-metavariable:
                  - $X
    
  • Do not crash anymore with an Invalid_arg exception when the terminal has very few columns (e.g., in some precommit context). (#8792)

  • Add --supply-chain flag to semgrep ci --help documentation (#8975)

  • Avoid catastrophic Invalid_argument: index out of bounds errors when reporting the location of findings (#9011)

  • IntelliJ and VSCode extensions: The Semgrep Language Server (LSP) no longer freezes while scanning long files.

  • Pre-filtering is now less aggressive and tries not to skip files that could be matched by a rule due to constant-propagation. Previously, a rule searching for the string "foobar" would skip a file that did not contain exactly "foobar", but that contained e.g. "foo" + "bar". (#8767)

  • semgrep ci does not crash anymore when run from git repositories coming from Azure projects with whitespace in the name. (#8971)

  • The --test flag now processes test target files even if they do not match the paths: directive of a rule. This is especially useful for rules using include:, which is now disabled in a test context. (#8192)

Release v1.44.0

11 Oct 14:55

1.44.0 - 2023-10-11

Added

  • A new --matching-explanations CLI flag has been added, to get matching
    explanations. This was internally used by the Semgrep Playground to
    help debug rules, but is now available also directly from the CLI. (explanations)

  • Using C++ tree-sitter as a failsafe pattern parser for C (gh-8905)

  • Allowing multiple type fields in metavariable-type rule syntax

    Users have the flexibility to utilize multiple type fields to match the type of
    metavariables. For instance:

    metavariable-type:
      metavariable: $X
      types:
        - typeA
        - typeB

    This approach is also supported in rule 2.0. (gh-8913)

  • Support for parsing pubspec (Dart/Flutter) lockfiles (gh-8925)

  • Added support for matching template type arguments using metavariables in C++.
    Users can now successfully match code snippets like:

    #include <memory>
    using namespace std;
    
    void foo() {
        int *i = 0;
    
        // ruleid: match-with-template
        shared_ptr<int> p;
    }
    

    with the pattern:

    shared_ptr<$TY> $LOCAL_VAR;

    (pa-3102)

Fixed

  • Avoid fatal "missing plugin" exceptions when scanning some Apex rules
    for which no Apex pattern is used by the rule such as a pattern-regex:
    and nothing else. (gh-8945)

  • Semgrep can now parse optional assignments in Swift (e.g. a.b? = 1). (lang-1)

  • Sequential tainting is now supported in Elixir.

    def f() do
      x = "tainted"
      y = x
    
      # This now matches.
      sink(y)
    end

    (pa-3130)

  • Target files that disappeared before the scan or that have special byte
    characters in their filename do not cause the whole scan to crash anymore.
    The file is skipped instead. (pa-3144)

  • go.mod parsing now correctly allows arbitrary newlines and whitespace between dependencies (sc-1076)

  • fix: Improve typed metavariable matching against expressions consisting of names only. (type-inference)

Release v1.43.0

03 Oct 14:24

1.43.0 - 2023-10-03

Added

  • Dart: Full Semgrep support for Dart has been added, whereas previously
    most Semgrep constructs (and Semgrep itself) would not work correctly. (pa-2968)

Changed

  • We have reduced the default timeout (per-rule and per-file) to 2s (down from 30s).
    Typically, running a rule on a file should take a fraction of a second. When a rule
    takes more than a couple of seconds, it is often because the rule is not optimally
    written, or because the file is unusually large (a minified file or
    machine-generated code), so waiting 30s for it does not tend to bring any value.
    Also, by cutting it off earlier, we may prevent a potential OOM crash when running
    the rule is very memory intensive. (pa-3155)

Fixed

  • The language server will no longer surface committed findings when a user types but does not save (pdx-ls-git)