Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConvertFrom-Json returns array with extra null element, if comments are outside the root element. #21203

Open
5 tasks done
mubed opened this issue Feb 9, 2024 · 16 comments
Open
5 tasks done
Labels
Needs-Triage The issue is new and needs to be triaged by a work group.

Comments

@mubed
Copy link

mubed commented Feb 9, 2024

Prerequisites

Steps to reproduce

Having a json file containing this:

// Comment on top level
{
    "key": "value"
}

should become a PSObject or Hashtable, when imported using Get-Content and then piped through ConvertFrom-Json, but the result is an array with a null as first element and the PSObject/Hashtable as the second element!

Expected behavior

Comment lines have to be ignored, not be included as null elements.

Actual behavior

Comments on top level are interpreted as null elements like a "{}".

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.1
PSEdition                      Core
GitCommitId                    7.4.1
OS                             Microsoft Windows 10.0.19045
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

No response

@mubed mubed added the Needs-Triage The issue is new and needs to be triaged by a work group. label Feb 9, 2024
@mubed
Copy link
Author

mubed commented Feb 9, 2024

Might be related to #14553

@mubed
Copy link
Author

mubed commented Feb 9, 2024

Currect workaround:

Get-Content File.json | Out-String | ConvertFrom-Json

@rhubarb-geek-nz
Copy link

Having a json file containing this:

What you have there is a thing called not a json file.

Comments are not part of the JSON file specification.

Relying on comments in what you think are json files is just storing up trouble depending on the strictness of the parser you use. The simplest approach is to (a) stick to the JSON specification (b) use text elements within the structure as your comments.

@mubed
Copy link
Author

mubed commented Feb 9, 2024

Having a json file containing this:

What you have there is a thing called not a json file.

Comments are not part of the JSON file specification.

Relying on comments in what you think are json files is just storing up trouble depending on the strictness of the parser you use. The simplest approach is to (a) stick to the JSON specification (b) use text elements within the structure as your comments.

I would appreciate it, if you would first check the specifications of ComvertFrom-Json before writing google-able general information about common knowledge like what exactly JSON is:

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/convertfrom-json?view=powershell-7.4

The ConvertFrom-Json cmdlet converts a JavaScript Object Notation (JSON) formatted string to a custom PSObject or Hashtable object that has a property for each field in the JSON string. JSON is commonly used by web sites to provide a textual representation of objects. The cmdlet adds the properties to the new object as it processes each line of the JSON string. The JSON standard allows duplicate key names, which are prohibited in PSObject and Hashtable types. For example, if the JSON string contains duplicate keys, only the last key is used by this cmdlet. See other examples below. To generate a JSON string from any object, use the ConvertTo-Json cmdlet. This cmdlet was introduced in PowerShell 3.0. Note Beginning with PowerShell 6, the cmdlet supports JSON with comments. JSON comments start with two forward slashes (//) characters. JSON comments aren't captured in the objects output by the cmdlet. Prior to PowerShell 6, ConvertFrom-Json would return an error when it encountered a JSON comment.

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented Feb 9, 2024

PowerShell is moving from Newtonsoft.Json to System.Text.Json and the grammer rules and capabilities are not the same for the extensions to JSON, hence why I recommend keeping to the standard to save you the trouble of downstream problems such as this.

@237dmitry
Copy link

ss

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented Feb 9, 2024

I see the null if you try a round trip. The null does not appear in the Format-Table output.

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ConvertTo-Json
[
  null,
  {
    "key": "value"
  }
]

and the Out-String solves the problem

PS> Get-Content -LiteralPath file.json | Out-String | ConvertFrom-Json | ConvertTo-Json
{
  "key": "value"
}

So one could say that the issue is not one of ConvertFrom-Json parsing a single JSON document, it is part of the piping. This is the list of strings in the input pipeline to ConvertFrom-Json.

PS> Get-Content -LiteralPath file.json | ConvertTo-Json
[
  "// Comment on top level",
  "{",
  "\t\"key\": \"value\",",
  "}"
]

The comment is on its own line so is a complete self contained JSON-with-comments document, it can be parsed in it's entirety and written to the output. The remaining lines are a group of lines referring to the same JSON document, so the parser has not seen the closing "}" so there is still more to come in subsequent lines.

So the ambiguity is two-fold,

  1. what counts as a complete JSON document
  2. how are JSON documents represented as strings in the input pipeline

I suggest that ConvertFrom-JSON really sees two separate JSON documents.

If the comment is moved to within the brackets, the problem goes away

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ConvertTo-Json
{
  "key": "value"
}

@237dmitry
Copy link

and the Out-String solves the problem

Yes, null in output, converting to string solves the problem:

Get-Content file.json -Raw | ConvertFrom-Json | ConvertTo-Json

@mubed
Copy link
Author

mubed commented Feb 9, 2024

I already mentioned above that there is a workaround using a single string (out-string or gc -raw) but this is not the solution. According to the documentation of ConvertFrom-Json, the current implementation has a bug.

@mubed
Copy link
Author

mubed commented Feb 9, 2024

I see the null if you try a round trip. The null does not appear in the Format-Table output.

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ConvertTo-Json
[
  null,
  {
    "key": "value"
  }
]

and the Out-String solves the problem

PS> Get-Content -LiteralPath file.json | Out-String | ConvertFrom-Json | ConvertTo-Json
{
  "key": "value"
}

So one could say that the issue is not one of ConvertFrom-Json parsing a single JSON document, it is part of the piping. This is the list of strings in the input pipeline to ConvertFrom-Json.

PS> Get-Content -LiteralPath file.json | ConvertTo-Json
[
  "// Comment on top level",
  "{",
  "\t\"key\": \"value\",",
  "}"
]

The comment is on its own line so is a complete self contained JSON-with-comments document, it can be parsed in it's entirety and written to the output. The remaining lines are a group of lines referring to the same JSON document, so the parser has not seen the closing "}" so there is still more to come in subsequent lines.

So the ambiguity is two-fold,

  1. what counts as a complete JSON document
  2. how are JSON documents represented as strings in the input pipeline

I suggest that ConvertFrom-JSON really sees two separate JSON documents.

If the comment is moved to within the brackets, the problem goes away

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ConvertTo-Json
{
  "key": "value"
}

Stil, ConvertFrom-Json is‌ designed to generate one single object representation (can also be an array) based on an array of string lines or a single multi line string. Parsing separate input items (JSON documents) which are not written as JSON arrays (wrapped in brackets, separated by commas) should not result in a JSON array, but a parsing error. ConverFrom-Json could simply ignore what comes after comment literals and add no data to the object structure during parsing.

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented Feb 9, 2024

Lets start with Get-Content and note that each line of a text file is its own record in the output pipeline.

I suggest that any change is going to make the matter worse. ConvertFrom-JSON does work with multiple JSON documents in different record records and outputs each as a record

With the file

{"key":"value"}
{"key":"value"}
{"key":"value"}
{"key":"value"}
{"key":"value"}

This is not a JSON file, it would need wrapping square brackets and commas between the objects. It is however a sequence of well-formed JSON files.

This is output as multiple PSCustomObjects

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ForEach-Object { $_.GetType() }

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    PSCustomObject                           System.Object
True     False    PSCustomObject                           System.Object
True     False    PSCustomObject                           System.Object
True     False    PSCustomObject                           System.Object
True     False    PSCustomObject                           System.Object

And if you round trip them, then ConvertTo-JSON turns the list of records into a JSON array.

PS> Get-Content -LiteralPath file.json | ConvertFrom-Json | ConvertTo-Json
[
  {
    "key": "value"
  },
  {
    "key": "value"
  },
  {
    "key": "value"
  },
  {
    "key": "value"
  },
  {
    "key": "value"
  }
]

So ConvertFrom-JSON as it is today will read multiple JSON files and output each as a PSCustomObject

a single string containing a "// Comment on top level" is a complete self-contained JSON document as far as it is concerned, it had no parsing errors and gave back null, so that is what was written to the output pipeline.

If people are relying on the behaviour of ConvertFrom-JSON to parse each record as a unique JSON file then I don't see what change can be made to the cmdlet.

Now if you look at the document page it says

You can pipe a JSON string to ConvertFrom-Json.

That suggests that is what you should be doing via Out-String or using Raw on Get-Content, where you are telling ConvertFrom-JSON what the boundaries of the document are, and there is exactly one in the input.

PS> Get-Content -LiteralPath file.json -Raw | ConvertFrom-Json | ConvertTo-Json
{
  "key": "value"
}

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented Feb 9, 2024

Showing each JSON document as its own record

PS> Get-Content -LiteralPath file.json
{"key": "value"}
42
"foo"
true

Then show the output of ConvertFrom-JSON

PS>  Get-Content -LiteralPath file.json | ConvertFrom-Json | ForEach-Object { $_.GetType() }

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    PSCustomObject                           System.Object
True     True     Int64                                    System.ValueType
True     True     String                                   System.Object
True     True     Boolean                                  System.ValueType

Also a comment by itself is a complete JSON document, giving you a null, the following gives no parse errors.

PS> "// comment" | ConvertFrom-JSON

the output record is there....

PS> "// comment" | ConvertFrom-JSON | ForEach-Object { "." }
.

just like

PS> "true" | ConvertFrom-JSON
True

Now if you try

// comment
{"key": "value"}
42
"foo"
true

You get in a mess

PS>  Get-Content -LiteralPath file.json | ConvertFrom-Json
ConvertFrom-Json: Conversion from JSON failed with error: Additional text encountered after finished reading JSON content: 4. Path '', line 3, position 0.

So although "// comment" was sufficient to be a document by itself, when that was a prefix it assumed the following was all one document. So there looks like there is some behaviour to try and join the individual lines into a single document.

@jhoneill
Copy link

jhoneill commented Feb 9, 2024

I'd class this as a bug and I can explain WHY , I think

image

This rather odd bit of code with two break points in it feeds JSON into ConvertFrom 1 line at a time letting me single step through it in the debugger and critically I get output before all the text has been processed (i.e. Convert will send stuff down the pipeline to the next even if the input arrives very slowly, so object 1 gets processed before the last object has been sent. )

I think when the text is sent line by line, the initial comment means that when the next line is reached the process says "OK, starting a new object send the one we were working on" which is null. That case was never tested for (and I'm not sure if that says a bug in PowerShell or in .NET stuff called by PowerShell).
Processing the whole thing in one go processes correctly (and follows what the docs say about // being a comment)

Incidentally if you are reading from a file convertfrom-Json (gc -raw foo.json) not only works around this problem but should be faster (although not noticeably unless files are very large). If the JSON is arriving a little at a time from something else , piping is going to be better.

@mubed
Copy link
Author

mubed commented Feb 9, 2024

Showing each JSON document as its own record

PS> Get-Content -LiteralPath file.json
{"key": "value"}
42
"foo"
true

Then show the output of ConvertFrom-JSON

PS>  Get-Content -LiteralPath file.json | ConvertFrom-Json | ForEach-Object { $_.GetType() }

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    PSCustomObject                           System.Object
True     True     Int64                                    System.ValueType
True     True     String                                   System.Object
True     True     Boolean                                  System.ValueType

Also a comment by itself is a complete JSON document, giving you a null, the following gives no parse errors.

PS> "// comment" | ConvertFrom-JSON

the output record is there....

PS> "// comment" | ConvertFrom-JSON | ForEach-Object { "." }
.

just like

PS> "true" | ConvertFrom-JSON
True

Now if you try

// comment
{"key": "value"}
42
"foo"
true

You get in a mess

PS>  Get-Content -LiteralPath file.json | ConvertFrom-Json
ConvertFrom-Json: Conversion from JSON failed with error: Additional text encountered after finished reading JSON content: 4. Path '', line 3, position 0.

So although "// comment" was sufficient to be a document by itself, when that was a prefix it assumed the following was all one document. So there looks like there is some behaviour to try and join the individual lines into a single document.

According to RFC7159 you are right. According to the older and more common understanding of what a JSON file should be (only object or array as in RFC4627), you are wrong.

I would recommend to add a note to the documentation to raise awareness for this gotcha. Something like "always read the file in raw mode ".

Another suggestion is, to introduce a new CmdLet called Import-Json analogous to Import-Csv that implicitly does the same thing.

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented Feb 9, 2024

According to RFC7159 you are right. According to the older and more common understanding of what a JSON file should be (only object or array as in RFC4627), you are wrong.

I was just trying to describe the existing behaviour and it happily parses individual elements, so let's say ConvertFrom-JSON is consistent with RFC7159 from the point of view of what constitutes a complete JSON document.

PS> 'true' | ConvertFrom-JSON
True

But of course, neither of RFC7159 or RFC4627 describe any comment encoding, other than

A JSON parser MAY accept non-JSON forms or extensions.

@mklement0
Copy link
Contributor

mklement0 commented Feb 11, 2024

Let me try to provide a summary:

  • Context:

    • Indeed, PowerShell Core's JSON parsing is based on Newtonsoft.Json; the latter supports not only tolerating comments, but also preserving them (inside a JSON text), but the intent of ConvertFrom-Json is to ignore them.

    • While moving the implementation to System.Text.Json was considered, the relevant PR (Port ConvertTo-Json to .Net Core Json API #11198) was abandoned and is closed; more importantly, it would lead to breaking changes that would be considered unacceptable (loss of support for single-quoted property names and values, loss of support for unquoted property names).

      • Either way, taking away support for quietly tolerating comments would not be acceptable (System.Text.Json does have support for that, but not for single-quoting / unquoted property names).
    • ConvertFrom-Json has a heuristic built in, based on whether the first line of input is a self-contained (complete) JSON text (JSON document) by itself:

      • If it is, the line is parsed into an object (graph) by itself and the latter is output (streamed) right away; all subsequent lines must then be self-contained too and are processed the same.

      • If it is not, all input strings (lines) are collected up front, and only after all have been received are the lines together parsed as a single JSON text.

  • The bug at hand:

  • Already implemented / suggested mitigations:

  • Suggestion for a - hopefully - bucket 3 fix (i.e. a change that is technically breaking, but acceptable for its benefits and presumed unlikeliness to affect existing code (evolved from ConvertFrom-Json includes $null element on output if input starts with blank line #12229 (comment)):

    • When input strings (lines) are parsed one at a time, do not emit a $null for, i.e. effectively ignore:

      • an empty or blank (all-whitespace) input string
      • a comment-only string, i.e. a string (line) composed solely of either // ... or /* ... */ (aside from any leading or trailing whitespace)
      • Rationale:
        • Comments are fundamentally not part of the JSON standard, and an empty or blank string in isolation is not a valid JSON text (as defined in RFC 8259).
        • Given that such strings are ignored inside JSON texts, it stands to reason to also ignore them between them.
    • When such strings (lines) are encountered at the start of the input (causing them to be ignored with respect to output), defer applying the heuristic until the first input string (line) that is neither blank/empty nor a comment-only string is encountered:

      • If that string is a self-contained JSON text, ConvertFrom-Json should also ignore any subsequent empty/blank/comment-only lines, as implied by the first bullet point; otherwise, processing should be as currently, except:

        • If a malformed/incomplete string (line) is encountered, a non-terminating error must be reported, so that the command runs to completion and still outputs results for the well-formed strings among the input. (The currently used (statement-)terminating error in effect prevents any output in programmatic processing, though on the screen only you will see results for the strings processed before such an error occurs.)
      • If not, i.e. if collect-all-(remaining)-input-strings-first behavior is triggered, ConvertFrom-Json can rely on the underlying Newtonsoft.Json API to remove incidental whitespace and comments (including in-line comments), which it mostly already does, except for the JSON array-related bug reported in ConvertFrom-Json: comments in top-level arrays are parsed incorrectly (but also are not errors) #14553 (comment)

    • Implementing the above should resolve all of the issues mentioned, including the one at hand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs-Triage The issue is new and needs to be triaged by a work group.
Projects
None yet
Development

No branches or pull requests

5 participants