Support GCOV intermediate format #282

Open
marxin opened this issue Oct 3, 2018 · 55 comments · May be fixed by #766
Labels: Format: JSON · Gcov · help wanted · not possible for now (This feature is not possible because of restrictions from other ones) · Type: Enhancement

Comments

@marxin
Copy link

marxin commented Oct 3, 2018

For GCC 9.1 I'm planning to come up with a JSON format of the intermediate representation.
It should be suitable for consumers of the information such as lcov and gcovr.
Feel free to comment on my patch request:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01628.html

Example output:
https://users.suse.com/~mliska/tmp/tramp.json

@latk
Copy link
Member

latk commented Oct 3, 2018

Thank you, this looks great! There's nothing wrong with the previous intermediate format, but moving to JSON will make future improvements easier (both for GCC and downstream users like gcovr). I look forward to implementing support for it, because it's much easier to work with than the human-readable output we are currently parsing….

Unfortunately I won't be able to move to your JSON format because old GCC versions and llvm-cov gcov will have to be supported, but it will be great to be able to offer a more robust parser for modern GCC as an option.

A few concerns/questions/observations:

❓ It is great that you are including a current_working_directory key! Are the file names resolved relative to this working directory, or are the file names taken directly from the gcno notes? This matters if gcov is not invoked from the same directory as gcc, which happens to be one of the difficult problems for gcovr.

💡 These JSON files can get quite big, but gcov and any parsers will have to load them into memory entirely. Please consider whether a mixed line-based/JSON format would work, i.e. something like JsonLines.org with one "file" JSON document per line and perhaps a header with metadata. That's technically not true JSON but much more Unixy. Line-based output would be more valuable than compressed output to me.

👍 Will it be possible to pipe the JSON to stdout instead of writing to a file? It seems so. Great!

🙏 No format is self-describing. The source code in your patch makes the format clear, but real documentation would avoid misunderstandings. E.g. the intermediate format shows a vaguely BNF-ish example in the docs. I'd consider writing:

The JSON output is structured as follows:

{
  "version": "<gcov_version>",
  "current_working_directory": "<cwd_absolute>",
  "files": [<file>...]
}

where each <file> has the form:

{
  "file": "<filename_relative>",
  "functions": [<function>...],
  "lines": [<line>...]
}

(and so on for function, line, and branch objects).

❓ How will the JSON format ensure forward compatibility, i.e. deal with future changes? There tend to be three choices:

  • get it right the first time (unlikely).
  • deal with any problems later (please not).
  • prepare clear extension points.

E.g. would it make sense to version the JSON format separately (preferably using a SemVer-compatible number), or do I have to extract the GCC version from the version-key to find out which format variant I'm dealing with? (Assuming a future with GCC 12 😄.) Some forward compatibility story would be great, particularly because you are deprecating the existing machine-readable format.

❓ GCC 8 introduced additional information about line coverage per template specialization in the human-readable output. Will this information be available in the JSON output? Gcovr does not currently make use of that info, but if it's more easily available that might change.
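To make the line-based suggestion above more concrete, here is a minimal Python sketch of how a consumer could read such a stream one record at a time. The layout (a metadata header on the first line, then one "file" object per line) is purely hypothetical and only illustrates the proposal; the function name and the `_cwd` key are made up for this example.

```python
import io
import json
from typing import Iterator, TextIO


def iter_file_records(stream: TextIO) -> Iterator[dict]:
    """Yield one "file" object at a time from a hypothetical line-based stream."""
    # Assumption: the first line is a metadata header, as floated in the comment.
    header = json.loads(stream.readline())
    for raw in stream:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        # Attach the working directory so each record can be resolved on its own.
        record["_cwd"] = header.get("current_working_directory")
        yield record


# Tiny in-memory stand-in for a real stream.
sample_stream = io.StringIO(
    '{"version": "x", "current_working_directory": "/build"}\n'
    '{"file": "a.c", "functions": [], "lines": []}\n'
    '{"file": "b.c", "functions": [], "lines": []}\n'
)
names = [record["file"] for record in iter_file_records(sample_stream)]
print(names)
```

With this shape, only one record is held in memory at a time, which is the main attraction of the line-based variant.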

@latk latk added the Gcov label Oct 3, 2018
@marxin
Copy link
Author

marxin commented Oct 5, 2018

Thank you, this looks great! There's nothing wrong with the previous intermediate format, but moving to JSON will make future improvements easier (both for GCC and downstream users like gcovr). I look forward to implementing support for it, because it's much easier to work with than the human-readable output we are currently parsing….

Thanks for very useful feedback!

Unfortunately I won't be able to move to your JSON format because old GCC versions and llvm-cov gcov will have to be supported, but it will be great to be able to offer a more robust parser for modern GCC as an option.

I understand that, but hopefully in the future it will be the main format used for GCC.

A few concerns/questions/observations:

It is great that you are including a current_working_directory key! Are the file names resolved relative to this working directory, or are the file names taken directly from the gcno notes? This matters if gcov is not invoked from the same directory as gcc, which happens to be one of the difficult problems for gcovr.

current_working_directory is taken from the .gcno file, and so are the file names. That should be enough to locate source files.

These JSON files can get quite big, but gcov and any parsers will have to load them into memory entirely. Please consider whether a mixed line-based/JSON format would work, i.e. something like JsonLines.org with one "file" JSON document per line and perhaps a header with metadata. That's technically not true JSON but much more Unixy. Line-based output would be more valuable than compressed output to me.

Well you will have one JSON per object file. Even if you have very many object files, then it's expected that one can load into memory JSON for one of them. Then you can release it.
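As a rough illustration of the "load one, then release it" approach, the following Python sketch walks a directory and parses one report at a time. The `*.gcov.json.gz` file suffix and the helper names are assumptions made for this example; only the top-level "files" key is taken from this thread.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_gcov_reports(directory: Path) -> Iterator[dict]:
    """Parse one gzip-compressed JSON report at a time (assumed suffix)."""
    for path in sorted(directory.glob("*.gcov.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            # Only this one document is in memory; it can be dropped afterwards.
            yield json.load(handle)


def count_source_files(directory: Path) -> int:
    """Sum the number of "files" entries over all reports in a directory."""
    total = 0
    for report in iter_gcov_reports(directory):
        total += len(report.get("files", []))
    return total


# Self-contained demo with two tiny fake reports.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a", "b"):
        target = Path(tmp) / f"{name}.gcov.json.gz"
        with gzip.open(target, "wt", encoding="utf-8") as out:
            json.dump({"files": [{"file": f"{name}.c"}]}, out)
    total_files = count_source_files(Path(tmp))

print(total_files)
```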

Will it be possible to pipe the JSON to stdout instead of writing to a file? It seems so. Great!

Yes.

No format is self-describing. The source code in your patch makes the format clear, but real documentation would avoid misunderstandings. E.g. the intermediate format shows a vaguely BNF-ish example in the docs. I'd consider writing:

The JSON output is structured as follows:

{
  "version": "<gcov_version>",
  "current_working_directory": "<cwd_absolute>",
  "files": [<file>...]
}

where each <file> has the form:

{
  "file": "<filename_relative>",
  "functions": [<function>...],
  "lines": [<line>...]
}

(and so on for function, line, and branch objects).

Got it, I was a bit lazy to do that, but I will document it properly.

How will the JSON format ensure forward compatibility, i.e. deal with future changes? There tend to be three choices:

  • get it right the first time (unlikely).
  • deal with any problems later (please not).
  • prepare clear extension points.

E.g. would it make sense to version the JSON format separately (preferably using a SemVer-compatible number), or do I have to extract the GCC version from the version-key to find out which format variant I'm dealing with? (Assuming a future with GCC 12.) Some forward compatibility story would be great, particularly because you are deprecating the existing machine-readable format.

Agreed that a semantic version is much better than the version of the compiler (that can be added too, but with a different key name).

GCC 8 introduced additional information about line coverage per template specialization in the human-readable output. Will this information be available in the JSON output? Gcovr does not currently make use of that info, but if it's more easily available that might change.

Yes. Another new feature is unexecuted blocks:

Executed basic blocks having a statement with zero execution count end with ‘*’ character and are colored with magenta color with the ‘-k’ option. This functionality is not supported in Ada.

Martin

@marxin
Copy link
Author

marxin commented Oct 10, 2018

I addressed the requests in version 2 of the patch:
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00553.html

@marxin
Copy link
Author

marxin commented Oct 29, 2018

Note that the format change has landed in GCC's trunk and will be part of the next release (9.1).

@xiaoliangwu
Copy link

Thanks for the new JSON format, but why reuse the same "-i" option? Shouldn't "-i" keep its meaning of emitting the intermediate format, with a new option such as "--json" for the JSON output? Breaking backward compatibility is not good news for existing applications.

@Spacetown
Copy link
Member

@xiaoliangwu I don't get your problem. The JSON from gcc has nothing to do with the JSON from gcovr.

@xiaoliangwu
Copy link

xiaoliangwu commented Aug 8, 2020

@xiaoliangwu I don't get your problem. The JSON from gcc has nothing to do with the JSON from gcovr.

OK, maybe I filed this in the wrong place.

@f18m
Copy link

f18m commented Dec 17, 2020

Hi @latk , Hi @marxin ,
any news on this topic ?
At this point it looks like 2 years have passed since this issue was created, and indeed gcc 9.1 is now commonly used in several recent distributions :)
It would be very nice to have gcovr be able to exploit the new gcov JSON output... thanks!!

@marxin
Copy link
Author

marxin commented Dec 18, 2020

It would be very nice to have gcovr be able to exploit the new gcov JSON output... thanks!!

I'm all for it! I think the format seems quite stable.

@nmeum
Copy link

nmeum commented Jun 25, 2021

How does this issue relate to #326?

I was looking into #282 (i.e. this issue) because I have a software simulator which generates the gcov intermediate JSON directly. As such, I was interested in implementing support for using the gcov JSON intermediate file as an input format in gcovr. However, it seems to me that this was already done in #326. With #326, it seems to be possible to generate gcovr HTML files etc. from a JSON intermediate input file using --add-tracefile, e.g. gcovr --add-tracefile gcovr.json --html --html-details -o test.html. However, the format used in #326 differs slightly from the standard GCOV intermediate JSON format and is not entirely compatible. Would it be possible/desirable to add an additional command-line flag to gcovr for parsing the standard gcov intermediate JSON format? This should also allow using the standard intermediate JSON format (as generated by gcov -i) internally instead of using *.gcda and *.gcno files, shouldn't it? Presently, I just don't understand why the code added in #326 uses a custom JSON format.

@Spacetown
Copy link
Member

@nmeum the JSON format from #326 is an internal format of gcovr and not gcov. This format can be written by several calls and then merged together to get an overall report.
At the moment the format of gcov isn't supported and the problem is that you need at least GCC 9.1 to generate this format. In my company still 4.9 is used.

@nmeum
Copy link

nmeum commented Jun 25, 2021

@nmeum the JSON format from #326 is an internal format of gcovr and not gcov.

I am aware of that.

At the moment the format of gcov isn't supported […]

Yes, I would like to add support for the standard gcov format.

[…] the problem is that you need at least GCC 9.1 to generate this format. In my company still 4.9 is used.

Regarding the GCC version issue you mentioned: I don't think this is a problem for adding the standard gcov format as an additional input format.

For now I would simply suggest using the standard gcov JSON format as an additional input format.

@Spacetown
Copy link
Member

That would be fine. I wanted to clarify that changing the options of the gcov call.

@Spacetown
Copy link
Member

It seems that they do not belong to the same source file or the same run. In the JSON file I see data up to line 37, while the TXT file has 47 lines.

In the TXT file the blocks for each line start with 0 and in the JSON they are unique for the file.

@marxin
Copy link
Author

marxin commented Apr 24, 2023

All right, so there's a part of both .gcov and .json outputs for a test-case:
gcov-17.C.gcov-before.txt
gcov-17.C.gcov-after.txt
a-gcov-17.gcov-before.json.gz
a-gcov-17.gcov-after.json.gz

For normal mode, I'm planning to remove the indices for blocks and calls, as the numbering is stupid, and for blocks I'm using IDs (similar to the JSON output).

What do you think about it now?

@Spacetown
Copy link
Member

Using IDs for the blocks in the legacy format is OK for me, but I strongly advise not removing the indices, since other tools like ourselves rely on their presence. Even lcov will fail if this info is removed, see https://github.com/linux-test-project/lcov/blob/5c9399e8a0b603ff488a89b97dd8da851ed7a9f3/bin/geninfo#L1960.

@marxin
Copy link
Author

marxin commented Apr 25, 2023

Ok, understood. So I'm going to commit a change for GCC 14.1 that will only replace the block "index" with the block ID in the legacy format.
When it comes to tools like gcovr or lcov, they should all rely on the JSON format rather than on the particular layout of the human-readable output. Hopefully, that's going to change in the future.

@marxin
Copy link
Author

marxin commented Apr 25, 2023

One can test it using the openSUSE gcc14 package in a container:

$ podman run --rm -it opensuse/tumbleweed /bin/bash
zypper addrepo https://download.opensuse.org/repositories/devel:gcc:next/openSUSE_Tumbleweed/devel:gcc:next.repo
zypper in -y --allow-vendor-change gcc gcc-c++ wget
wget https://raw.githubusercontent.com/gcc-mirror/gcc/master/gcc/testsuite/g%2B%2B.dg/gcov/gcov-17.C
g++ gcov-17.C --coverage && ./a.out && gcov -tabi a-gcov-17.C
Success
{"format_version": "1", "gcc_version": "14.0.0 20230424 (experimental) [revision f0eabc52c9a2d3da0bfc201da7a5c1658b76e9a4]", "current_working_directory": "/", "data_file": "a-gcov-17.C", "files": [{"file": "gcov-17.C", "functions": [{"name": "_ZN3FooIcEC2Ev", "demangled_name": "Foo<char>::Foo()", "start_line": 7, "start_column": 3, "end_line": 7, "end_column": 22, "blocks": 1, "blocks_executed": 0, "execution_count": 0}, {"name": "_ZN3FooIiEC2Ev", "demangled_name": "Foo<int>::Foo()", "start_line": 7, "start_column": 3, "end_line": 7, "end_column": 22, "blocks": 1, "blocks_executed": 1, "execution_count": 1}, {"name": "_ZN3FooIcE3incEv", "demangled_name": "Foo<char>::inc()", "start_line": 9, "start_column": 8, "end_line": 9, "end_column": 22, "blocks": 1, "blocks_executed": 0, "execution_count": 0}, {"name": "_ZN3FooIiE3incEv", "demangled_name": "Foo<int>::inc()", "start_line": 9, "start_column": 8, "end_line": 9, "end_column": 22, "blocks": 1, "blocks_executed": 1, "execution_count": 2}, {"name": "_ZL5noretv", "demangled_name": "noret()", "start_line": 18, "start_column": 13, "end_line": 21, "end_column": 1, "blocks": 1, "blocks_executed": 1, "execution_count": 1}, {"name": "main", "demangled_name": "main", "start_line": 24, "start_column": 1, "end_line": 45, "end_column": 1, "blocks": 16, "blocks_executed": 12, "execution_count": 1}], "lines": [{"line_number": 7, "function_name": "_ZN3FooIcEC2Ev", "count": 0, "unexecuted_block": true, "branches": []}, {"line_number": 7, "function_name": "_ZN3FooIiEC2Ev", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 9, "function_name": "_ZN3FooIcE3incEv", "count": 0, "unexecuted_block": true, "branches": []}, {"line_number": 9, "function_name": "_ZN3FooIiE3incEv", "count": 2, "unexecuted_block": false, "branches": []}, {"line_number": 18, "function_name": "_ZL5noretv", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 20, "function_name": "_ZL5noretv", "count": 1, "unexecuted_block": 
false, "branches": []}, {"line_number": 24, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 27, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 29, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 30, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 31, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 33, "function_name": "main", "count": 11, "unexecuted_block": false, "branches": [{"count": 10, "throw": false, "fallthrough": false}, {"count": 1, "throw": false, "fallthrough": true}]}, {"line_number": 34, "function_name": "main", "count": 10, "unexecuted_block": false, "branches": []}, {"line_number": 36, "function_name": "main", "count": 1, "unexecuted_block": true, "branches": [{"count": 0, "throw": false, "fallthrough": true}, {"count": 1, "throw": false, "fallthrough": false}]}, {"line_number": 38, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": [{"count": 0, "throw": false, "fallthrough": true}, {"count": 1, "throw": false, "fallthrough": false}]}, {"line_number": 39, "function_name": "main", "count": 0, "unexecuted_block": true, "branches": [{"count": 0, "throw": false, "fallthrough": true}, {"count": 0, "throw": true, "fallthrough": false}]}, {"line_number": 41, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": [{"count": 1, "throw": false, "fallthrough": true}, {"count": 0, "throw": true, "fallthrough": false}]}, {"line_number": 43, "function_name": "main", "count": 1, "unexecuted_block": false, "branches": []}, {"line_number": 44, "function_name": "main", "count": 0, "unexecuted_block": true, "branches": []}]}]}
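For readers who want to poke at this output, here is a small Python sketch that walks a trimmed excerpt of the JSON above. The key names come from that output; summing the counts of a line that appears once per template specialization is an assumption about how a consumer might merge them, not something gcov or gcovr prescribes, and the helper names are invented for the example.

```python
import json
from collections import defaultdict

# Trimmed excerpt of the output shown above (line 7 appears once per specialization).
EXCERPT = """
{"format_version": "1", "files": [{"file": "gcov-17.C", "lines": [
  {"line_number": 7, "function_name": "_ZN3FooIcEC2Ev", "count": 0,
   "unexecuted_block": true, "branches": []},
  {"line_number": 7, "function_name": "_ZN3FooIiEC2Ev", "count": 1,
   "unexecuted_block": false, "branches": []},
  {"line_number": 33, "function_name": "main", "count": 11,
   "unexecuted_block": false,
   "branches": [{"count": 10, "throw": false, "fallthrough": false},
                {"count": 1, "throw": false, "fallthrough": true}]},
  {"line_number": 44, "function_name": "main", "count": 0,
   "unexecuted_block": true, "branches": []}
]}]}
"""


def merge_line_counts(file_entry: dict) -> dict:
    """Sum execution counts per source line across all function instances."""
    merged = defaultdict(int)
    for line in file_entry["lines"]:
        merged[line["line_number"]] += line["count"]
    return dict(merged)


def branch_summary(file_entry: dict) -> tuple:
    """Return (branches taken at least once, total branches)."""
    taken = 0
    total = 0
    for line in file_entry["lines"]:
        for branch in line["branches"]:
            total += 1
            if branch["count"] > 0:
                taken += 1
    return taken, total


entry = json.loads(EXCERPT)["files"][0]
line_counts = merge_line_counts(entry)
covered_lines = sum(1 for count in line_counts.values() if count > 0)
branches_taken, branches_total = branch_summary(entry)
print(f"{covered_lines}/{len(line_counts)} lines, {branches_taken}/{branches_total} branches")
```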

@Spacetown
Copy link
Member

@marxin Currently I'm adding gcc-12 and gcc-13 to the test matrix because of the proper exit codes when files can't be written (see #775 ).
While doing this I was thinking about how we can detect that the JSON format has the needed elements? At the moment we only check if gcov provides the needed option by checking the content of the help. Is it possible to add the generated version of the JSON format to the help output?

@marxin
Copy link
Author

marxin commented May 4, 2023

Yes, you can easily do that by looking at format_version (or possibly gcc_version) in the produced JSON output in order to detect the requested stuff. Right now (for GCC 14.1), I bumped to version 2:
https://gcc.gnu.org/onlinedocs/gcc/Invoking-Gcov.html

@Spacetown
Copy link
Member

But I need this info before creating a JSON output. We only have the help output of gcov to detect this.

@marxin
Copy link
Author

marxin commented May 4, 2023

You are right, so do you prefer a JSON format_version to be added to gcov --version?

Note that one can work around it right now:

gcov -jt _ 2>/dev/null | python -m json.tool
{
    "data_file": "_",
    "files": [],
    "format_version": "1",
    "gcc_version": "13.0.1 20230421 (prerelease) [revision f980561c60b0446cc427595198d7f3f4f90e0924]"
}
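A tool could wrap this workaround roughly as follows. This is only a Python sketch: the `-jt _` arguments are copied from the command above, while the function names and the choice to return `None` on unusable output are assumptions for illustration.

```python
import json
import subprocess
from typing import Optional


def format_version_from_probe(stdout_text: str) -> Optional[str]:
    """Extract "format_version" from the probe output, or None if unusable."""
    try:
        document = json.loads(stdout_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(document, dict):
        return None
    return document.get("format_version")


def probe_gcov(gcov_cmd: str = "gcov") -> Optional[str]:
    """Run the probe from the comment above; raises if gcov_cmd is not installed."""
    result = subprocess.run(
        [gcov_cmd, "-jt", "_"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # mirrors the 2>/dev/null in the command above
        text=True,
        check=False,  # the exit status for a dummy data file is not relied upon
    )
    return format_version_from_probe(result.stdout)


# Offline demo using the output quoted in this comment (gcc_version shortened).
sample_probe_output = (
    '{"data_file": "_", "files": [], "format_version": "1", '
    '"gcc_version": "13.0.1 20230421 (prerelease)"}'
)
detected = format_version_from_probe(sample_probe_output)
print(detected)
```

Keeping the parsing separate from the subprocess call makes the check testable without a gcov binary.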

@Spacetown
Copy link
Member

You are right, so do you prefer a JSON format_version to be added to gcov --version?

This or document it in the help of -j.

@marxin
Copy link
Author

marxin commented May 4, 2023

Good, I've just pushed a change to gcov -v to GCC upstream:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d879d68eb309561d266ddf734ab8c69f4fef3874

@marxin
Copy link
Author

marxin commented May 4, 2023

Output example:

gcov -v
gcov (GCC) 14.0.0 20230504 (experimental)
JSON format version: 2
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
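Given this output, a consumer could pick out the new line with a small regular expression. The Python sketch below assumes the line looks exactly like the example above; the function name is hypothetical, and older gcov versions that do not print the line simply yield `None`.

```python
import re
from typing import Optional

# Matches the "JSON format version: N" line from the example output above.
_JSON_VERSION_RE = re.compile(r"^JSON format version:\s*(\d+)\s*$", re.MULTILINE)


def json_format_version(version_output: str) -> Optional[int]:
    """Return the JSON format version announced by `gcov -v`, if any."""
    match = _JSON_VERSION_RE.search(version_output)
    return int(match.group(1)) if match else None


sample_version_output = """gcov (GCC) 14.0.0 20230504 (experimental)
JSON format version: 2
Copyright (C) 2023 Free Software Foundation, Inc.
"""
announced = json_format_version(sample_version_output)
print(announced)
```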

@Spacetown Spacetown added the not possible for now This feature is not possible because of restrictions from other ones label Sep 30, 2023
@Spacetown
Copy link
Member

@marxin Will this change be backported to gcc 13?

@marxin
Copy link
Author

marxin commented Dec 13, 2023

@marxin Will this change be backported to gcc 13?

No.

@Spacetown
Copy link
Member

Ok, then I need to wait for the official release to test this in a docker container.

@Spacetown Spacetown removed this from the Upcoming release milestone Dec 17, 2023
@Spacetown
Copy link
Member

@marxin Do you know when gcc-14 will be released? In the openSUSE package, format version 1 is still generated.

@marxin
Copy link
Author

marxin commented Feb 23, 2024

At the end of April I guess.

@Pesa
Copy link
Contributor

Pesa commented Feb 23, 2024

@Spacetown if you want to test with gcc 14, you can use these snapshots from trunk for the time being.

@Spacetown
Copy link
Member

@Pesa Thanks for this. I'll give it a try.

@Spacetown
Copy link
Member

@Pesa I get the following error:

25.41 Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
150.0 dpkg: error processing archive gcc-latest.deb (--install):
150.0  package architecture (amd64) does not match system (arm64)
150.0 Errors were encountered while processing:
150.0  gcc-latest.deb

@Pesa
Copy link
Contributor

Pesa commented Feb 24, 2024

150.0 package architecture (amd64) does not match system (arm64)

The package is built for amd64, are you using an arm machine? If so, the error is expected. Personally, I've used the gcc-latest package in Github Actions (ubuntu runners) and it's been working fine.

@Spacetown
Copy link
Member

I'm working on a Mac with M1 chip.

@Spacetown
Copy link
Member

@marxin I'm continuing to work on reading the JSON format. Meanwhile gcovr has also added function coverage data from the gcov text report (function (.*?) called (INT) returned (VALUE) blocks executed (VALUE)), but in the JSON format the returned value is missing for functions. I only get the following values:

{
    'name': '_Z3fooi',
    'demangled_name': 'foo(int)',
    'start_line': 3,
    'start_column': 5,
    'end_line': 13,
    'end_column': 1,
    'blocks': 4,
    'blocks_executed': 3,
    'execution_count': 1
}
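To illustrate the gap, here is a Python sketch that derives what it can from such a JSON function entry and, for comparison, parses a text-report line. The text line in the demo is an assumed example matching the pattern quoted above, not captured gcov output, and the helper names are invented here.

```python
import re

# Regex following the pattern quoted above; assumes integer percentages in the text report.
FUNCTION_LINE_RE = re.compile(
    r"^function (?P<name>.*?) called (?P<called>\d+) "
    r"returned (?P<returned>\d+)% blocks executed (?P<blocks>\d+)%$"
)


def blocks_executed_percent(function_entry: dict) -> float:
    """Percentage of executed blocks, computed from the JSON function entry."""
    blocks = function_entry["blocks"]
    if blocks == 0:
        return 0.0
    return 100.0 * function_entry["blocks_executed"] / blocks


# Values copied from the entry shown above.
json_entry = {
    "name": "_Z3fooi",
    "blocks": 4,
    "blocks_executed": 3,
    "execution_count": 1,
}
percent = blocks_executed_percent(json_entry)

# Assumed text-report line for the same function (illustrative only).
text_line = "function _Z3fooi called 1 returned 100% blocks executed 75%"
text_match = FUNCTION_LINE_RE.match(text_line)

print(percent, text_match.group("returned"))
```

The call count and block percentage can be recovered from the JSON entry, but nothing in it corresponds to the `returned` group of the text line.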

@marxin
Copy link
Author

marxin commented May 5, 2024

Please file a bug report to the official GCC bugzilla: https://gcc.gnu.org/bugzilla/.
Thanks!

@Spacetown
Copy link
Member

@Spacetown
Copy link
Member

I expect that the IDs are added to the generated file the same way as it is in the human readable file.

As explained, I don't like the way they are emitted in human-readable format and thus I don't want to introduce them to JSON format. Both branches and calls are present as a list that belongs to a line, so the order is given.

The branches are listed under the line in the JSON but the relationship to which block in the line the branch belongs to is missing.
