For issue #528: Add linking database #529

mamaria-k · 2023-06-01T19:04:17Z

This issue was created to suggest improvements. Adding the ability to create a linking database.

Changes:

linking database is created when calling bear or citnames with flag --with-link.
output jsons contains fields: arguments, directory, file, files, output (optional). "Arguments" contain full command. In compile database "files" contains all libs and dependencies. In link database "file" is empty, "files" contains all linking files and dependencies.
in the linking database correct order of entries
processing case of compilation and linking in one command
processing case of ar and ranlib
add filtering events (so that calls to subcommands do not create a partial duplication of already recorded commands)
add unit tests and functional tests
manually test on some big open-source projects
update documentation

Now the filtering works considering that the events are sorted. However, I saw that with a small probability they are unsorted. Therefore, handling of the case of unsorted events will be added soon.

rizsotto

Hey Maria, thanks for this improvement.

I'm traveling at the moment, so can't review it now. And this is an important feature, so I would like to read it properly. I'm back in mid July.

Have you seen the older tickets (might be closed) which were touching this subject? And have you considered using the config file instead of the commands line arguments?

mamaria-k · 2023-06-02T10:08:07Z

Initially, I made these changes for another project, so I first of all started from the described requirements. I didn't look at previous tickets. But I'm certainly willing to make any changes.

mamaria-k · 2023-06-02T10:18:23Z

I'm sorry, but is it true that you will be able to see them only after mid July? And do you have plans to work with this pr and add it (after all the changes)? I'm asking this because it's a pretty big change.

rizsotto · 2023-06-02T22:01:14Z

I'm sorry, but is it true that you will be able to see them only after mid July? And do you have plans to work with this pr and add it (after all the changes)? I'm asking this because it's a pretty big change.

I'm back in mid July, that's true. :) Might be able to have access to a laptop in June, so I can take a quick look.

This feature was requested earlier, so I'm interested to see your implementation... And I know to implement this feature will require making some important decisions. To mention one: shall the new entries break the JSON compilation database spec? If not, then how to represent a linking call as compilation? (One of the previous discussions with c2rust transpiller authors is captured in the issues. That's why I was suggesting to look into older issues.)

Also, I would prefer to have small PRs, instead of a big one. I know it's hard, but I also know it is possible. :)

Sorry for not being available now, but that's temporary.

mamaria-k · 2023-06-03T10:36:51Z

Okay, I'll close this PR and open some smaller ones. In the process, there were indeed some decisions to be made. I would suggest that you first familiarize yourself with my solutions and then discuss how successful they are and how much they correspond to your vision. And then change them.

mamaria-k · 2023-06-13T11:50:47Z

Unfortunately, it turned out that the changes in different files are quite strongly related, so it was not possible to isolate independent meaningful parts for creating a PRs. If you have any suggestions, I'm ready to listen to them.

rizsotto

Thanks for this PR @mamaria-k !

In general, I would like to make the linking support available through the configuration file (without command line flag changes). I believe that majority of users are only interested in compilations, and just a few people will be interested in the linking commands. I prefer to keep this feature out of the way for people who are just creating the compilation database for "normal" use cases. And those who want to get the link commands, will be considered as "advanced" users and have the skills to write a config file.

The other thing I would like to clarify is where the output goes...

Your solution was using a separate file for the link commands, with a special syntax.
While the Support link commands in the output #276 issue was suggest to write that into the same output file, and the file field of the entry contained all the link related information. (See here.)
My proposal would be to serialize the cs::Semantic objects into the output file. Therefore all information that citnames was detected would be accessible. Then with some post processing (JSON processing is easy in many languages) people can filter the desired entries and format into whatever they like. (This project might provide a few default scripts if that's desired.)

I think the 3rd option gives the more flexibility. And learning from use cases later this project can be favor one "final" output format. Which might be ideal, since it's not clear what are the use cases of the link output. :)

.github/workflows/build_on_arch.yml

rizsotto · 2023-06-27T17:40:05Z

source/bear/man/bear.1.md

@@ -34,16 +34,26 @@ compilation database.
 \--verbose
 :   Enable verbose logging.

-\--output *file*
-:   Specify output file. (Default file name provided.) The output is
-    a JSON compilation database.


I would like to keep the --output flag, so please don't remove it.

Ok, returned the original set of flags.

rizsotto · 2023-06-27T17:49:38Z

source/citnames/source/semantic/Tool.h

@@ -34,7 +39,7 @@ namespace cs::semantic {

        // Returns the semantic of a command execution.
        [[nodiscard]]
-        virtual rust::Result<SemanticPtr> recognize(const Execution &) const = 0;
+        virtual rust::Result<SemanticPtr> recognize(const Execution &, const BuildTarget target) const = 0;


Could we keep the signature of this method? The idea would be that tools are recognizing everything they could, and the separation of the linker and compiler calls would be based on the Semantic instance type.

Maybe change the CompilerCall::into_entries as virtual std::list<cs::Entry> into_entries(bool only_compilations = true) const = 0?

See point 3 of the comment below.

mamaria-k · 2023-06-29T10:03:19Z

About support for the linking option in the configuration file. Do you mean storing in a file participating in the compilation (like config.h.in) or should it be a file (maybe json), after changing which the project does not need to be rebuilt?

rizsotto · 2023-06-29T11:03:08Z

Bear has a config file, which controls what should be in the output file. See citnames man page.

* Add patch for linking database

* Fix arguments formation for Entry, fix unit tests * Change saving arguments and fix tests * Add functional tests for linking

* Add files count for linking checking * Fix files validation and content filter checking * Fix processing case with ar * Fix test name * Update unit test

* Add processing of linking flags -l -Wl -L * Separate -static flag from -static-s * Supplement testing system, add processing libs for compilation database

* Fix case with unsorted events

…tabase to the configuration file

mamaria-k · 2023-06-29T15:12:21Z

Then I have a few questions.

Now both output databases have the same entry format. It seems to me convenient and understandable, it would also be possible to add to it the storage of meta-information (pid and others to help with filtering). Do you like the option when, as a result, the user will receive one file containing all the records (in order to parse it later, as you suggested), and the entries in the file will be the format I proposed?
Now the events are filtered, because when the linker is called, subcommands are called that duplicate the main linking command. Therefore, now some subtrees of the call tree are ignored, so that in the end the user receives a linking database, the entries in which are enough to run in order to build the project. If, based on question 1, this will be a single common output file, should it contain only entries from filtered (as it is now) link events, or do you want to store all entries in general? The second option has a problem: now there are no parsers for some linker subcommands, and I may not have time to write them.
About semantic of method recognize. This decision was due to the fact that some calls contain both compilation and linking. That is, after parsing, they must contain two entries, which the method signature does not allow. Currently, in such cases, the command is divided into two: compilation (maybe, some) and linking. The name of a possible intermediate file (after compilation) is generated. If in the end we want to get a common database, then I can make a separate Semantic for such "double" events. But first we need to decide on two questions. In the single output database, how do you want to handle such cases? If there is only one record, then how to parse it later for the default options "leave only compilation" and "leave only linking".

mamaria-k · 2023-09-04T13:21:58Z

@rizsotto, questions are still relevant.

rizsotto · 2023-09-05T10:37:35Z

Hey @mamaria-k ,

I would use a single output file (which can be either the JSON compilation database, or the raw recognized "semantic" objects). I will talk about the semantic objects now...
I would emit recognized semantic types (both compilation and linking in the same file)
It is possible to emit the original execution (pid and others) together with the recognized semantic. { "semantic": {...}, "execution": {...} }
To get rid of the problem of commands like cc -o a.out source1.c source2.c -lpthread, we can define a single semantic object that contains multiple "passes", so a single semantic object would represent all the things the compiler is doing. { "compiler": "/usr/bin/cc", "passes": [ {...} ] } where the {...} can be like this for compilations { "type": "compilation", "source": "source1.c", "flags": [], "output": "source1.o" } or like this for linking { "type": "linking", "objects": ["source1.o", "source2.o"], "libraries": ["-lpthread"], "flags": [], "output": "a.out" }
Can try to ignore process sub-commands (the ld executed by cc when linking is happening). Not sure if that's "easy" to do or how reliable is that. (My original idea was to include the PID and that would help to reconstruct the process tree. Later I've learned that during a Linux kernel compilations the PIDs are recycled. That's why the "reporter id" was introduced to be completely unique.) There is a good chance that duplicates can be detected by simple looking the "output" field. (It's highly unlikely that builds are overwriting the same executable. But can be problematic with ar, which can mutate archives. Though I have not seen builds which are modifying archives.)
About the Tool::recognize method, I would still only pass the execution as argument and that should return a single semantic object.

SaifRushdHadad · 2024-05-13T19:32:08Z

Is there any update on the status of this issue and the associated pull request? Just checking to see if there's still interest in moving this forward.

Also, I came across a similar approach in a cmake pull request that's still open here: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6009. It might be relevant to our discussion?

mamaria-k · 2024-05-14T04:25:48Z

There is interest, but I will be able to continue working in mid-summer.

rizsotto reviewed Jun 2, 2023

View reviewed changes

mamaria-k closed this Jun 9, 2023

mamaria-k changed the title ~~For issue #528: Draft: Add linking database~~ For issue #528: Add linking database Jun 9, 2023

mamaria-k reopened this Jun 9, 2023

rizsotto reviewed Jun 27, 2023

View reviewed changes

mamaria-k and others added 5 commits June 29, 2023 14:24

Linking database (#2)

ac084ec

* Add patch for linking database

Save all arguments (#3)

559b2da

* Fix arguments formation for Entry, fix unit tests * Change saving arguments and fix tests * Add functional tests for linking

After testing: fix case with ar, content filter, debug message (#7)

8ba199e

* Add files count for linking checking * Fix files validation and content filter checking * Fix processing case with ar * Fix test name * Update unit test

Add processing of linking flags -l -Wl -L (#8)

b590597

* Add processing of linking flags -l -Wl -L * Separate -static flag from -static-s * Supplement testing system, add processing libs for compilation database

Fix case with unsorted events (#9)

1996895

* Fix case with unsorted events

mamaria-k force-pushed the main-for-pr-to-original-repo branch from aa23f24 to 1996895 Compare June 29, 2023 11:24

mamaria-k added 2 commits June 29, 2023 17:07

Return original flags and add setting up the creation of a linking da…

a190fa1

…tabase to the configuration file

Small fix

fae579c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

For issue #528: Add linking database #529

For issue #528: Add linking database #529

mamaria-k commented Jun 1, 2023 •

edited

rizsotto left a comment

mamaria-k commented Jun 2, 2023

mamaria-k commented Jun 2, 2023

rizsotto commented Jun 2, 2023

mamaria-k commented Jun 3, 2023

mamaria-k commented Jun 13, 2023

rizsotto left a comment

rizsotto Jun 27, 2023

mamaria-k Jun 29, 2023

rizsotto Jun 27, 2023

mamaria-k Jun 29, 2023

mamaria-k commented Jun 29, 2023

rizsotto commented Jun 29, 2023

mamaria-k commented Jun 29, 2023 •

edited

mamaria-k commented Sep 4, 2023

rizsotto commented Sep 5, 2023

SaifRushdHadad commented May 13, 2024

mamaria-k commented May 14, 2024

For issue #528: Add linking database #529

Are you sure you want to change the base?

For issue #528: Add linking database #529

Conversation

mamaria-k commented Jun 1, 2023 • edited

rizsotto left a comment

Choose a reason for hiding this comment

mamaria-k commented Jun 2, 2023

mamaria-k commented Jun 2, 2023

rizsotto commented Jun 2, 2023

mamaria-k commented Jun 3, 2023

mamaria-k commented Jun 13, 2023

rizsotto left a comment

Choose a reason for hiding this comment

rizsotto Jun 27, 2023

Choose a reason for hiding this comment

mamaria-k Jun 29, 2023

Choose a reason for hiding this comment

rizsotto Jun 27, 2023

Choose a reason for hiding this comment

mamaria-k Jun 29, 2023

Choose a reason for hiding this comment

mamaria-k commented Jun 29, 2023

rizsotto commented Jun 29, 2023

mamaria-k commented Jun 29, 2023 • edited

mamaria-k commented Sep 4, 2023

rizsotto commented Sep 5, 2023

SaifRushdHadad commented May 13, 2024

mamaria-k commented May 14, 2024

mamaria-k commented Jun 1, 2023 •

edited

mamaria-k commented Jun 29, 2023 •

edited