Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For issue #528: Add linking database #529

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mamaria-k
Copy link
Contributor

@mamaria-k mamaria-k commented Jun 1, 2023

This issue was created to suggest improvements. Adding the ability to create a linking database.

Changes:

  • linking database is created when calling bear or citnames with flag --with-link.
  • output jsons contains fields: arguments, directory, file, files, output (optional). "Arguments" contain full command. In compile database "files" contains all libs and dependencies. In link database "file" is empty, "files" contains all linking files and dependencies.
  • in the linking database correct order of entries
  • processing case of compilation and linking in one command
  • processing case of ar and ranlib
  • add filtering events (so that calls to subcommands do not create a partial duplication of already recorded commands)
  • add unit tests and functional tests
  • manually test on some big open-source projects
  • update documentation

Now the filtering works considering that the events are sorted. However, I saw that with a small probability they are unsorted. Therefore, handling of the case of unsorted events will be added soon.

Copy link
Owner

@rizsotto rizsotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Maria, thanks for this improvement.

I'm traveling at the moment, so can't review it now. And this is an important feature, so I would like to read it properly. I'm back in mid July.

Have you seen the older tickets (might be closed) which were touching this subject? And have you considered using the config file instead of the commands line arguments?

@mamaria-k
Copy link
Contributor Author

Initially, I made these changes for another project, so I first of all started from the described requirements. I didn't look at previous tickets. But I'm certainly willing to make any changes.

@mamaria-k
Copy link
Contributor Author

I'm sorry, but is it true that you will be able to see them only after mid July? And do you have plans to work with this pr and add it (after all the changes)? I'm asking this because it's a pretty big change.

@rizsotto
Copy link
Owner

rizsotto commented Jun 2, 2023

I'm sorry, but is it true that you will be able to see them only after mid July? And do you have plans to work with this pr and add it (after all the changes)? I'm asking this because it's a pretty big change.

I'm back in mid July, that's true. :) Might be able to have access to a laptop in June, so I can take a quick look.

This feature was requested earlier, so I'm interested to see your implementation... And I know to implement this feature will require making some important decisions. To mention one: shall the new entries break the JSON compilation database spec? If not, then how to represent a linking call as compilation? (One of the previous discussions with c2rust transpiller authors is captured in the issues. That's why I was suggesting to look into older issues.)

Also, I would prefer to have small PRs, instead of a big one. I know it's hard, but I also know it is possible. :)

Sorry for not being available now, but that's temporary.

@mamaria-k
Copy link
Contributor Author

Okay, I'll close this PR and open some smaller ones. In the process, there were indeed some decisions to be made. I would suggest that you first familiarize yourself with my solutions and then discuss how successful they are and how much they correspond to your vision. And then change them.

@mamaria-k mamaria-k closed this Jun 9, 2023
@mamaria-k mamaria-k changed the title For issue #528: Draft: Add linking database For issue #528: Add linking database Jun 9, 2023
@mamaria-k mamaria-k reopened this Jun 9, 2023
@mamaria-k
Copy link
Contributor Author

Unfortunately, it turned out that the changes in different files are quite strongly related, so it was not possible to isolate independent meaningful parts for creating a PRs. If you have any suggestions, I'm ready to listen to them.

Copy link
Owner

@rizsotto rizsotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @mamaria-k !

In general, I would like to make the linking support available through the configuration file (without command line flag changes). I believe that majority of users are only interested in compilations, and just a few people will be interested in the linking commands. I prefer to keep this feature out of the way for people who are just creating the compilation database for "normal" use cases. And those who want to get the link commands, will be considered as "advanced" users and have the skills to write a config file.

The other thing I would like to clarify is where the output goes...

  • Your solution was using a separate file for the link commands, with a special syntax.
  • While the Support link commands in the output #276 issue was suggest to write that into the same output file, and the file field of the entry contained all the link related information. (See here.)
  • My proposal would be to serialize the cs::Semantic objects into the output file. Therefore all information that citnames was detected would be accessible. Then with some post processing (JSON processing is easy in many languages) people can filter the desired entries and format into whatever they like. (This project might provide a few default scripts if that's desired.)

I think the 3rd option gives the more flexibility. And learning from use cases later this project can be favor one "final" output format. Which might be ideal, since it's not clear what are the use cases of the link output. :)

.github/workflows/build_on_arch.yml Outdated Show resolved Hide resolved
@@ -34,16 +34,26 @@ compilation database.
\--verbose
: Enable verbose logging.

\--output *file*
: Specify output file. (Default file name provided.) The output is
a JSON compilation database.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the --output flag, so please don't remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, returned the original set of flags.

@@ -34,7 +39,7 @@ namespace cs::semantic {

// Returns the semantic of a command execution.
[[nodiscard]]
virtual rust::Result<SemanticPtr> recognize(const Execution &) const = 0;
virtual rust::Result<SemanticPtr> recognize(const Execution &, const BuildTarget target) const = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep the signature of this method? The idea would be that tools are recognizing everything they could, and the separation of the linker and compiler calls would be based on the Semantic instance type.

Maybe change the CompilerCall::into_entries as virtual std::list<cs::Entry> into_entries(bool only_compilations = true) const = 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See point 3 of the comment below.

@mamaria-k
Copy link
Contributor Author

About support for the linking option in the configuration file. Do you mean storing in a file participating in the compilation (like config.h.in) or should it be a file (maybe json), after changing which the project does not need to be rebuilt?

@rizsotto
Copy link
Owner

Bear has a config file, which controls what should be in the output file. See citnames man page.

mamaria-k and others added 5 commits June 29, 2023 14:24
* Add patch for linking database
* Fix arguments formation for Entry, fix unit tests

* Change saving arguments and fix tests

* Add functional tests for linking
* Add files count for linking checking

* Fix files validation and content filter checking

* Fix processing case with ar

* Fix test name

* Update unit test
* Add processing of linking flags -l -Wl -L

* Separate -static flag from -static-s

* Supplement testing system, add processing libs for compilation database
* Fix case with unsorted events
@mamaria-k mamaria-k force-pushed the main-for-pr-to-original-repo branch from aa23f24 to 1996895 Compare June 29, 2023 11:24
@mamaria-k
Copy link
Contributor Author

mamaria-k commented Jun 29, 2023

Then I have a few questions.

  1. Now both output databases have the same entry format. It seems to me convenient and understandable, it would also be possible to add to it the storage of meta-information (pid and others to help with filtering). Do you like the option when, as a result, the user will receive one file containing all the records (in order to parse it later, as you suggested), and the entries in the file will be the format I proposed?
  2. Now the events are filtered, because when the linker is called, subcommands are called that duplicate the main linking command. Therefore, now some subtrees of the call tree are ignored, so that in the end the user receives a linking database, the entries in which are enough to run in order to build the project. If, based on question 1, this will be a single common output file, should it contain only entries from filtered (as it is now) link events, or do you want to store all entries in general? The second option has a problem: now there are no parsers for some linker subcommands, and I may not have time to write them.
  3. About semantic of method recognize. This decision was due to the fact that some calls contain both compilation and linking. That is, after parsing, they must contain two entries, which the method signature does not allow. Currently, in such cases, the command is divided into two: compilation (maybe, some) and linking. The name of a possible intermediate file (after compilation) is generated. If in the end we want to get a common database, then I can make a separate Semantic for such "double" events. But first we need to decide on two questions. In the single output database, how do you want to handle such cases? If there is only one record, then how to parse it later for the default options "leave only compilation" and "leave only linking".

@mamaria-k
Copy link
Contributor Author

@rizsotto, questions are still relevant.

@rizsotto
Copy link
Owner

rizsotto commented Sep 5, 2023

Hey @mamaria-k ,

  • I would use a single output file (which can be either the JSON compilation database, or the raw recognized "semantic" objects). I will talk about the semantic objects now...
  • I would emit recognized semantic types (both compilation and linking in the same file)
  • It is possible to emit the original execution (pid and others) together with the recognized semantic. { "semantic": {...}, "execution": {...} }
  • To get rid of the problem of commands like cc -o a.out source1.c source2.c -lpthread, we can define a single semantic object that contains multiple "passes", so a single semantic object would represent all the things the compiler is doing. { "compiler": "/usr/bin/cc", "passes": [ {...} ] } where the {...} can be like this for compilations { "type": "compilation", "source": "source1.c", "flags": [], "output": "source1.o" } or like this for linking { "type": "linking", "objects": ["source1.o", "source2.o"], "libraries": ["-lpthread"], "flags": [], "output": "a.out" }
  • Can try to ignore process sub-commands (the ld executed by cc when linking is happening). Not sure if that's "easy" to do or how reliable is that. (My original idea was to include the PID and that would help to reconstruct the process tree. Later I've learned that during a Linux kernel compilations the PIDs are recycled. That's why the "reporter id" was introduced to be completely unique.) There is a good chance that duplicates can be detected by simple looking the "output" field. (It's highly unlikely that builds are overwriting the same executable. But can be problematic with ar, which can mutate archives. Though I have not seen builds which are modifying archives.)
  • About the Tool::recognize method, I would still only pass the execution as argument and that should return a single semantic object.

@SaifRushdHadad
Copy link

Is there any update on the status of this issue and the associated pull request? Just checking to see if there's still interest in moving this forward.

Also, I came across a similar approach in a cmake pull request that's still open here: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6009. It might be relevant to our discussion?

@mamaria-k
Copy link
Contributor Author

There is interest, but I will be able to continue working in mid-summer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants