Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Public API for traversing the design hierarchy / AST #219

Open
kraigher opened this issue Nov 12, 2023 · 25 comments
Open

Public API for traversing the design hierarchy / AST #219

kraigher opened this issue Nov 12, 2023 · 25 comments

Comments

@kraigher
Copy link
Member

kraigher commented Nov 12, 2023

We need to create a stable public API to traverse the design hierarchy and the AST. I am hoping @Schottkyc137 and @skaupper can contribute their ideas and feedback here. My goal is to replace the Search trait and the Visitor trait with this implementation. First we need to gather some requirements than we can discuss the implementation.

Performance requirements

  • Must not perform heap allocation

Public requirements

Library

  • Must be able to get library by name
  • Must be able to iterate over all libraries

Design units

  • Must be able to iterate over all design units in a source file
  • Must be able to get primary units from library by name
  • Must be able to iterate over all primary units in a libary
  • Must be able to iterate over secondary units of a primary unit
  • Must be able to get context clause of a design unit

Declarations & Statements

  • Must be able to iterate over all declarations, sequential statements, concurrent statements in a design unit
  • Must be able to iterate over all declarations and statements nested within other declarations and statements
  • Must be iterate over declarations and statements in the order they appear in the source file.

Navigation

  • Get referenced named entity (AnyEnt) from source location
  • Get referenced design unit/statement/declaration in AST from named entity (AnyEnt)
  • Get all source locations referencing named entity (AnyEnt)

Private requirements

  • Must be able to clear all references in the entire AST
@Schottkyc137
Copy link
Contributor

Some preliminary thoughts from myself without getting too much into details:

  • Tool Directives
    Currently, tool directives are tokenized, but ignored afterwards so that no false positives are detected. However, a public API should make some methods / traits available to hook into the tokenizing process so that third-parties can utilize vhdl_lang to perform tasks on tool directives. Potential use-cases include the protect tool directive and a potential vhdl_lang tool directive to mute listing errors

  • Scope
    I believe that this discussion could benefit from first defining use-cases of vhdl_lang. In my opinion, the design of a public API will benefit from making this clear at first. I can think of the following use-cases which would be of interred for me but others will probably be interested in this as well:

    • Tool directives for third-parties (see above)
    • Code Formatters
    • Documentation extractor (Referring to @skaupper's work)
    • Linters (should linting not be a concern of a public API since it's already part of the private API?)
    • Testbench generation or other code generation from VHDL
    • Code inspection in general (See Feature request: expose tokenizer in public API #203 for example)
    • Integration with other HDL's / Language servers of other HDL's (There was interest in the Chat for some Verilog / VHDL shared language server)
    • Long-term goal: Simulator / full-fledged compiler for VHDL

Are any of these out of scope? Did I miss anything? What do you think?

@kraigher
Copy link
Member Author

Regarding scope I am a believer in adding only what is currently needed. We can add more stuff in the future when there is a use case that can drive the design. I find one rarely gets the API right if you add everything up front.

For now I think the goal should be to have a nice API without duplication of Search/Visitor that supports completions, lsp and document generation as the initial scope.

@skaupper
Copy link
Contributor

For navigation I want to add, that getting parents declarations from AST elements is important as well. Which may already be possible with AnyEnt (I have not looked into it too deeply).

Even if it is not immediately required, it would probably be good to ensure that AST elements are traversed in the order they appear in the source code, as that would probably be rather hard to rework after the fact. The current implementation does seem to do that as well.

I may encounter further things when I come around to play with the current implementation.

@kraigher
Copy link
Member Author

For navigation I want to add, that getting parents declarations from AST elements is important as well. Which may already be possible with AnyEnt (I have not looked into it too deeply).

AnyEnt has a parent field. So this is already supported.

Even if it is not immediately required, it would probably be good to ensure that AST elements are traversed in the order they appear in the source code, as that would probably be rather hard to rework after the fact. The current implementation does seem to do that as well.

Yes this has always been a goal so that sorting on srcpos is not necessary. I add it as an explicit requirement above.

@kraigher
Copy link
Member Author

There are two main things I have in mind for the API:

  1. I think the API should offer a simple way to iterate over libraries, design units, statements and declarations. I think the client code needs some control over when the nesting happens when iterating recursively. The reason is I think it is likely that a client code would like to have some context where they keep track of where they are in the AST and this is not easy with a recursive visitor pattern as you have no control over the nesting of calls.

  2. I think we do not need a general public visitor API to visit every small intermediate node in the AST below the statement/declaration level. For example why would we need a visitor function for a qualified expression? What is a client supposed to do with a visiting a qualified expression out of context? I think it is enough to have a way to find all references within an AST subtree and a way to go from SrcPos to a reference.

With these two things in mind I think we could create a type like this:

enum NodeRef<'a> {
   Declaration(&'a ast::Declaration),
   Sequential(&'a ast::Sequential),,
   Concurrent(&'a ast::Concurrent),
}

trait<'a> NodeVisitor<'a>  {
  // Only iterates over immediate children, not recursively
  fn for_each_child(&self, fun: FnMut(NodeRef<'a>));
  // An alternative is to use impl Iterator<Item = NodeRef<'a>> but I think the type inferred by impl Iterator would be complex to create without Box<dyn Iterator> however maybe the heap allocation is not so bad as @Schottkyc137 benchmarked.
}

The API on the top level would be the Root which I think could be created as a thin facade over the internal DesignRoot

impl <'a> Root<'a>  {
   fn libraries(&self) -> impl Iterator<item = Library<'a>> {}
   fn get_library(&self, sym: &Symbol) -> Option<Library<'a>> {}
}

impl <'a> Library<'a>  {
   fn primary_design_units(&self) -> impl Iterator<item = DesignUnit<'a>> {}
   fn get_primary_unit(&self, sym: &Symbol) -> Option<DesignUnit<'a>> {}
}

impl <'a> DesignUnit<'a>  {
   fn secondary_design_units(&self) -> impl Iterator<item = DesignUnit<'a>> {}
   fn get_secondary_unit(&self, sym: &Symbol) -> Option<DesignUnit<'a>> {}
   fn context_clause(&self) -> ContextClause<&'a> {}
}

impl<'a> NodeVisitor<'a> for DesignUnit {}

@skaupper
Copy link
Contributor

What I really like about the current visitor implementation is that simple use cases require simple code. The use case Get all procedure declarations can be achieved by only implementing the visit_subprogram_declaration method of the Visitor trait. But I agree with you that more context/control over how the AST is traversed is probably needed.

Although I do not know how easy it is to use custom iterators, they would allow the user a great amount of control, if needed.

I could imagine uses like these:

let design_root = ...;
// Get all primary units from the design
let primary_units_iter = design_root.get_primary_units(); // impl Iterator<item = DesignUnit>
// Get all declarations from the design
let declarations_iter = design_root.get_declarations(); // impl Iterator<item = Declaration>

By using iterators where possible the user would have access to iterator functions like .map(...) and .filter(...).

The relevant traits could look like this:

trait PrimaryUnitVisitor {
    fn get_primary_units(&self) -> impl Iterator<item = PrimaryUnit>;
}

trait DeclarationVisitor {
    fn get_declarations(&self) -> impl Iterator<item = Declaration>;
}

By implementing them for the right types (DesignRoot, PrimaryRoot and even Iterator<item=PrimaryUnit>) I imagine that even the following might be possible:

// Get all declarations from all primary units (even .filter(...) might be possible between get_primary_units and get_declarations)
let declarations_iter = design_root.get_primary_units().get_declarations(); // impl Iterator<item = Declaration>

Please bear in mind that there may be some roadblocks I did not consider (because of lack of experience), but I think an iterator based API (if possible) would be quite versatile and powerful.

@kraigher
Copy link
Member Author

@skaupper A context free vistor API could always be built as an utility on top of the API I proposed. I do not think we need a visitor function for every litte thing from day one though. I would rather have the use cases drive the design when they occur. Day one is enough with an API for the current use cases.

Regarding methods returning impl Iterator in traits I think it is not possible yet. Maybe works now with generic associated types, have to check. You have to use Box<dyn Iterator> otherwise

@skaupper
Copy link
Contributor

Day one is enough with an API for the current use cases.

That's fine for me. I wasn't proposing a visitor API as it is now, just wanted to emphasize its simplicity in cases where context is not needed. For the initial design it should be sufficient to iterate declarations and statements.

Regarding methods returning impl Iterator in traits I think it is not possible yet

Too bad that is not possible as of right now.

I do like your API suggestion, though. Equivalent to primary_design_units, secondary_design_units etc. shouldn't the following be also possible?

impl <'a> DesignUnit<'a>  {
    fn children(&self) -> impl Iterator<item = NodeRef<'a>> {}
}

If possible, I would prefer it to write design_unit.children().for_each(...) instead of design_unit.for_each_child(...) because the user would still have access to all other Iterator functions (like map and filter) without being that much more verbose. What do you think?

@kraigher
Copy link
Member Author

Maybe the right trade-off in this case is to sacrifice a little bit of performance and use a boxed dyn Iterator:

fn children(&self) -> Box<dyn Iterator<Item=NodeRef<'a>>>

It is at least better than using a Vec I think.

@kraigher
Copy link
Member Author

As it seems the overall design is ok I will create an implementation of my proposal and make an PR so we can discuss more details.

@kraigher
Copy link
Member Author

I made a first draft here: #228

@kraigher
Copy link
Member Author

I added a benchmark that can be run with cargo bench on master so we can have something to compare with when we replace the Search trait with a new api.

@kraigher
Copy link
Member Author

I am currently also working on a bottom-up refactor to bring the Searcher and the Visitor closer together. I already fixed that the Searcher does not need &mut on the AST. I use AtomicUsize to clear references instead.

@kraigher
Copy link
Member Author

@Schottkyc137 I did some benchmarking on Search trait vs Visitor. You can se what I did on the benchmark branch.

visit entire ast       197.22 ms        153/153
search entire ast       25.59 ms    1,168/1,171

As you can see the Search trait is almost 8x faster. I will try to rewrite the Visitor to use an iterator instead of a vec and see if it helps.

@Schottkyc137
Copy link
Contributor

Schottkyc137 commented Nov 23, 2023

@Schottkyc137 I did some benchmarking on Search trait vs Visitor. You can se what I did on the benchmark branch.

visit entire ast       197.22 ms        153/153
search entire ast       25.59 ms    1,168/1,171

As you can see the Search trait is almost 8x faster. I will try to rewrite the Visitor to use an iterator instead of a vec and see if it helps.

Alright then it's time to ditch the visitor (at least in its current implementation) :D
My initial idea was that it's somehow more performant in certain circumstances than the Searcher because it avoids recursion. But the benefit (if there even is any) is likely minuscule compared to the cost of dynamic allocation and also vtable lookup. Can help in any refactoring if there is anything you'd like me to do.

@kraigher
Copy link
Member Author

Can help in any refactoring if there is anything you'd like me to do.

Not at the moment. I will try to modify the Search trait and the FoundDeclaration enum so that it can solve the current use in AutocompletionVisitor and then remove visitor.rs. I it will not be much work and think I can do that without help. After that I very much appreciate you input on the future public API. I would still like to offer some kind of iterator-based API like the one proposed here.

@kraigher
Copy link
Member Author

@Schottkyc137 I added the stuff needed by completion.rs to the Search trait and used it instead there. Then I removed the Visitor trait so we only have one implementation.

@skaupper
Copy link
Contributor

@kraigher Regarding feedback: I hope that I find some time this week to throw together an example to get a feel for the API.

From what I've already tried, it seems a bit problematic if a whole Project is needed when all I want is just to iterate the AST. I will do a more detailed write-up as soon as I got to it.

@kraigher
Copy link
Member Author

@skaupper My idea would be to have a facade for the design hierarchy named Root that would allow iteration over libraries, design units and their declarations and statements. You would still need a Project to analyze the design but you would be able to get a &Root out from the Project. In this way we could move all the AST queries such as find_all_refernces from Project to Root.

@skaupper
Copy link
Contributor

skaupper commented Dec 1, 2023

So, I played around with it and looked mostly at two things:

  1. How to get the Project/Root/DesignRoot instance?
  2. How would I like to iterate through libraries and design units.

For 1. I tried to be as concise as possible (If I have a single VHDL file, how do I get the AST for it?), which resulted in the following code:

let config = Config::read_file_path(Path::new("../rust_hdl/vhdl_libraries/vhdl_ls.toml")).unwrap();

let mut project = Project::from_config(config, &mut NullMessages);
project.update_source(&Source::from_latin1_file(Path::new("./vhdl/package.vhd")).unwrap());
project.analyse();

let root = project.to_api_root();

What I do not like is, that analysing the project fails with a panic if std/standard.vhd is not part of the project. For small applications like docu extractors, code formatters etc. which do not necessarily parse whole libraries at once, needing to provide a copy of the standard library everytime you do something is at least annoying.


For 2. I played around with the iterator API.

let lib = root
    .libraries()
    .find(|lib| lib.name() == "work")
    .expect("Did not find library work.");
println!("Library: {}", lib.name());

for primary_unit in lib.primary_units() {
    println!(
        "  Primary: {}",
        match primary_unit.get() {
            AnyPrimaryUnit::Package(p) => p.ident.to_string(),
            AnyPrimaryUnit::Entity(e) => e.ident.to_string(),
            AnyPrimaryUnit::Context(c) => c.ident.to_string(),
            _ => "<not implemented>".to_string(),
        }
    );

    for secondary_unit in primary_unit.secondary_units() {
        println!(
            "    Secondary: {}",
            match secondary_unit.get() {
                AnySecondaryUnit::Architecture(a) => a.ident.to_string(),
                AnySecondaryUnit::PackageBody(p) => p.ident.to_string(),
            }
        );
    }
}

The simple for-loops work just fine, except that convenience functions like .name() are still missing, but with the wrappers around the AST elements, such functions could not be easier to implement!

What does not work is chaining iterator functions in cases like this:

let packages_in_work: Vec<_> = root
    .libraries()
    .filter(|lib| lib.name() == "work")
    .flat_map(Library::primary_units)  // This does not work!
    .filter(|pu| matches!(pu.get(), AnyPrimaryUnit::Package(_)))
    .collect();

// Do something with the filtered units
packages_in_work.iter().for_each(|p| println!("{}", p.name()));

Since Library::primary_units (as well as Primary::secondary_units and HasNodes::children) do not consume the value, these function chains cannot work.


The last example of 2. can be fixed with some rather small changes, like providing consuming variants of Primary::secondary_units etc. and additionally providing a variant of analysis::root::Library::secondary_units which does not take a &Symbol, but either a Symbol or even an usize (the unique symbol ID). Any thoughts on the (rather simple) examples, and the potential fixes?

For 1. I did not try to find a workaround/fix but it seems like some fields of DesignRoot are not populated without the standard library being parsed. I am not familiar with the analysis code. Would you see an elegant to avoid the panics while keeping DesignRoot as functional as it can be without the standard types etc.?

@skaupper
Copy link
Contributor

skaupper commented Dec 1, 2023

On a different note: It seems to be a common pattern to expect &Symbols instead of strings as arguments to some query functions (like Root::get_library or Library::get_primary_unit).

What is your preferred way to obtain Symbols, without already having the respective library/unit at hand?

@kraigher
Copy link
Member Author

kraigher commented Dec 4, 2023

What I do not like is, that analysing the project fails with a panic if std/standard.vhd is not part of the project. For small applications like docu extractors, code formatters etc. which do not necessarily parse whole libraries at once, needing to provide a copy of the standard library everytime you do something is at least annoying.

Doing analysis without the standard library is not feasible. Without analysis there would be no semantic information such as symbol references from use to declaration. What you want is just doing the parsing, that is still possible to do and you would get an AST without any references set. When only doing parsing there is not much use for the Project at and you might as well use the parser directly. This use case would thus be solved by adding the parser to the public API.

On a different note: It seems to be a common pattern to expect &Symbols instead of strings as arguments to some query functions (like Root::get_library or Library::get_primary_unit).

What is your preferred way to obtain Symbols, without already having the respective library/unit at hand?

Maybe the most user friendly is to just have public API functions take a &str. Otherwise we would need a Public API function to create and intern a symbol into the symbol table.

@skaupper
Copy link
Contributor

What you want is just doing the parsing

You are right about that. Adding the parser to the API is definitely something I would want. Also I think it would not hurt to have a wrapper type (just like Root), so convenience functions, query functions etc. can be implemented, without littering the library internals.

That way we could have the API you proposed for the analyzed AST, while for use cases which do not require the analyse step a simpler and less involved API for the parsed AST would be available. Maybe the analyzed API could event build on the parsed API.

If you want to keep analyzed and parsed APIs, I would be happy to give the parsed API a go (but probably not before christmas), assuming that you mostly care about the analyzed API for the language server etc.

Maybe the most user friendly is to just have public API functions take a &str.

I agree that it's probably best if the users do not need to care about internals like Symbols and can instead use standard primitives where it is sufficient. The wrapper types in the API are perfect places for those kind of functions :)

@Schottkyc137
Copy link
Contributor

I have been quiet for some time now as I generally agree with the points that @skaupper raises. However, I want to raise one more point concerning scope that I have alluded to earlier.

While I agree with @kraigher that features should only be implemented if the desire arises, implementations should not forbid certain classes of features or make it unreasonable difficult to implement these features.
One point in the current implementation that I see is that most resolves around one data structure – the AST. Subsequent passes (i.e., the analysis pass) only enrich that data structure with more information such as a reference to a declared entity. While this is enough for certain applications (analysis, completions, extracting documentation, ...) it might not be enough for future applications such as evaluation or compilation of a VHDL program. Most other compilers solve this with some intermediate representation after each pass. And while these applications might be far-fetched, two points should IMO still be considered:
a) The claim is that vhdl_ls contains a general-purpose VHDL language front-end and
b) Stuff like evaluation of a design might also be interesting for a language server. Maybe in the future, a design can be evaluated on-the fly and more elaborate errors can be published. Examples include checking for case statement overlap or range checking.

I'm saying all of this to still be careful not to discourage future applications, even if there is no immediate use-case.

@skaupper
Copy link
Contributor

Subsequent passes (i.e., the analysis pass) only enrich that data structure with more information

You are right about that. Which makes me think that it would probably a good idea to make that enrichment more explicit.

Currently the Project struct does a lot of things, like

  1. Handling configs
  2. Gathering sources (which is mostly done via 1. atm)
  3. Parsing said sources
  4. Analysing the parsed sources
  5. Providing an API for working with the analysed VHDL.

Splitting these functionalities in three structs, there could be

  1. A SourceProject, which is unparsed and is only responsible for gathering source files (so, point 1. and 2. from above)
  2. A ParsedProject, which takes the gathered sources and parses them all (point 3. from above). This struct could also provide access for an API working with the raw AST.
  3. A AnalysedProject, which takes the ParsedProject and does the semantic analysis as it is now.

In the future, this structure could be further extended by an ElaboratedProject (or whatever) by consuming the AnalysedProject moving or transforming fields as needed and adding new information as well. That would be what you meant @Schottkyc137, right?

For reference, a small example of how this struct could interact with each other, and how a user would have to use them:

// Prepare sources
let config = Config::read_file_path(Path::new("../rust_hdl/vhdl_libraries/vhdl_ls.toml")).unwrap();
let source_file = Path::new("test.vhd");

// Create SourceProject
let mut source_project = SourceProject::from_config(config); // or SourceProject::new() for an empty project
source_project.add_file(&source_file);

// Create ParsedProject
let parsed_project = source_project.parse();
{
    let parsed_api = parsed_project.api();
    // Do something with the raw AST
}

// Create AnalysedProject
let analysed_project = parsed_project.analyse();
{
    let analysed_api = analysed_project.api();
    // Do something with the analysed AST
}

To conclude, I see multiple benefits by such a restructuring:

  1. Separation of concerns. Project would not be responsible anymore for basically everything.
  2. Clarity. That's kind of a byproduct of 1. If a struct has only one responsibility, it should be easier to grasp what you (as a user) are supposed to do with it.
  3. Flexibility. By choosing a layer of abstraction, which handles all needed use cases, the user does not need to care about constraints of higher layers (i.e. see this comment about the need for the standard library for analysis).
  4. Extendability. If someday someone decides that an ElaboratedProject is needed, extending the proposed structure should be doable as well.

Getting rid of some &mut self functions (i.e. by replacing Project::analyse(&mut self) with ParsedProject::analyse(self) -> AnalysedProject), while not 100% necessary, is also a plus in my book.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants