Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Track] LSP performance optimization #1237

Open
He1pa opened this issue Apr 22, 2024 · 2 comments
Open

[Track] LSP performance optimization #1237

He1pa opened this issue Apr 22, 2024 · 2 comments
Assignees
Labels
lsp semantic Issues or PRs related to kcl semantic and checker track

Comments

@He1pa
Copy link
Contributor

He1pa commented Apr 22, 2024

Background

Solve the problem of delay caused by LSP compilation speed in large-scale scenarios. The goal is to reduce the response time of various requests and notifications to less than 50 ms.

Bench

compile about 300-400 kcl files

kclvm/tools/src/LSP/src/util.rs

pub(crate) fn compile_with_params(
    params: Params,
) -> anyhow::Result<(Program, IndexSet<Diagnostic>, GlobalState)> {
    // 1.lookup_compile_unit
    let (mut files, opt) = lookup_compile_unit(&params.file, true);
    ...
    // 2.Parser
    let sess = ParseSessionRef::default();
    let mut program = load_program(sess.clone(), &files, Some(opt), params.module_cache)
        .unwrap()
        .program;
    // 3.Resolver
    let prog_scope = resolve_program_with_opts(
        &mut program,
        kclvm_sema::resolver::Options {
            merge_program: false,
            type_erasure: false,
            ..Default::default()
        },
        params.scope_cache,
    );

    // 4.AdvancedResolver
    let gs = GlobalState::default();
    let gs = Namer::find_symbols(&program, gs);
    let gs = AdvancedResolver::resolve_program(&program, gs, prog_scope.node_ty_map.clone());
    ...
}
  open vscode& open file did_change(parse cache) did_change(parse + resolver cache)
lookup_compile_unit 321,993 117,994 87,132
parse 2,795,670 20,842 20,570
resolver 5,575,738 390,598 41,757
AdvancedResolver 1,101,971 183,523 174,158
total 9,869,984 835,819 381,253

Optimization Solution

lookup_compile_unit cache

/// Get compile uint(files and options) from a single file
pub fn lookup_compile_unit(
    file: &str,
    load_pkg: bool,
) -> (Vec<String>, Option<LoadProgramOptions>) {}

lookup_compile_unit will lookup compiled configuration files (kcl.yaml, kcl.mod) starting from the file being edited. LSP caches the results of lookup_compile_unit and clears the cache when the configuration file changes.
Because the reconstruction cost will not be particularly large, about 100ms each time. Therefore, clearing it all every time when edit config files

lsp lookup_compile_unit cache implementation
#1188
Add a watcher to the configuration file on the client side
kcl-lang/vscode-kcl#41

todo:
The implementation of #1188 relies on the DidChangeWatchedFiles event. This capability is the capability of the FIle Watcher provided by VSCode. For other IDEs, this capability may not be available, so the server may not receive the notification. It is necessary to implement the complete config FIle watcher capability on the server side instead of relying on client notification..

AdvanceResolver Cache

Currently, LSP will fully execute AdvanceResolver each time, which mainly consists of two parts: Namer::find_symbols() and AdvancedResolver::resolve_program(). Finally, a GlobalState will be generated, which contains semantic information and is saved in the db for analysis of requests. It takes about 180 ms in total.
For the following scenario, the two files main1.k and main2.k both depend on base.k. This is also a common scenario where different configuration definitions dependences on a common schema model.

main1.k

import base

main2.k

import base

base/base.k

schema X:
        name: str

we need to solve

  1. Edit the file main1.k, need to cache base.k, and recompile main1.k. most important needs
  2. After compiling main1.k, cache base.k and incrementally compile main2.k
  3. (Option, maybe not need todo) Edit base.k and update the information of main1.k and main2.k

Optimization of walkers in Namer and AdvanceResolver

The most important part of the Resolver result, scope_map, in ProgramScope, is the hashmap with pkg name as the key.
When the resolver is compiled incrementally, it takes the main package as the entr, analyzes the update and affected pkgs and removes them from the scope map. Then re-walk these pkgs and generate new Scope.

pub struct ProgramScope {
    pub scope_map: IndexMap<String, Rc<RefCell<Scope>>>,
    pub import_names: IndexMap<String, IndexMap<String, String>>,
    pub node_ty_map: NodeTyMap,
    pub handler: Handler,
}

For Namer::find_symbols() and AdvancedResolver::resolve_program(), we can also update only the invalid (update and affected by update)pkg instead of traversing all pkgs every time.

  1. In Resolver, record the invalid pkg and pass it to Namer and AdvancedResolver
  2. In GlobalState, the semantic information in the invalid pkg needs to be cleared, otherwise it will cause overlap. scopes, packages and sema_db are all information saved with pkg or file as index. We can clear the cache based on the invalid pkg name. But symbols is a global symbols set. Need to add pkg -> HashSet map. When adding a symbol, record the pkg to which the symbol belongs.
    pub fn alloc_schema_symbol(
        &mut self,
        schema: SchemaSymbol,
        node_key: NodeKey,
        ++ pkg_name: String,
    ) -> SymbolRef {
        let symbol_ref = SymbolRef {
            id: symbol_id,
            kind: SymbolKind::Package,
        };
        
        ++  self.symbols_info
        ++      .pkg_symbol_map
        ++      .get_mut(&pkg_name)
        ++      .unwrap()
        ++      .insert(symbol_ref);
        symbol_ref
    }
  1. Before Namer::find_symbols(), clear the cache. The order of clearing here should be opposite to the order of build (symbols clear at the end)
  2. In addition to the invalid pkg, Namer::find_symbols() and AdvancedResolver::resolve_program() also need to walk the new pkg

main pkg name and GlobalState in-place change

kcl's Program takes main pkg as the entry point and analyzes other dependent pkgs
image

Because all programs share the name main. In the second scenario,if:
open main1.k -> open main2.k -> swtich main1.k -> request
Calculating the semantic information of main2.k will overwrite main1.k, and then switch back to main1.k in the IDE. Because there are no file changes, main1.k will not be recompiled. At this time, gs saves the semantics of main2.k, so analysis of main1.k will fail

Therefore, globalstate has no way to save the main pkg information of each compilation entry at the same time. In the previous processing, after compilation, gs is cloned and stored in db. When handling the request, the corresponding gs is obtained in db according to the entry.

image

The clone of gs takes about 20 - 30ms. gs can be modified in place and unified globally to replace this clone. We need:

  1. Provide the ability to configure different main packages. main1.k and main2.k need different spaces in gs
  2. Use mutable references of gs in Namer and AdvancedResolver and modify them in place. Only use one gs globally

image

Reverse dependent updates

todo
Case3: Edit base.k and update the information of main1.k and main2.k

Other optimizations

Pay attention to the clone of big structures, use Rc/Arc clone instead, such as db, node_ty_map
#1174

@He1pa He1pa added semantic Issues or PRs related to kcl semantic and checker lsp track labels Apr 22, 2024
@Peefy
Copy link
Contributor

Peefy commented Apr 22, 2024

Some small questions and comments.

I would like to know how IDEs for other languages such as Python and typescript handle incremental compilation, as Python and KCL have similar syntax and semantics, and how the IDE for Rust handles incremental compilation because they are similar in implementation.

lookup_compile_unit 321,993 117,994 87,132

I think the time unit for testing should be clearly marked.

For Namer::find_symbols() and AdvancedResolver::resolve_program(), we can also update only the invalid pkg instead of traversing all pkgs every time.

What does the invalid pkg means?

In addition to the invalid pkg, Namer::find_symbols() and AdvancedResolver::resolve_program() also need to walk the new pkg

What does the new pkg means?

@Peefy
Copy link
Contributor

Peefy commented Apr 22, 2024

What is the calculation logic for __main1__ and `main2, how to handle multi file compilation entries, and how to handle non main package entry file IDEs? Will they conflict? How about abstract package path definition.

pub enum PackagePath {
     Logic(<Some md5 hash>)
     Real(<real filepath or pkg path>)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lsp semantic Issues or PRs related to kcl semantic and checker track
Projects
None yet
Development

No branches or pull requests

2 participants