Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: group modules in visualizaton using project structure #125

Open
AlexandrHoroshih opened this issue Oct 9, 2023 · 28 comments
Labels
core enhancement New feature or request

Comments

@AlexandrHoroshih
Copy link
Contributor

Hello and thanks for a great library!

Summary

Context

Our project is using approach similiar to Feature Sliced Design, where code is divided into layers of some entities, all of which have some sort of public API

Example:

/src
 /features <-- some "layer" of the project architecture
  /feature-a
   index.ts <-- public API of the feature
   lib.ts
   utils.ts
   etc.ts <-- various internals, that are hidden behind public API
  /feature-b
   index.ts

Problem

But if i run skott against project entrypoint or any of the features index file - the generated visualization is very noisy, because all of the internal modules are exposed right away

Suggestion

Since all of these features, widgets, etc are essentialy "groups" of modules behind single entrypoint with a public API, i suggest to add an option to provide glob-like pattern, which would describe them:

const { ... } = await skott({
  entrypoint: "src/index.ts",
  groups: {
   features: "src/features/{name}/*",
   widgets: "src/widgets/{name}/*",
  }

The visualization then can use these groups, e.g. to skip the render of each and every internal module of src/features/feature-a and render it as the single features/feature-a node

This should make the visualization:

  • much easier to read, since internal modules are hidden
  • faster to render, since there would be less graph nodes to handle

Details

This feature would also allow to use skott with bigger projects.

For example our project now has about 7000 modules in it and currently skott just can't handle it in any way, since web-app graph takes forever to load and svg export shows just something like "Maximum size exceeded"

Visualization does work, if i choose an src/widgets/{name}/index.ts as an entrypoint, but result is still not really useful because of all of the internals being exposed:

image

At the same time our project has only about a 200-300 such "services", "features" and "widgets", which graph rendering should handle much easier

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question Answer
Would you consider contributing a PR? Yes
@AlexandrHoroshih AlexandrHoroshih added the enhancement New feature or request label Oct 9, 2023
@antoine-coulon
Copy link
Owner

antoine-coulon commented Oct 9, 2023

Hello @AlexandrHoroshih, thanks for opening that very detailed issue!

This reminds me #33 that was initially thought by @pigoz, that is the compact feature being more or less what you describe there.

Overall, I do agree that skott web application usefulness is quickly limited when the number of nodes grows hence it should provide ways to seemlessly group some of the nodes into namespaces, with custom rules that would fit based on the project size and structure.

Just for you own information, note that there is already a way to do some sort of grouping using the API, by providing a custom dependency resolver, the default one being EcmaScriptDependencyResolver (resolving all JS/TS modules, one node = one module). The API allows you to hook into skott module and internal graph resolution. This is a feature that I implemented for Rush.js, a popular monorepo tool. You can see an example here of a custom resolver I wrote for it there: https://github.com/antoine-coulon/krush/blob/af6d8e661deb300f93733f1e9ef1b4c839ccb309/plugins/rush-plugin-skott/src/plugins/visualize/main.ts#L27

The idea is to provide an overview of monorepo graphs where nodes are monorepo projects (app or lib), in the same spirit as Nx graph. So you could imagine doing the same, where you build your own graph based on your own rules, and skott can then deal with the rest, such as visualization, cyclic dependencies, and all other features that relying on the emitted graph (this is a old screen shot)

rush-skott

This is literally reducing the whole monorepo file tree around 2k files to few nodes only, enhancing a lot visualization.

Nonetheless, I grant you the fact that it is not the simplest solution and we should probably provide a better and more straightforward and flexible way to do that.

Specification

I would say given this impacts all of the visualization modes (even though web application is by far the most used one), this should be a core feature added in skott in the same spirit as you suggested where we could define some sort of namespaces and scopes under which modules should be grouped.

Let me just recap your suggestion

Given the following file tree:

  src/
    features/
        feature-a/
        feature-b/
   widgets/
       widget-a/

and the following definition:

"groups": {
   "features": "src/features/{name}/*",
   "widgets": "src/widgets/{name}/*"
}

you expect the following graph to be emitted:

{
   "features/feature-a": {...},
   "features/feature-b": {...},
   "widgets/widget-a": {...}
}

That is 3 nodes, right? Of course all dependencies between internal modules of each group would still be captured.

From the definition itself, do you imagine other cases where one would want src/features/{name}/* converted to something else than "group by all first children sub-folders"? Do you imagine some sort of regex pattern that could be provided to group by something else than direct children? I'd suggest starting by something simple on the definition side, with no regex pattern, but rather with parent folders than directly mean "group by all direct sub-folders starting from this one".

Implementation

At first glance I'd say that "groups" are like "views" in database in the sense that they expose a different graph shape but the internal graph stays untouched. That would allow all information such third-party, builtin modules, etc to still be collected in the background, but applying the "groups" filter on it, would emit another version of the graph in the same spirit as it's done there: https://github.com/antoine-coulon/krush/blob/af6d8e661deb300f93733f1e9ef1b4c839ccb309/plugins/rush-plugin-skott/src/plugins/visualize/graph.spec.ts#L229. The difference would be that instead of modifying the internal graph itself, we would emit both side-by-side as patching the main graph is not a good idea imho, we still need to keep the source of truth reachable so that we can still rely on it anytime.

Consequently we could expose both graphs, one "full" containing everything and a "compacted" one:

const { getStructure } = await skott(withGroupsConfig);
const { graph, compactedGraph }  = getStructure();

assert.deepEqual(graph, {
   "src/features/feature-a/index.ts": {...},
   "src/features/feature-b/index.ts": {...},
});

assert.deepEqual(compactedGraph, {
   "features/feature-a": {...},
   "features/feature-b": {...},
});

In the end, all visualization modes could detect whether there are other "views" of the graph and decide to use them or not. One benefit of having both for the web application would also be to dynamically switch modes between "compacted" and "full", if you ever want to expand whatever is inside feature/feature-a only, otherwise you lose the information of whatever is contained inside that module.

Recap:

  1. Build the entire graph using the provided resolvers
  2. Apply the "groups" filter on it
  3. "groups" has for consequence of adding a "compactedGraph" to the getStructure interface
  4. Display modes would either use the "full" or "compacted" version of the graph

What do you think of that?

@AlexandrHoroshih
Copy link
Contributor Author

AlexandrHoroshih commented Oct 10, 2023

Thanks for a detailed reponse!

That is 3 nodes, right?

Yep, given that tree i would expect just 3 nodes

do you imagine other cases where one would want src/features/{name}/* converted to something else than "group by all first children sub-folders"? Do you imagine some sort of regex pattern that could be provided to group by something else than direct children?

So far, not really 🤔

Approaches like Feature Sliced Design or similiar are introducing "internal" structure rules too, though
As far, as i remember, even the good-old "components and containers" approach has something similiar, e.g. if project also uses Redux - there would be internal "reducers", "actions", etc

But I think, that, given that the skott's graph is covering project as a whole, these internals, even if structured, are still not really relevant
Also, if there is a option to see a full graph (and if the size of the project allows it to be rendered), user can always see those details

I think, that it is probably makes sense to add a possibility to "unwrap" a one specific group in the graph, so internal details are visible only when needed and only for one specific group, which is being investigated 🤔

I'd suggest starting by something simple on the definition side, with no regex pattern, but rather with parent folders than directly mean "group by all direct sub-folders starting from this one"

Yep, this sound good to me 👍

@AlexandrHoroshih
Copy link
Contributor Author

do you imagine other cases where one would want src/features/{name}/* converted to something else than "group by all first children sub-folders"? Do you imagine some sort of regex pattern that could be provided to group by something else than direct children?

Actually, there is one case 😅
For some reason i forgot, that our project actually has a structure like

src/
 /team-a
   /features
    ...
   /widgets
    ...
  /team-b
   /features
     ...

which is in fact "multi-level" group, like src/{team}/widgets/{name}

@AlexandrHoroshih
Copy link
Contributor Author

AlexandrHoroshih commented Oct 10, 2023

But actually we cound try to group it like

"group by all direct sub-folders starting from this one"

though 🤔

Something like this should work, i think

"groups": {
   "teamAFeatures": "src/team-a/features/{name}/*",
   "teamAWidgets": "src/team-a/widgets/{name}/*",
   "teamBFeatures": "src/team-b/features/{name}/*",
   "teamBWidgets": "src/team-b/widgets/{name}/*",
}

@AlexandrHoroshih
Copy link
Contributor Author

Also, if combined with ignorePattern, looks like it is possible to construct a skott command, that will print a graph only for the code of one specific team, so this team could look at their own architecture without external (i.e. code of other teams) connections

@antoine-coulon
Copy link
Owner

Actually these are interesting use cases. I think we can start by providing a straightforward way of grouping things as you initially mentioned and what I refer to "group by all first children sub-folders".

Also, if combined with ignorePattern, looks like it is possible to construct a skott command, that will print a graph only for the code of one specific team, so this team could look at their own architecture without external (i.e. code of other teams) connections

Exactly, I think one should aim for the smallest subset to observe, either using ignorePattern to black list entries or using cwd to select a specific working directory to start the analysis from, an this could still be combined to "groups" for instance:

skott({
  cwd: "src/team-a",
  "groups": {
   "features": "{cwd}/features",
   "widgets": "{cwd}/widgets"
   }
})

If one still wants to have the global picture but in a grouped way, then we could still allow this kind of configuration that you mentioned:

"groups": {
   "teamAFeatures": "src/team-a/features/{name}/*",
   "teamAWidgets": "src/team-a/widgets/{name}/*",
   "teamBFeatures": "src/team-b/features/{name}/*",
   "teamBWidgets": "src/team-b/widgets/{name}/*",
}

I think that this behavior of grouping can be generalized to work with any length of record so whether using the config above alone or combining it with other API options would be fine.

But again, I think this "groups" feature should just apply a custom grouping to the "raw" graph without mutating it, so it's fine experimenting that and adding another property to the SkottStructure interface. Then we'll see how to consume that in the visualization modes to provide better visualization experience

@antoine-coulon
Copy link
Owner

Would you be willing to implement that? Just for you to know, there is no time constraint.

I will try to provide detailed hints tonight of what could be a first big-picture implementation of that feature

@AlexandrHoroshih
Copy link
Contributor Author

Would you be willing to implement that?

Sure!

@AlexandrHoroshih
Copy link
Contributor Author

what could be a first big-picture implementation of that feature

Given the "group by all direct sub-folders starting from this one" definition, i think, something like that should work?

groups API

In the skott API configuration there should be optional "groups" configuration

"groups": {
   "features": "src/features/{name}",
   "widgets": "src/widgets/{name}",
   "core": "src/core" // if no {name} template provided - whole folder is a group
}

There is an edge-case though:

"groups": {
   "teamFeatures": "src/team/features/{name}",
   "team": "src/team/{name}",
}

"team" group is containing all of the "teamFeatures" modules in it, so it is not clear, which group should be rendered to the compacted graph

I think, that it should be resolved in a way, that bigger group ("team" group in the example) should take priority there 🤔
And such nested groups, probably, should not be added to the compacted graph at all - at least in the first implementation, without an option to "unwrap" a single compacted group in the web-app's graph

Compacted Graph

In the skott output:

const { getStructure } = await skott(withGroupsConfig);
const { graph, compactedGraph }  = getStructure();

compactedGraph // Graph | null, because there might not be any groups defined

Web App

In the skott web-app:

  1. If there is a compacted version of the graph - it gets to render first, since original graph might be just too large to render, so it would just block the main thread
  2. If there is a compacted version of the graph - there is a "graph action" to switch to "source graph" and back

What do you think?

@antoine-coulon
Copy link
Owner

antoine-coulon commented Oct 10, 2023

Yes, that definitely looks good to me!

Given the "group by all direct sub-folders starting from this one" definition, i think, something like that should work?

That Groups API looks ok to me. I'm just thinking about the dynamic segments i.e: src/features/{name} where {name} can be a bit counter intuitive imho. Nonetheless there is no standard way of expressing that as a glob, but we could probably just provide a wildcard instead src/features/* as usually a wildcard like this matches all files within the sub folders, we could consider that instead it matches all children sub directories.

Consequently:

"groups": {
   "features": "src/features/*", // create groups from all direct children sub folders, `src/features/team-a` will be a group
   "core": "src/core" // if no "*" wildcard provided, whole folder is a group
}

For the edge case you mentioned, I think we should provide a constraint defining that no group definition should overlap, at least in the first iteration, because it will be hard to deal correctly with multiple versions of a same set of files under different groups and still be able to represent an unified version of the graph. In other words, you would end up with 2 nodes where 1 node is a subset of the other one and at first sight this would be too tedious to deal with that in all aspects. Usually achieving that is done by clustering, but in clustering only one version of the group is there at all time, sub groups are being merged in outer groups.

Compacted Graph

const { getStructure } = await skott(withGroupsConfig);
const { graph, compactedGraph } = getStructure();

I initially named the thing "compacted" but when rethinking it, it might be a bit misleading, because we are talking about "groups" and the "groups API", and then we end up with a "compacted" property thing when defining group 🤔 My bad actually, I think this should be named groupedGraph or graphWithGroups, to be fair with the configuration namings.

compactedGraph // Graph | null, because there might not be any groups defined

Yes, but I think for that we should favor "undefined" instead of "null", because the property won't be there when no "groups" are defined. Consequently you'd have graphWithGroups (or groupedGraph) such as:

interface SkottStructure {
// ... already existing props
graphWithGroups?: Graph;
}

For the Web App, I fully agree with your vision:

  1. If there is a compacted version of the graph - it gets to render first, since original graph might be just too large to render, so it would just block the main thread

Yes 🙂

  1. If there is a compacted version of the graph - there is a "graph action" to switch to "source graph" and back

And yes ✅

@antoine-coulon
Copy link
Owner

Note that you could split the work in two parts as skott (core) and skott-webapp are two distincts packages, they can be independently published. This one is fully up to you, just saying that in case you thought/hadn't seen that both were shipped in the same package

Also, as we're talking about the "grouping" feature, it makes me realize that some of the features around "skott-webapp" will only work with graphs containing files. This means that if we turn on the "graphWithGroups" visualization mode, sections such as "File Explorer" and some of the sections in "Summary" become irrelevant.

But this is great, because it's a subject that I needed to tackle for the new version of the web app to be compatible with monorepo visualization, with hundred/thousands of projects where "nodes" are packages and not files. I'll keep you updated

@AlexandrHoroshih
Copy link
Contributor Author

For the edge case you mentioned, I think we should provide a constraint defining that no group definition should overlap

Just to clarify:
Should skott just throw an exception right away, if such groups definition is provided via config?

Basically

expect(async () => await skott({
 groups: {
      "teamFeatures": "src/team/features/{name}",
      "team": "src/team/{name}",
 }
})).toThrowErrorMatchingInlineSnapshot(`Bad groups configuration: "team" and "teamFeatures" groups are overlapping with each other`)

@AlexandrHoroshih
Copy link
Contributor Author

AlexandrHoroshih commented Oct 11, 2023

Nonetheless there is no standard way of expressing that as a glob, but we could probably just provide a wildcard instead src/features/* as usually a wildcard like this matches all files within the sub folders, we could consider that instead it matches all children sub directories

I thought about it a bit and I agree about {name}, if we consider that the user would expect the glob pattern there
But this way src/features/* also feels a bit counter-intuitive to me, as glob like this also matches desctiption "whole folder is a group" 🤔

Maybe it could just be an explicit configuration object?

groups: {
   features: {
      parentPath: "src/features",
      // modulePath == ["feature-name", ...]
      getGroup: (modulePath) =>  modulePath[0],
   },
   core: "src/core" // whole folder is a group
}

This way groups resolving is mostly moved to the userland, which is very flexible, so there is zero problems with handling custom project structures

But probably less desirable, since it is not possible to do any kind-of-static analysis as it is possible with a string pattern

E.g. with string pattern it is possible to do an ahead-of-time validation, that groups are not overlapping, but it will only work somewhere in the middle of the skotts computations with dynamic definition 🤔

And since this function would be called on every module path, it would be a potential performance bottlneck, which will be out of skott's control

So, looks like there is needs to be some string-template-pattern anyway 🤔

@antoine-coulon
Copy link
Owner

antoine-coulon commented Oct 11, 2023

Hey again @AlexandrHoroshih!

Having a configuration object with functions could be nice indeed to cover more edgy cases that would not make sense with a simple string literal, this would also follow the path I initiated with dependency resolvers where skott offers some sort of IoC for graph resolution. This would for sure delegate all the responsibility to the consumers and so the grouping would become their own responsibility (dealing with overlap etc) and it's pretty much ok with that.

And since this function would be called on every module path, it would be a potential performance bottlneck, which will be out of skott's control

Well yeah but grouping is like a lazy action, internal graph resolution construction will remain the same and once the user or any of the visualization modes use getStructure#graphWithOptions this could be indeed a lazily loaded property where all the computation happen.

So if I sum up for me there would be 3 possible definitions, from what we've been discussing:

const groupsConfig = {
   "features-1": "src/core", // whole "core" folder is a group, and this group is named "features-1"
   "features-2": "src/core/*", // all sub-folders (only first level such as `src/core/lol`) of "src/core" are grouped, for instance `features-2/lol` is a group. "/*" means all direct folders, unlike "/**", matching recursively all sub folders
   "features-3": {
         basePath: "src/feature",
          // in features-3 configuration, groups would become constructed such as `features-3/group0`, `features`
         group: () => (["group0", "group1/skott", "group2/is-an-awesome", "group3/library"]-
    }
}

Regarding the overlap checking we can naively check that all the static paths are not part of the same base path, in the above case, src/core and src/feature (basePath) prop are fine, but if another group defined something such as src/feature/something this would not be ok.

So

Just to clarify:
Should skott just throw an exception right away, if such groups definition is provided via config?

Yes, I would favor fail fast when some unexpected config is provided rather than silently patching things under the hood or still providing a graph that will most likely confuse and not help finding that the config is wrongly provided.

@antoine-coulon
Copy link
Owner

Also as configuring groups manually (specifying keys of the record statically "features-1", "features-2" etc) can become tedious on such big modules, I'd then expect the config option to be a function where one can dynamically configure keys (group names) and their matching content. This would for instance allow some automatic grouping based on Nx, Rush, Turborepo configurations, but also some other rules that could be established dynamically depending on the needs.

This is out of the scope of first iterations, I'm just thinking about it loudly 😄

@pigoz
Copy link

pigoz commented Oct 12, 2023

Upvoting the configuration function option 🙂

@antoine-coulon
Copy link
Owner

Let's make it happen then!

@AlexandrHoroshih
Copy link
Contributor Author

AlexandrHoroshih commented Feb 26, 2024

Hello @antoine-coulon !

I finally had found some time to work on this feature - here is a small PR draft, which is still Work in Progress though, so i don't want to target it to the main skott repo yet.

But in the process of working on it, I discovered a few interesting points that I'd like to discuss:

The config API design

Current implementation

Current design is implemented as we discussed before, with an edge-case handled - in function API an option to return null should be supported, so it is possible to discard some module, if it does not belong to this group for some reason.

  groups?: {
    [groupName: string]:
      | string // <- path pattern
      | {
          /**
           * Scope of the group
           */
          basePath: string;
          /**
           *
           * @param modulePath path to the module
           * @returns group name, to which the module belongs or null if it does not belong to the group
           */
          getGroup: (modulePath: string) => string | null;
        };
  };

And generated graph looks something like that

After trying out this approach at my work project i eventually moved to using only a function API:

{
 groups: {
    app: {
      basePath: 'client',
      getGroup: (nodePath) => {
        if (nodePath.includes('client/src/entrypoint')) {
          return 'entrypoint';
        }

        if (nodePath.includes('client/src/internal')) {
          return 'internal';
        }

        if (nodePath.includes('client/src/platform')) {
          return 'platform'
        }

        if (nodePath.includes('client/src/product')) {
          const match = nodePath.match(
            /product\/(?<team>.*?)\/(?<entity>.*?)\/(?<name>.*?)\//,
          );
          if (match && match.groups) {
            const { team, entity, name } = match.groups;
            return `product/${team}/${entity}/${name}`;
          }
        }

        return null;
      },
  }
}

The "config" option just wasn't really useful to me in the end

Suggestion

I feel like that "function" option is in fact the most flexible and the least confusing option of all 🤔

Also, while working on the PR, i got better understanding of skott's internals and found out that "config" option doesn't really helps to optimize anything so far - to build a "compact" graph we need to do basically the same work on resolving module dependencies that is needed for the "full" one anyway.

I suggest to simplify the API design and only accept single function, something like that:

{
 groups: (nodePath) => {
   if (anyConditionToMatchPathToSomeGroup(nodePath)) return "groupName"
   
   // Module did not matched any group and is discarded from the compact version of the graph
   return null;
 }
}

☝️ Also i think, that if we will decide to go with this API - it might need a better name, like "collapseRule" or "compactSomething" 🤔

Pros

  • Single function like (path: string) => string | null is pretty simple and flexible
  • It allows simpler implementation: e.g. single function API does not need any special validation about invalid or overlapping patterns
  • With this API users can provide their own custom names for the groups. Since it could be any string, any name is supported, even more human-readable ones (e.g. "Search Results Team Widgets - Name Widget" instead of "product/search-results/widgets/name")

Cons

  • Function API, probably, cannot be used with skott-cli easily, while pattern-based config can

The latter con is only a problem because there is currently no way to run skott in web application launch mode other than via skott-cli.

While it is possible to describe groups in cli like --group="groupName: pattern/*" -- group="otherGroupName: pattern/*" - i think it may become rather uncomfortable, if project is large and there is a more than a two or three groups.

I think, that skott also needs a way to trigger it's CLI runtime in a programmatic way, for provided instance 🤔

Something like this:

// project/scripts/build-skott-graph.mjs

import { Skott, runSkottInstance } from "skott"

 const instance = new Skott({
   ...config
 })
 
 await runSkottInstance(instance, {
  displayMode: "webapp",
  ...options
 })

If you ok with that, i can file a separate feature-request issue (and make it happen, when i have time 😅)

What do you think about this suggestions?

@antoine-coulon
Copy link
Owner

Hello @AlexandrHoroshih, nice to see the work of this API going forward! I'm just back from holidays, so I'll be a bit more reactive from now on :)

I checked your current PR and the generated graph looks definitely great, nice job!

About what you mentioned After trying out this approach at my work project i eventually moved to using only a function API, I think this is completely fair and right. We should strive to implement the simplest and most flexible solution in the first place, that is a function.

Also as you rightly mentioned, static definitions are great and were meant to provide better DX and stricter by just being a description, but turns out that implementation-wise it gets harder (by looking at your PR we can already witness the added complexity to just validate paths) and also there will be rules that won't be able to be represented as pure strings and we don't want to end up providing some weird skottish patterns to provide all the cases required.

Consequently, I would say I agree with your proposition of just exposing one entry function such as:

{
 groups: (nodePath) => {
   if (anyConditionToMatchPathToSomeGroup(nodePath)) return "groupName"
   
   // Module did not matched any group and is discarded from the compact version of the graph
   return null;
 }
}

That just says given a function taking a nodePath, if a groupName (string) is returned from that function then include that nodePath in the groupName. Otherwise, just don't include that nodePath in a group.

I feel like this is pretty straightforward, flexible and inverts the complexity by offering to the user basically all the joy of managing its own rules.

One important note about the snippet above and things I saw in your PR:

groups function should be not be invoked where it is currently done in your PR. It should be lazily generated after initial full graph creation. Groups should rely on the "full" graph to be constructed and the "grouped" graph should just be derived from it, if required by the user. And what I mean by "required" by the user is if the user tries to access that grouped version, afterwards (when calling getStructure()).

That will ensure many things:

  1. That there is no conflicts between both graphs resolutions, the "full" one remains the single source of truth, and "groups" are nothing but a grouped version derived from the graph.

  2. Also, this will defer computational work to the very end and take for instance the case where the graph is first displayed in "full" mode, "groups" could be computed whenever the user wants to switch to "grouped" mode. I'm nearly sure "groups" will be fast enough for the user to avoid us pre-computing it and doing it only when the user asks for it.

  3. Also, laziness is pretty important for one feature that was recently introduced: watch mode. Because "groups" are derived from the source graph, they need to be easily re-computable from a function, if it's only done once in the internals, we won't be able to refresh the "grouped" view, which could be also displayed in the terminal at some point (even though watch mode is also applied to webapp).

Given all these points, the following function groups: (nodePath) => {} should provide as "nodePath" each node's path from the "full" graph. So basically, "groups" is doing nothing but looping over the "full" graph and deriving a new graph from it using whatever that function returns.

const fullGraph = {
   "src/features/a/index.js": {} // whatever inside there
};

const {groupedGraph} = this.getStructure(); // getStructure() is the one computing lazily the groupedGraph

// and basically, this is what happens in the group function

const apiConfig = {
   ..._, 
   groups: (nodePath) => {
      // nodePath === "src/features/a/index.js"
   }
}

function getStructure() {

  let groupedGraph
  
 // graphNodes could be created from Object.values(this.#projectGraph )
  for (const node of graphNodes) {
   const groupName = apiConfig.groups(node.id)
   // add other information to the new node that will be added
   if(groupName) // add all the node info to the group within "groupedGraph"
  }


  return {
     ...otherThings,
    groupedGraph
   }
}

And so "groups" function needs to be called with each node from the graph while traversing it (seems like what you did a bit there https://github.com/AlexandrHoroshih/skott/pull/1/files#diff-7b7a6308c7431b730b864232c5ef51e67e376482a19da77140ca8f139cb07610R354 apart that you added it in the addNode function used in the "full" graph creation).

About the "groups" function

I think instead of returning null, we can simply return undefined as for me this better represents what happens in that this is what is returned by a void function, so doing nothing with a nodePath = void.

Cons: Function API, probably, cannot be used with skott-cli easily, while pattern-based config can
The latter con is only a problem because there is currently no way to run skott in web application launch mode other than via skott-cli.

So about the cons, this is an interesting point. there is currently no way to run skott in web application launch mode other than via skott-cli -> this is not true, the web app can be started from the APIs, you can check an example I did there in the context of a Rush.js plugin.

The "skott-webapp" package is a standalone package that can be started independently from "skott" itself. The only requirement for "skott-webapp" is to have an http server that exposes a set of routes providing the expected data and serving static files, but it could be coming from anywhere. Turns out by default it comes from there

export function renderWebApplication(config: {
but as I pointed out I created a custom web server for the plugin there https://github.com/antoine-coulon/krush/blob/af6d8e661deb300f93733f1e9ef1b4c839ccb309/plugins/rush-plugin-skott/src/plugins/visualize/main.ts#L52.

One interesting note is that I should probably expose a util function allowing to seamlessly create that web server so that it's easy run the web app and skott API on their own, without going through the CLI.

About the CLI and CLI-based display modes

Now this is where your statement would be correct, when it comes to using CLI based display modes "file-tree", "graph" etc, there is currently no solution to both use the API (benefiting from the dynamic nature of the config) and then to render things in the CLI. But given I have new use cases about the skott CLI coming from Effect friends, I'll probably manage to find a way to use both the API and the CLI-based display modes, without having to rely on skott's binary.

Something like:

import skott from "skott"
import { displayModes } from "skott/cli"

const skottResult = await skott();

// this will print the result in the CLI
displayModes.fileTree.render(skottResult);

Summary

  • Please keep going in the nice direction you mentioned, that is using one function in the "groups" property. Don't forget to take into account important points I raised just above (it's still open to discussion but these are important). Also as you pointed out, this API needs to be renamed as well, this is not a suitable name for a function. This could be only "group", or "groupBy" (in the same spirit as lodash). In my opinion this should include the verb "group" as the generated property in the end is called "groupedGraph".

  • I will take care of the CLI/API stuff interoperability to make sure everything is usable whatever the entrypoint mechanism is. Just enjoy focusing on landing that feature in the core + in the web application as you started to do it. Also note I did not review the web application part yet, but I will do once the core is settled.

@AlexandrHoroshih
Copy link
Contributor Author

Thanks for the reply!

One interesting note is that I should probably expose a util function allowing to seamlessly create that web server so that it's easy run the web app and skott API on their own, without going through the CLI.

That would be really great!

Such a feature, as I think, would simplify things a lot, because a path like import a special function -> call it with Skott instance and settings -> Skott Webapp opens feels way nicer, than custom server one 🤔

Please keep going in the nice direction you mentioned, that is using one function in the "groups" property

Got it 👍

I think i will split the work in two parts then (as you initially suggested actually) - first the group or groupBy function and then the web app part

@AlexandrHoroshih
Copy link
Contributor Author

Hi @antoine-coulon !

Here is the first part - the groupBy function in the skott package:
#146

I also tried to use this API locally (by monkeypatching my node_modules/skott) to see, how groupedGraph would look in the skott-webapp and result is already much better than looking at relations between ~7000 modules directly 😄

What is interesting is that eventually i opened a few tabs with different groupedGraph renders at different levels of details:
Because project has a structure like {team}/(services|widgets)/{name}, it is possible to explore relations at both team level and specific services and widgets level.

image image

And since the project is quite large, i was switching tabs with different views to make more sense of it.

Maybe it could lead to next iteration of this feature in skott 🤔

@antoine-coulon
Copy link
Owner

Hello @AlexandrHoroshih, thanks for landing the PR! I just reviewed it.

The grouped graph, looks great on the picture, it will definitely improve the visualization experience and help extracting useful information.

And since the project is quite large, i was switching tabs with different views to make more sense of it. Maybe it could lead to next iteration of this feature in skott 🤔

I'm not sure I understood that part, what do you exactly suggest? Having a set of groupedGraph and each graph would be uniquely identified (string identifier) that would follow different rules, instead to only one (as done in the PR)?

If it is what you had in mind, I think it's a great idea :) If it was not that, then it probably just gave us an interesting idea! In the web application, you could then switch between all the different graphs (views). We could even imagine provide a way of creating groupedGraphs directly from the web application itself.

@AlexandrHoroshih
Copy link
Contributor Author

Thanks!

I'm not sure I understood that part, what do you exactly suggest? Having a set of groupedGraph and each graph would be uniquely identified (string identifier) that would follow different rules, instead to only one (as done in the PR)?

Yeah, I was more or less thinking along those lines 😅

But I'm not sure that's the best way to go about it:
As my example shows that on a large project, seeing several different "layers" of the project structure makes it easier, but not radically so - there are still too many links between modules, it's still not easy to sort them out.

I think that maybe in the future, using the grouping API, it will be possible to do some more advanced analysis that could show "harmful" groups of modules.
For example, those that depend on many other modules at once and, therefore, if they are imported anywhere, they will pull half of the whole project into the main bundle 🤔

@antoine-coulon
Copy link
Owner

antoine-coulon commented Mar 7, 2024

@AlexandrHoroshih thanks for contributing, the PR just got merged and the 0.33.0 got released with the new feature!

As my example shows that on a large project, seeing several different "layers" of the project structure makes it easier, but not radically so - there are still too many links between modules, it's still not easy to sort them out.

Isn't it because you need to have a more precise definition of your groups instead of having huge top-level groups?
Also, at some point I guess it will be the job of the web application to offer a better visualization, skott's core goal is to collect all the information that it can get. From that you can imagine hundreds of use cases starting from discarding groups, filtering groups, isolating a group and its direct dependencies etc.

@AlexandrHoroshih
Copy link
Contributor Author

AlexandrHoroshih commented Mar 11, 2024

From that you can imagine hundreds of use cases starting from discarding groups, filtering groups, isolating a group and its direct dependencies etc.

I gave it a thought and i agree, that makes sense 🤔

I tried to do a deeper analysis and flow was kind of the same you described here:

  • Look at initial groupedGraph render, notice stuff that stands out, e.g. an service (or other architectural building block) that depends on others "too much"
  • Re-build groupedGraph with different groupBy rule, which renders problematic service and discards unrelevant information
  • and so on, until i figure out the problem and solution like "this service does too much and should be refactored into a few smaller services"

@antoine-coulon
Copy link
Owner

Yeah definitely, to me it seems hard to get the right groups at first glance. It feels like knowing what groups you want goes by visualizing first and incrementally grouping into things that make sense for the current use case.

Thing is, doing that back and forth does not really provide a great UX/DX as you always need to change the groupBy function and restart the process, that is why I was thinking of doing some of the work directly in the web app or even suggesting directly groups based on the graph (at some point in the future).

But I guess those will be done in next iterations, allowing the grouped graph to be displayed at first and allowing a switch between the "full" and "grouped" graph will already be good enough!

@AlexandrHoroshih
Copy link
Contributor Author

allowing a switch between the "full" and "grouped" graph will already be good enough!

Speaking of that 😅

Currently most of the skott-webapp source code expects to work directly with data object, which contains data.graph and related meta-information like data.cycles

Given that we are already discussing an arbitrary number of such graphs i feel like it would make sense to refactor skott-webapp internals a bit, so instead of current "single graph" type:

export interface AppState {
  data: DataState;
  ui: UiState;
}

it should expect something like that

export interface AppState {
  data: {
   source: DataState, // data for original modules graph, always available
   [key: string]?: DataState, // Any graph derived from source one - e.g. "grouped"
 };
  ui: UiState;
}

export interface UiState {
  filters: {
   // ...
  };
  network: {
   currentTarget: string; // e.g. "source" by default
   // ...
  }
}

So that it is possible to do something like that:

      subscription = appStore.store$
        .pipe(
          map(({ data, ui }) => data[ui.network.currentTarget]),
          distinctUntilChanged(isEqual)
        )
        .subscribe((data) => {
          const { graphNodes, graphEdges } = makeNodesAndEdges(
            Object.values(data.graph),
            { entrypoint: data.entrypoint }
          );

          setNodesDataset(new DataSet(graphNodes));
          setEdgesDataset(new DataSet(graphEdges));
        });

and that

<Menu.Dropdown>
          <Menu.Label>Graph View</Menu.Label>
          {Object.keys(state.data).map((name) => (
            <Menu.Item key={name}
             onClick={() => setNetworkCurrentTarget(name)}
            >
               {name}
            </Menu.Item>
          ))}
</Menu.Dropdown>

What do you think about this approach to implementation?

@antoine-coulon
Copy link
Owner

I'm not sure if this is the right time to generalize yet, as having custom groups created on the fly will have other implications such as having custom UI states per custom group as well (so all UI state can't be at the top-level store, there might be shared UI state but also per-graph UI state in addition to data). So I might need to handle as well the entire store design to make that fit with multiple graphs at once.

For now, I would suggest you to stick to what we had discussed in the first place, which is dealing with two statically known graphs

export interface AppState {
  data: {
   full: DataState, // data for original modules graph, always available
   grouped?: DataState, // Any graph derived from source one - e.g. "grouped"
 };
  ui: UiState;
}

And given the "grouped" graph being defined or not, just display a UI switch to display the "source"/"full" or the "grouped" one.

For now, let's keep things simple and keep the same UiState shared, that is if "circular dependencies" were checked before doing the graph switch, then it should be automatically rendered with the newly selected graph.

Another notes:

  • "Summary" section should be updated accordingly, information about files should be updated depending on the groups
  • "File Explorer" section should be hidden when the "grouped" graphs is selected, as the file explorer only makes sense in the context of a full graph where nodes are the files themselves

As a side note, I'm totally fine with having hardcoded such as "full" | "grouped" in the context of this iteration and also naively considered only these two. Let's incrementally bring small chunks of value to the users :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants