Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to get license information of dependent packages when using yarn workspaces #349

Open
ledsun opened this issue Mar 30, 2021 · 17 comments

Comments

@ledsun
Copy link

ledsun commented Mar 30, 2021

I am using two workspaces.
And each workspace depends on the same package with different versions.
In this case, the package is installed under the workspace directory.

This is the output of the yarn list:

~ yarn list
yarn list v1.15.2
├─ cross-env@5.0.5
│  ├─ cross-spawn@^5.1.0
│  └─ is-windows@^1.0.0
├─ cross-spawn@5.1.0
│  ├─ lru-cache@^4.0.1
│  ├─ shebang-command@^1.2.0
│  └─ which@^1.2.9
├─ is-windows@1.0.2
├─ isexe@2.0.0
├─ lru-cache@4.1.5
│  ├─ pseudomap@^1.0.2
│  └─ yallist@^2.1.2
├─ path-key@3.1.1
├─ pseudomap@1.0.2
├─ shebang-command@1.2.0
│  └─ shebang-regex@^1.0.0
├─ shebang-regex@1.0.0
├─ which@1.3.1
│  └─ isexe@^2.0.0
├─ workspace-a@1.0.0
│  ├─ cross-env@7.0.3
│  ├─ cross-env@7.0.3
│  │  └─ cross-spawn@^7.0.1
│  ├─ cross-spawn@7.0.3
│  │  ├─ path-key@^3.1.0
│  │  ├─ shebang-command@^2.0.0
│  │  └─ which@^2.0.1
│  ├─ shebang-command@2.0.0
│  │  └─ shebang-regex@^3.0.0
│  ├─ shebang-regex@3.0.0
│  └─ which@2.0.2
│     └─ isexe@^2.0.0
├─ workspace-b@1.0.0
│  ├─ cross-env@5.0.5
│  └─ workspace-a@1.0.0
└─ yallist@2.1.2
✨  Done in 0.08s.

In this case, licensed is unable to get the license information of the packages installed under the workspace directory.
The following is the output of the licensed cache:

~ licensed cache
Caching dependency records for licensed-with-yarn-warkspaces
  yarn
    Using cross-env-5.0.5 (5.0.5)
    Error cross-env-7.0.3 (7.0.3)
    Using cross-spawn-5.1.0 (5.1.0)
    Error cross-spawn-7.0.3 (7.0.3)
    Using is-windows (1.0.2)
    Using isexe (2.0.0)
    Using lru-cache (4.1.5)
    Using path-key (3.1.1)
    Using pseudomap (1.0.2)
    Using shebang-command-1.2.0 (1.2.0)
    Error shebang-command-2.0.0 (2.0.0)
    Using shebang-regex-1.0.0 (1.0.0)
    Error shebang-regex-3.0.0 (3.0.0)
    Using which-1.3.1 (1.3.1)
    Error which-2.0.2 (2.0.2)
    Error workspace-a (1.0.0)
    Error workspace-b (1.0.0)
    Using yallist (2.1.2)

  * Errors:
    * licensed-with-yarn-warkspaces.yarn.cross-env-7.0.3
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.cross-spawn-7.0.3
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.shebang-command-2.0.0
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.shebang-regex-3.0.0
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.which-2.0.2
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.workspace-a
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.workspace-b
      - dependency path not found
      - ```
@jonabc
Copy link
Contributor

jonabc commented Mar 30, 2021

@ledsun thanks for the report! I'll take a look when I'm able - is this an urgent issue for you?

@ledsun
Copy link
Author

ledsun commented Mar 31, 2021

I'm in no hurry. I have a workaround.

I assume that the cause of this issue is that yarn workspaces creates symbolic links in the node_modules directory.

~ ls -lah node_modules/
total 8
drwxr-xr-x  17 shigerunakajima  staff   544B  3 30 18:06 .
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 17:39 ..
drwxr-xr-x   5 shigerunakajima  staff   160B  3 30 18:06 .bin
-rw-r--r--   1 shigerunakajima  staff   2.4K  3 30 18:06 .yarn-integrity
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 18:06 cross-env
drwxr-xr-x   9 shigerunakajima  staff   288B  3 30 18:06 cross-spawn
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 is-windows
drwxr-xr-x  10 shigerunakajima  staff   320B  3 30 17:37 isexe
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 lru-cache
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 17:45 path-key
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 18:06 pseudomap
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 shebang-command
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 shebang-regex
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 18:06 which
lrwxr-xr-x   1 shigerunakajima  staff    14B  3 30 18:06 workspace-a -> ../workspace-a
lrwxr-xr-x   1 shigerunakajima  staff    14B  3 30 18:06 workspace-b -> ../workspace-b
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 18:06 yallist

I think licensed is getting the directory under node_modules in the next line:
https://github.com/github/licensed/blob/2.15.1/lib/licensed/sources/yarn.rb#L79
But Dir.glob(config.pwd.join("node_modules/**/package.json")) does not follow symbolic links.

My workaround is a patch to replace this with Dir.glob(config.pwd.join('node_modules/**{,/*/**}/package.json')).

@jonabc
Copy link
Contributor

jonabc commented Apr 3, 2021

@ledsun

I assume that the cause of this issue is that yarn workspaces creates symbolic links in the node_modules directory.
My workaround is a patch to replace this with Dir.glob(config.pwd.join('node_modules/{,/*/}/package.json')).

👍 that makes sense. That would be an easy patch to put in but I'd like to learn more about how you'd expect licensed to work with them before adding that as a general fix.

I'm also not super up to speed on yarn workspaces themselves. I've read a few docs on them and they appear to be a way to manage dependencies among multiple projects/packages in a monorepo setup when one project takes a dependency on another project in the repo... is that correct?

I have a concern that a package with the same name and version but different licensing or other metadata might exist in different locations under node_modules which would make the logic for the yarn source indeterminate. Whichever duplicate name-version package is parsed last would be used for all projects. I'm not sure how realistic this scenario is but I think it might be possible if a package is forked and/or multiple package registries are used.

Can you set up a reproduction environment for the problem, removing any sensitive and non-public content? I'm wondering whether building knowledge about workspaces into the yarn source would be more appropriate. Something like enumerating workspaces, then changing dependency_paths to make use of them like Dir.glob(config.pwd.join("node_modules/**{,/${workspace}/**}/package.json"))

@ledsun
Copy link
Author

ledsun commented Apr 19, 2021

Sorry. I can't provide a reproducible environment.

@jgeurts
Copy link

jgeurts commented Apr 29, 2021

I'm running into this as well, specifically with package paths specified in nohoist. Here's a reproduction repo and licensed results.

Site note: I didn't want to store cached licenses in my repo, so each push runs licensed cache and licensed status. Hoping that's expected usage when not storing cached files.

@jgeurts
Copy link

jgeurts commented May 3, 2021

@jonabc Please let me know if that helps or if I can provide anything else related to this!

@jonabc
Copy link
Contributor

jonabc commented May 3, 2021

@jgeurts 👋 sorry I've been pretty busy and haven't had a chance to take a look. I'll try to get to it within the next 24 hrs

@jonabc
Copy link
Contributor

jonabc commented May 4, 2021

@jguerts thanks for the example repo. the repo along with some additional yarn documentation for the nohoist syntax helped me wrap my head around the problem. This is an interesting problem space in that the workspaces functionality is solving for monorepos with multiple built packages. If each of these packages are distributed and used separately they should probably be tracked as individual "apps" from licensed's POV. I'm thinking the best way to handle this is

  1. consider workspaces as separate apps
    • the easiest way to handle this in licensed is to force users to specify their workspaces glob patterns in the licensed config file, however that can cause issues if .licensed.yml and package.json get out of sync. automagically handling workspaces detected from package.json will require some refactoring of how licensed apps are detected. I'll have to think on this one a bit more and see if there's an ideal solution
  2. update the yarn source's handling of node_modules to look at both the root node_modules directory and the workspaces node_modules directory. I don't know if adjusting the node_modules search path is a good idea in this case because it's possible to find the same dependency, at the same version, in two different locations and the metadata should know how to reference the correct one.

Off the top of my head this sounds like a fairly significant chunk of work, and as I mentioned things are pretty busy for me at the moment. Is this blocking you from using licensed and something you need fixed ASAP?

@jgeurts
Copy link

jgeurts commented May 4, 2021

Thank you for looking into this!

It’s a blocker for me, but I’ll try working around it by removing the nohoist part of package.json before installing npms and running licensed. That should force all npms to install to the root. I’ve been holding off on that in case this was a relatively easy/quick update to licensed.

@jgeurts
Copy link

jgeurts commented May 4, 2021

I tried removing the nohoist part of the package.json file, but licensed failed to get paths for a couple dependencies still. So, I ended up "manually" doing your suggestion from #​1:

  • I wrote a script to remove the monorepo dependencies from each workspace package.json
  • Ran npm install in each workspace directory
  • Configured licensed to use npm source type with each workspace as an explicit licensed app.

That mostly worked, but I had some troubles with licenses for some of the dependencies not coming through accurately. I'll open a separate issue for that, if I can get it reproducing.

@jonabc
Copy link
Contributor

jonabc commented May 4, 2021

@jgeurts FYI I've started looking at a fix. There is some complexity in the solution but I'm hopeful to have this fixed and able to put out a release in the next few days.

@jonabc
Copy link
Contributor

jonabc commented May 5, 2021

@jgeurts I've hit a wall in how to get useful information out of yarn's CLI that perhaps you might be more knowledgable on?

When I call yarn list from a package defined as a yarn workspace, it still gives full information on dependencies used by everything in the repo, and not information specific to that workspace. That means that licensed has no way to distinguish which dependencies are specific to that workspace package or not. That is pretty consequential if someone is publishing individual packages out of a monorepo - it would not be correct to say that every dependency from the repo applies to all packages.

Even worse is that, at least with the version of yarn I have locally (1.22.10), yarn's hoisting mechanism is making packages available to workspaces that don't call them out as dependencies 😢 . I am able to use packages listed as a dependencies of workspace A inside of files in workspace B without any direct dependency linkage of workspace A -> B.

TBH I'm not sure how to support yarn workspaces given that I can't accurately tell what dependencies a project uses. For a tool that's purpose relates to licensing compliance I tend to err towards the safer option and am inclined to explicitly deny support for workspaces. I'll try with yarn v2 tomorrow and see if maybe they've fixed things in later versions.

If you have any ideas on how to support this I'd be very interested in hearing them 😆

@villelahdenvuo
Copy link
Contributor

@jonabc This seems to also be an issue with npm workspaces:

Project setup
// ./package.json
{
	"name": "@grano/mygrano-serverless",
	"version": "1.0.0",
	"description": "My Grano Serverless",
	"private": true,
	"workspaces": [
		"serverless-utils",
		"cognito",
		"antivirus",
		"ecom-cloudfront",
		"iam-cloudfront",
		"iam-ses-notifications",
		"ses-events",
		"ses"
	]
}

// ./antivirus/package.json
{
	"name": "@grano/antivirus",
	"version": "1.0.0",
	"description": "Virus Check Lambda for Widdix antivirus",
	"private": true,
	"dependencies": {
		"@grano/logger": "^2.12.1",
		"callbackify": "^1.1.0",
		"dotenv": "^16.0.0",
		"es6-promisify": "^7.0.0",
		"lodash": "^4.17.21",
		"superagent": "^7.1.1"
	}
}
# licensed.yml
apps:
- name: antivirus
  source_path: "/home/runner/work/mygrano-serverless/mygrano-serverless/antivirus"
  cache_path: "/home/runner/work/mygrano-serverless/mygrano-serverless/.licenses/antivirus"
  sources:
  - name: antivirus.npm

Results:

# npm install --ignore-scripts --no-audit --production
added 65 packages in 3s
# npm ls callbackify
@grano/mygrano-serverless@1.0.0 /Users/ville.lahdenvuo/Documents/Grano/repot/mygrano-serverless
└─┬ @grano/antivirus@1.0.0 -> ./antivirus
  └── callbackify@1.1.0

...

  Caching dependency records for antivirus
    * Errors:
      * antivirus.npm.callbackify
        - missing: callbackify@^1.1.0, required by @grano/antivirus@1.0.0
  
      * antivirus.npm.dotenv
        - missing: dotenv@^16.0.0, required by @grano/antivirus@1.0.0
  
      * antivirus.npm.es6-promisify
        - missing: es6-promisify@^7.0.0, required by @grano/antivirus@1.0.0
  
      * antivirus.npm.superagent
        - missing: superagent@^7.1.1, required by @grano/antivirus@1.0.0

I assume it's because npm de-dupes dependencies in the root node_modules folder, but licensed does not traverse into parent folders.

@villelahdenvuo
Copy link
Contributor

villelahdenvuo commented Mar 11, 2022

For people having problems with npm workspaces I added this script before running licensed to work around the issue:

  echo "::group::Workspaces workaround"
  WORKSPACES=$(jq -r '.workspaces // [] | join(" ")' package.json)
  if [[ -n "$WORKSPACES" ]]; then
    for package in $WORKSPACES; do
      echo "Copying node_modules to $package"
      rsync -r --exclude ".bin" --exclude "@my-org/*" ./node_modules/ "./$package/node_modules"
    done
  fi
  echo "::endgroup::"

@jonabc
Copy link
Contributor

jonabc commented Mar 11, 2022

thanks for the repro - taking a look!

@jonabc
Copy link
Contributor

jonabc commented Mar 11, 2022

@villelahdenvuo in this case, the errors being reported by licensed are actually coming from npm. I set up a quick repro for the above setup and got the same errors running npm list from the individual workspace.

There's an issue that was opened on the npm cli about this where npm ls run from a workspace isn't aware that it's dependencies are installed elsewhere. It looks like the latest versions of npm (8.5.0+) enable automatic workspace root detection which will solve this with a little extra logic in licensed to strip the current workspace from the list of detected dependencies

Are you able to upgrade to using npm 8.5+? I'd like to make that a requirement if possible for using npm workspaces because it allows the closest parity to existing licensed usage without requiring any additional configuration options.

@villelahdenvuo
Copy link
Contributor

@jonabc Yeah I'm already using 8.5+ locally, I just need to make sure it's being used on the Github Actions also. Thanks for the investigation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants