This repository has been archived by the owner on Jun 4, 2022. It is now read-only.

Explore CLJS require from node_modules #130

Open
anmonteiro opened this issue Apr 17, 2017 · 27 comments

@anmonteiro
Owner

anmonteiro commented Apr 17, 2017

FWIW clojurescript-npm supports it.

See: https://github.com/nasser/clojurescript-npm/blob/master/lib/register.js

@arichiardi
Collaborator

Oh, this would be awesome, together with npm packaging on lumo-compilable libraries (like andare)

@anmonteiro
Owner Author

anmonteiro commented Sep 17, 2017

This one is a little tough because (the way I see it) it will have to rely on convention.

We can easily support requiring CLJS namespaces from a node_modules installation but here are some open questions:

  1. If you want to require foo.core, how do we know where it is? (i.e. it could be under node_modules/some-name/src/cljs/foo/core.cljs)

  2. It would be nice to allow package authors to publish Lumo-consumable namespaces with JS / analysis caches. But again, where should Lumo look for these cached namespaces?

IMO, we could easily solve 1. and 2. by piggybacking on the Node.js convention of placing that information in the package's package.json. But we'd need to agree on a standard set of package.json fields that Lumo would recognize. Any ideas / suggestions?

@arichiardi
Collaborator

Totally agree with your analysis and definitely agree with storing info in package.json.
There could be a lumo section in there to give it clear semantics (the dir where to find sources, like macros).
Another dir we would need is the aot one (commonly lib, or dist for JS), where authors can put the analysis cache.

@anmonteiro
Owner Author

anmonteiro commented Sep 17, 2017

NPM standardizes a directories field which we could probably use without having to create a new lumo entry.

Here's a suggestion:

{
  "name": "my-package",
  "directories": {
    "lib": "src",
    "cache": "./lumo-cache"
  }
}

Lumo could then simply add node_modules/my-package/src to the classpath and support requiring whatever namespaces my-package provides.

@arichiardi
Collaborator

Sounds sensible to me

anmonteiro added a commit that referenced this issue Sep 21, 2017
@anmonteiro
Owner Author

First step towards this done in 3cddd1d.

Future enhancements are:

  • also read the cache entry
  • allow adding/removing node_modules source paths at runtime (currently we read them once when Lumo initializes and they're static throughout that session – meaning we can't manipulate them with the functions in lumo.classpath or recognize that we installed a new module while the session is running)
  • properly document how this works

@arichiardi
Collaborator

👍

@arichiardi
Collaborator

arichiardi commented Oct 27, 2017

I am trying this out. One thing I have noticed in my debugging is that (of course) other libraries also use the lib convention, for instance the following:

[ '/home/arichiardi/git/module-cljs/node_modules/formidable/lib',
  '/home/arichiardi/git/module-cljs/node_modules/json-schema/lib',
  '/home/arichiardi/git/module-cljs/node_modules/jsonify',
  '/home/arichiardi/git/module-cljs/node_modules/restify/lib',
  '/home/arichiardi/git/module-cljs/node_modules/sshpk/lib',
  '/home/arichiardi/git/module-cljs/node_modules/underscore.string',
  '/home/arichiardi/git/module-cljs/node_modules/url-template/lib' ]

So I was wondering whether we should have our own convention... or maybe not. We could skip inspecting the above libs if so, but I don't think it makes a big difference in terms of performance.

@arichiardi
Collaborator

I tried to include a child project in my project. It works, but it fails to require the node dependencies of the child module (I have faker in there). It seems that the indexing of node dependencies is only one level deep (I will investigate more, though), and we instead need to index recursively when we include ClojureScript from an npm package.

@arichiardi
Collaborator

arichiardi commented Oct 27, 2017

Just want to add that children with dependencies that are already present in the parent work, as long as they have the directories section.

@arichiardi
Collaborator

This works pretty well thanks to the approach in #302 and #300 for scoped packages 🎉 I was able to publish a ClojureScript npm package and consume it from lumo in a parent project that uses it as a dependency.

One thing I was wondering, though: should we default directories.lib to "src" when it's missing? I guess it is a good convention? What do you think @anmonteiro?

@bhurlow

bhurlow commented Oct 29, 2017

Great thread here; this would be a pretty nice efficiency boost for my workflow with lumo, so I'm +1 on the general idea. I've been solving this a different way: running a startup script that reads node_modules deps from a file and then shell-execs out to lumo with all the required classpaths passed in on the CLI. I believe this is more or less how Leiningen does it. Perhaps this is a case for separating some dependency loading from lumo itself, so it can focus on providing a generic environment.

arichiardi added a commit to arichiardi/macrovich that referenced this issue Nov 6, 2017
Lumo has experimental support for ClojureScript npm packaging now:
  anmonteiro/lumo#130

This means we can deploy to npm seamlessly and then require namespaces at the
REPL. Macrovich is massively important for converting JVM ClojureScript to
self-hosted ClojureScript, so I think it should be the first one to get this new
approach, so that it can be used as an npm dependency as well.
cgrand pushed a commit to cgrand/macrovich that referenced this issue Nov 7, 2017
@bhurlow

bhurlow commented Dec 2, 2017

@arichiardi I think defaulting to src/${package_name} when otherwise not specified in package.json would be a sensible convention

@arichiardi
Collaborator

At the moment we skip reading the dependencies that don't have the lib entry in directories, as described above. I am actually tempted to say that we should have a lumo-specific key, to make it even clearer that the project is a special one; once we have that, we can use defaults. Another way to put it is that I would not use custom lumo defaults on things that are not marked as lumo-specific.

@bhurlow

bhurlow commented Dec 3, 2017

@arichiardi I've been doing this sort of "manually" here, where for a given lumo-based project I specify cljs-via-npm deps in a lumo key in package.json. This works fine, except that I have to duplicate each dependency.

For library implementors, though, it seems to me that they may wish to simply provide a bootstrapped cljs-compatible library via npm which has no explicit requirements on lumo. If other cljs platform tools are born, would we ask those library owners to add a package.json key for that specific runtime? What about dependencies for planck or shadow-cljs?

@thheller

Hello everyone.

The request to be able to publish/consume CLJS packages via npm instead of clojars has come up quite a lot for shadow-cljs as well. I'm sort of opposed to this, but that doesn't mean we shouldn't do it. Since at least @cgrand has started publishing packages like this (eg. https://www.npmjs.com/package/xforms), I thought we should at least talk about it.

The problem we need to address is the classpath issue. Fundamentally, for the JVM, the classpath is constructed from multiple paths that are tested in order when looking things up. npm/node does not have a classpath and follows completely different rules.

So as far as I see it, we need a way for CLJS packages to declare the "root" paths that should be added to the classpath. Either packages can just publish sources and use their source paths, or hook into the npm publish process somehow and copy all source files into one path.

I do not like using the default "directories" property for this since JS packages may be using that and I don't want random JS packages on my classpath. So one option would be to add a "clojurescript" property to package.json or add a separate cljs.edn (maybe even deps.edn) to the package itself.

Example package.json:

{
  "name": "something",
  "clojurescript": {
    "source-paths":[
      "src/main"
    ],
    "dependencies":[
     ["some-lib", "version", {"exclusions":["foo/bar"]}]
   ]
  },
  // ...
}

I think I'd prefer a .edn version since we'd save translating JSON to EDN. JS tools don't care about this info anyway.

There could be other keys added for caches and such, although I'd be very careful with this since caches produced by different compiler versions might be incompatible.

One problem would be classpath ordering but I guess we could follow the same ordering the JVM does by resolving the dependencies in order.
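The JVM-like ordering mentioned here could be approximated with a breadth-first walk over declared dependencies, where the first occurrence of a package wins. A hypothetical sketch, where `readDeps` stands in for however a tool would read each package's declared CLJS dependencies:

```javascript
// Hypothetical sketch: deterministic classpath ordering by resolving
// declared dependencies breadth-first; first occurrence wins.
function classpathOrder(rootDeps, readDeps) {
  const order = [];
  const seen = new Set();
  const queue = [...rootDeps];
  while (queue.length > 0) {
    const dep = queue.shift();
    if (seen.has(dep)) continue; // conflicting versions: keep the first
    seen.add(dep);
    order.push(dep);
    queue.push(...readDeps(dep));
  }
  return order;
}
```

The "first occurrence wins" rule is also one way to cope with the nested-conflicting-versions problem: only one copy of each library ever makes it onto the classpath.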

Another issue is that npm can install packages with conflicting versions in nested directories. Since addressing on the classpath must be unique all conflicting versions would be on the classpath but only one would be used.

Thoughts?

@arichiardi
Collaborator

Hello Thomas, thanks for pitching in.

One note before continuing: directories was designed for "describing" the project structure, if I understand correctly, which is why it felt like a good approach: https://docs.npmjs.com/files/package.json#directories

I agree though that if you want to start having a full replacement of project.clj then a custom .edn would be the way to go.

We are already doing this in serverless-cljs-plugin, which wraps the lumo compiler: https://github.com/nervous-systems/serverless-cljs-plugin/blob/master/README.md#lumo

One disadvantage is the split between node and JVM deps, which is why I started asking around about whether we could publish lumo-compatible libraries on npm. Christophe jumped right in, but the same cannot be said for others: I think we should accept the fact that library authors rarely like to publish to two places, and we need to handle this in our build tools.

So this goes back to a custom .edn, which is accepted in the JS world anyway (TypeScript has so many configuration files).

I would definitely love to coordinate this effort with you.

@thheller

> Tell people where the bulk of your library is. Nothing special is done with the lib folder in any way, but it's useful meta info.

As I said, the problem I have with using the default "directories" is that JS libs may be using it for purposes other than marking a "classpath root". I just want to avoid running into some JS lib convention I don't know about that would end up putting X files onto the classpath.

I'm sort of opposed to this whole idea since IMHO it doesn't solve anything unless EVERYTHING is published to npm. Otherwise you still need a way to load things maven-style (eg. clojars) anyway. It would be much easier to only load CLJS libs from one place, rather than introducing the extra ambiguity that CLJS libs may also come from another place.

Currently we have maven-style for CLJS and they can declare JS deps via deps.cljs :npm-deps. You start by resolving CLJS deps first, construct a classpath, find :npm-deps and install those. Not ideal but works.

Cross package manager resolving is hard. Which artifact do you pick if lib A depends on core.async via maven but lib B uses the npm coordinate? Gets even worse with deps.edn git support.

I don't think we can solve the split issue since npm is never going to become the default. The more surface area the split gets the harder everything becomes so why make it bigger?

The only argument I have heard FOR publishing to npm is that it is easier (especially for JS people). I think we can solve that with better tools on the CLJS front.

What other arguments are there?

@arichiardi
Collaborator

arichiardi commented Jan 29, 2018

The other argument, for enterprise, is dependency mirroring: with the split in place you need two mirrors. In general, though, what you say is probably the best approach; unifying is going to be hard anyway, and if we start from what we have, for instance deps.edn, we'll have more traction.

At the very least, a simple npm dependency resolver for tools.deps could be a wrapper around npm... maybe.

@yogthos

yogthos commented Mar 15, 2018

Using node_modules as the default with package.json seems like it would provide the most convenience for packaging and distributing things.

@thheller I think that having a reserved directory like node_modules/cljs or something would address the majority of the concerns regarding other libraries putting things on the classpath.

My vote would be to use node_modules as the default, and have deps.edn as an opt in for cases where you want more control over the classpath.

@bhurlow

bhurlow commented Mar 16, 2018

Disregarding the technical limitations for a second, I think locating cljs deps (especially bootstrapped ones) in npm would go a long way towards welcoming JS devs into the ClojureScript ecosystem. Making the modules searchable in a public forum such as npm could greatly aid discoverability and adoption. I think many folks out there don't know what they're missing :)

@yogthos

yogthos commented Mar 17, 2018

Yeah, I think there's huge value in making the ClojureScript ecosystem seamlessly accessible to JavaScript developers using tools they're already familiar with.

@filipesilva
Contributor

filipesilva commented May 27, 2019

I'd like to chime in from the TypeScript (TS) side of the JS ecosystem. I think the example of TS is relevant because it is widely used and faces similar questions. Please forgive any ignorance in the CLJS side of things, I am still learning it.

TS users have a very big incentive to ship .ts files to npm because consumers are already all using npm. One specific consumer, Angular, has even more of an incentive because it ships its own specialized TS compiler in order to perform AOT compilation of HTML templates, and that compiler needs metadata that isn't present on .js files.

Despite this, the golden rule in packaging TS and Angular libraries has always been "don't ship TS". Nothing really stops you from doing it. But doing it usually has bad results (more on that later).

Instead, TS libraries are encouraged to also publish additional declaration files that complement the .js files. You can either add a special typings field to package.json, or publish a separate @types/package-name just for the typings.

Angular libraries publish that and more. They have their own guidelines in what's called the Angular Package Format (APF). The APF seeks to allow for many different consumers of npm libraries, both on the platform and on the tool level (node, browser, Webpack, Closure, Rollup, UMD, TS, Angular AOT, script tags, etc).

Both TS and Angular piggyback on package.json to add more things, but try to steer clear of altering what the main field points to, since it should remain a library that can be consumed by either Node or the browser directly, and mostly compatible with older platforms by being ES5 (with a notable exception for ESM imports).

You might think all of this complexity stems from TS, but it is not so. Even with just JS you will need to make special allowances for your various consumers.

To be fair, with impending ES modules and several ES versions already out in the wild, this is a bit of a timebomb. The JS community will need to figure out both forwards and backwards multi-platform compat in package.json soon.

All this to say that I don't think there is anything inherently wrong with piggybacking on package.json. It is already a vaguely common practice, as long as you always keep functioning, mostly compatible, JavaScript code at the main entry point.

When you don't, these are the problems that can happen:

  • main is for browser: can break node apps
  • main is for node: can break browser apps
  • main is ES2015+ code: can break on older node/browser at runtime, breaks minifiers that don't support ES2015 (like UglifyJS), can break bundler parsers
  • main is TS code: breaks TS compilation units, requires consumer to have similar TS compiler version to even work

Despite all these problems, it is still common practice to ship whatever is handy in closed-source situations, like inside companies that have an NPM proxy. In that case NPM is just where artifacts are stored, and since the consumers are tightly controlled, it's not a problem.

But if you want to distribute packages freely you should be mindful of these issues. If you want to package CLJS libraries, then main should still point at a .js ES5 entry point for either Node or the browser. And if you want to also ship *.cljs files you should consider how they are going to be consumed, and how to ensure consumers can compile them.

One important point about packaging ES5 code compiled from CLJS is bundle size. As mentioned on thheller/shadow-cljs#200 (comment), you can end up with multiple versions of cljs.core. I don't know how big it all is, but I saw 93k for a hello world (#480) and that's already too much.

The (browser) JS world is very sensitive to bundle sizes nowadays, as it should be. No one wants multiple 93k overheads from importing a couple of different libraries. There still isn't a bundler/minifier that is as good as the Closure Compiler, but good usage of Rollup/Webpack/Terser can get you very close, and with a lot more compatibility. The Angular project expends significant effort keeping their libraries optimizer-friendly and free from top-level eval side-effects.

So I think that if you ever want to make compiled CLJS libraries usable by non CLJS consumers, the most important problem to tackle is the possible multiplicity of cljs.core.

It's generally fine to package unoptimized ES5 code and to let consumers optimize it, so I don't think you need to care about the Closure compiler when packaging libraries.

The important bit is to provide a way for multiple libraries to be using the same existing cljs.core (and others, I assume). The best example I can think of is tslib, where typescript also ships helpers. It's not an awesome example because they very rarely change.

Maybe something similar could be done for cljs.core, together with judicious use of semver.

I can imagine a CLJS library on npm that looks something like this:

dist/         # ES5 .js for browser, with ESM imports to `@cljs/core.async`
src/          # .cljs files
package.json  # `main` points at `dist/something.js`
              # `cljs` points at `src/something.cljs`
              # `dependencies` contains `"@cljs/core.async": ">=a.b.c <x.y.z"`

Where the "@cljs/core.async": ">=a.b.c <x.y.z" dependency is on npm and contains the unoptimized core.async for a particular version. It also doubles as an indication of what the minimum and maximum CLJS version must be used to compile the .cljs files.

This way a JS consumer of this lib can use it as JS and has a chance at not bundling cljs.core twice (although not guaranteed), and a CLJS consumer can use it as well.
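The layout sketched above might translate into a package.json along these lines (all field names and values are illustrative; in particular, the cljs field is not a key any tool currently recognizes):

```json
{
  "name": "something",
  "main": "dist/something.js",
  "cljs": "src/something.cljs",
  "dependencies": {
    "@cljs/core.async": ">=a.b.c <x.y.z"
  }
}
```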

@thheller

One feature that other compile-to-JS languages don't have to deal with is macros. A big part of ClojureScript is macros and a lot of CLJS libraries will have one or more. Therefore it is often undesirable to have CLJS code directly accessible from JS since you can't really "consume" a macro from JS.

Having a generic mechanism for CLJS where each package is directly consumable from npm is therefore quite pointless and not a goal. We could in theory publish all packages in source form, but that doesn't buy us anything over doing the same via maven.

Another issue is that ClojureScript was very much built with the Closure Compiler in mind. That buys us a lot of freedom in how to design packages, and we generally don't need to worry about "big" namespaces since Closure will just remove the stuff we don't use. Since we expect to run through :advanced compilation (at least for browser-targeted builds), we also don't need to worry about long names. The CLJS compiler will often generate quite verbose names (eg. cljs.core.get.cljs$core$IFn$_invoke$arity$2, which will usually be shortened to something using 1-2 characters). Renaming properties is still not done by other compilers and likely never will be, since it sort of only works when doing whole-program optimizations. Without :advanced compilation, cljs.core will always be too large for anything that cares about size in the first place, even if you have only one copy.

CLJS code can still be easily accessible from JS so integrating it into existing JS projects is quite easy, it just requires a separate build step (just like calling tsc).

Pre-compiled code on npm is a gigantic problem and so far I have not seen any promising progress. Shipping pre-compiled code is always worse than having access to the source (just like TS). npm is full of pre-compiled packages that decided for me that I'm gonna need a Promise or other polyfill. It is not uncommon to end up including the same polyfill multiple times just because it was coming from some nested install (eg. conflicting core-js versions).

The problem with shipping source of course is that you need a tool to compile it and you need to define a minimal subset of supported features. Given the rate of change JS is still going through that is unlikely to find any consensus anytime soon.

@filipesilva
Contributor

It's true that macros cannot be used by a JS consumer, but from an exports point of view it doesn't sound like that changes much. Going back to the TS example: TS libs include interfaces for TS consumers, but those cannot be used by JS consumers either. The library API for JS consumers is a subset of the "full" API that is only available to TS consumers.

I agree that there's no point in just publishing the source on npm if it's just the same as in maven. To be on npm it would need to be consumable from JS.

Regarding the Closure Compiler I really have to reiterate that you can get most of the way there with other tools. Not all the way, but most of the way. A new Angular app is ~3.5mb in dev mode and 0.18mb in prod mode. Using some advanced optimization modes we get 14k basic apps starting from the same base library. Property renaming is great but that is just part of the story.

But I also don't really know how big the whole of cljs.core is. I'd be interested in seeing how small I can get it without the Closure Compiler. Is there any way I could get a build of it that uses either ESM or Node-style modules, so that existing tools can understand it? Rollup is probably the most powerful one right now but it really does need ESM.

Notwithstanding the problems of polyfills and the like, I don't agree that shipping pre-compiled code is always worse than having access to the source in TS. In that case I think the pre-compiled code is better. In Angular we also use Bazel, which makes it pretty easy to depend directly on sources. We did that for a while and decided it was better to depend on the pre-compiled JS instead. There are advantages to depending on source, but it's not without problems.

@yogthos

yogthos commented May 29, 2019

I actually think there is a lot of value in pushing source to npm, because then it becomes available to people using npm. I've seen lots of people avoid ClojureScript because of the JVM dependency; there are a lot of negative connotations attached to it. I'm not going to go into whether that's reasonable or not; it's just a fact that it is limiting. Allowing people to use ClojureScript from within the ecosystem they're familiar with helps lower the barrier to entry.

@arichiardi
Collaborator

A bit off-topic: the biggest problem with the current JS port of the GCC is speed - it's not even close.

Until that is fixed, we are probably never going to see lumo used as a compiler.

I have recently found out about TeaVM and was kind of wondering how fast that would be.
