New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce module encapsulation via a linter #41862
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish that this were simpler, and perhaps it may be so. Ideas welcome!
.clj-kondo/hooks/clojure/core.clj
Outdated
;; TODO -- this hard coding is not ideal, it would be better to support metadata on the namespaces themselves. | ||
(def ^:private global-modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the convenience of leaving this detail for another PR, it's quite nice seeing all the modules in one place for now, particularly if we want to plug some of the leaks in metabase.db
.
@@ -179,6 +179,134 @@ | |||
(non-thread-safe-form-should-end-with-exclamation* node)) | |||
{:node node}) | |||
|
|||
;; ----------------- Modules ----------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to hide all these helper function in a (testable) namespace, but I didn't figure out how to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of tests, I didn't find an easy way to jack into this kondo namespace with Cursive, so I didn't bother with "rich comment" tests either. I tested manually by inserting actual requires in the app and running the linter but don't have a great way to share that here. I guess I could show it via screenshots or a loom, but this seems a bit OTT.
.clj-kondo/hooks/clojure/core.clj
Outdated
|
||
'metabase.upload | ||
;; We anticipate using this CLJC ns for type inference on the frontend, so it's exposed outside the CLJ module. | ||
'metabase.upload.types)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving aside plugging the leaks in metabase.db
, which is included as a stress-test we could defer, one might as if it's worth exposing supporting nested modules if we're not yet using this namespace from the frontend. One might also argue that it's better to just move this code out of the module, despite the awkward naming challenge that presents.
For all the complexity of this linter, only a handful comes from this feature, so I think it's worth it.
.clj-kondo/hooks/clojure/core.clj
Outdated
'metabase.upload.types)) | ||
|
||
;; TODO -- this hard coding is not ideal, it would be better to support metadata on the namespaces themselves. | ||
(def ^:private internal-modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need these for uploads, but I think they're pretty natural, and only a little bit more complicated than global nested modules to support. A drop in the bucket compared to the vanilla machinery.
(when (str/starts-with? (name ns-sym) prefix) | ||
module-ns-sym)))) | ||
|
||
(def ^:private module-detectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is tempting to inline, its more efficient to only calculate it once.
(into #{} (remove nil?) | ||
[;; Modules can depend on their own internals. | ||
(modules ns-sym) | ||
;; We can depend on our siblings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird for modules to be able to depend on sibling modules. We may want to tighten this up a bit.
|
.clj-kondo/hooks/clojure/core.clj
Outdated
(sorted-set | ||
'metabase.db | ||
;; We may aspire to encapsulating more of these - either through refactoring or ^:clj-kondo/ignore. | ||
'metabase.db.connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It surprises me that this allows metabase.db.connection
to be a top-level module, while metabase.db
is also a top-level module. I thought we wanted a rule that if metabase.db
was a module then everything matching metabase.db.*
except metabase.db.core
would be internal. But the way this works is if metabase.db
is a module, then everything matching metabase.db.*
is internal, except for namespaces that are also listed as global. Is there a motivation behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the notion of nested module is what I call internal-modules
in the next var, with metabase.db.liquibase
being the one example.
The main motivation I have for nested modules that are also globally visible is metabase.upload.types
, where a big reason for how and why it was extracted was making it available to the frontend. I think this example here with metabase.db
also shows how it's a useful mechanism for "latching" a large module that isn't quite fully encapsulated yet, but is getting there.
An (potentially dangerous to this audience) analogy can also be made to thinking of module namespaces like service objects, and these global submodules as their public properties. I think this will give a good and natural way to have some interface segregation for large modules.
We could YAGNI it for now, but I felt it falls out quite naturally, and it's good to know we're not designing ourselves into a corner that wouldn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked this question because I think this code can be simpler if we enforce the rules that we discussed at the offsite, rather than extending it to allow for nested-by-name-but-still-top-level modules like you're doing here.
I think we should keep the logic for determining what's internals or not rule-based because it has benefits for us humans. For example, if I know metabase.db
is a module, I know that metabase.db.liquibase
is guaranteed to be internal to it. I don't have to go to this namespace and check whether metabase.db.liquibase
is top-level or not, unlike this implementation would make me do.
Here's an outline of the solution I had in mind, and it only works if we don't allow for "nested modules that are globally visible" like we do here.
- A namespace named
N
is internal to module namedM
ifN
matchesM.+
but notM.core
. - A namespace named
N
is an interface to a module namedM
ifN
isM.core
orM
(in practiceM
andM.core
shouldn't coexist)
The rule we enforce is this: a namespace N can't use an internal namespace of any module (nested or top-level) unless N is itself internal to that module, or N is an interface namespace for that module.
This works for nested modules too. Consider the case that A.B
and A.B.C
are nested modules inside module A
. Some test cases:
A.D
can useA.B
because even thoughA.B
is internal toA
,A.D
is too.A.D
can not useA.B.C
becauseA.B.C
is internal toA.B
butA.D
is not.A.B
can useA.B.C
because it is an interface namespace forA.B
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for metabase.upload.types
being used on the FE, I do think we can YAGNI on that for now, and maybe it would be better to expose module system for clojurescript code by a separate mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation, I think we're on the same page with the semantics. The only difference I can see is this code allows nested modules to be marked as "public" by declaring them global-modules
1. If we ignore that detail, I'm curious whether there is a semantic difference I missed, or you just feel the implementation could be simpler. The majority of the complexity is in the token-walking and node tagging I think, not in support of this feature.
To streamline this discussion and perhaps the PR flow too, I've removed this "public" motion, and made all nested modules implicitly private. The practical difference besides the deleted config and sanity checks is just one line of logic being removed2. With only this notion of nested we can no longer modularize metabase.db
, so the scope is reduced to just metabase.upload
, which is all we really want for now anyway.
Let's discuss this implementation for now, and once that's settled we can shift back to the "default private" versus "cannot be made public" debate.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for metabase.upload.types being used on the FE, I do think we can YAGNI on that for now, and maybe it would be better to expose module system for clojurescript code by a separate mechanism.
Agreed, the main reason I included it was to check the viability of my preferred DX and start the conversion. It's gone and left for another day.
I'd prefer fewer moving parts and special cases, or diverging structure in Clojurescript, so I'm a bit skeptical of this "separate mechanism", but who knows elegant solution could occur to us when we need to use it in anger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for simplifying this discussion about interface vs implementation. You're right that the debate about "cannot be made public" can be parked so let's simplify the implementation first. My intuition that this can be made a lot simpler is based on the fact that the code doesn't map well to the English description of how it should work. Here's a demonstration of what I mean, putting the English desciption next to the the clojure code:
;; 1. A namespace named N is an interface to a module named M if N is M.core or M
(defn interface-for? [n m] (contains? #{m (str m ".core")} n))
;; 2. A namespace named N is internal to module named M if N matches M.+ and is not an interface for M
(defn internal-to? [n m] (and (str/starts-with? n m) (not (interface-for? n m))))
;; A namespace N can't use an internal namespace of any module (nested or top-level)
;; unless N is itself internal to that module, or N is an interface namespace for that module.
(defn illegal-module [modules using used]
(first (filter (fn [module]
(and (internal-to? used module)
(not (or (internal-to? using module)
(interface-for? using module)))))
modules)))
illegal-module
could be used by module-internals-should-be-encapsulated
, replacing the use of allowed-parents
and parent-module
.
On the topic of testing, I think something like this would suffice:
(deftest illegal-module-test
(let [modules #{"A" "A.B" "A.B.C"}]
(is (nil? (illegal-module modules "A.D" "A.B"))) ; A.B is internal to A
(is (= "A.B" (illegal-module modules "A.D" "A.B.C"))) ; A.B.C is internal to A.B but A.D is not and A.D is not the interface for A.B
(is (nil? (illegal-module modules "A.B" "A.B.C"))) ; A.B.C is an interface namespace for A.B.C
(is (= "A.B.C" (illegal-module modules "A.B" "A.B.C.D"))) ; A.B.C.D is internal to A.B.C but A.B is not and A.B is not the interface for A.B.C
(is (nil? (illegal-module modules "A.B" "A.B.C.core")))) ; A.B.C.core is an interface namespace for A.B.C
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's interesting how subjective simplicity and intuitiveness can be. Late post incoming with a lot of recency bias towards the first algorithm I thought of.
If we peel away all the logic about walking the tokens, and reporting the findings on the appropriate nodes, dealing with symbols versus strings, handlings tests, etc the PR's code boils down to the following:
;; 1. The most directly enclosing module of the given namespace, if there is one.
(defn- parent-module [modules ns] (first (filter #(str/starts-with? ns (str % ".")) (reverse modules))))
;; 2. The modules containing the direct children and siblings of the given namespace.
(defn- allowed-modules [modules this-ns]
(into #{} (filter modules) [(str/replace this-ns #"\.core$" "") (parent-module modules this-ns)]))
;; Check that the [used] can be imported by [using]
(defn illegal-module
[modules using used]
(when-let [used-parent-module (parent-module modules used)]
;; Is the give dependency inside a module that is opaque to us?
(when (not (contains? (allowed-modules modules using) used-parent-module))
used-parent-module)))
It's a little bit chunkier than your code, but I'd argue its considerably more declarative and maps more naturally to human language. Seeing it this way I'm actually feeling a lot happier with it than when it hid in the kondo murk. Maybe there's a way to factor it out a bit better.
Perhaps it was end-of-week fatigue, but convincing myself your code was right felt like math exercise needing to work through complex boolean logic, compared to single branch in the reference code. I even got a bit mixed up and thought I saw a bug (that A.B
wouldn't load A.E
), but I was being silly.
Getting to less subjective stuff - let's talk efficiency and debug-ability. I think it's useful to look at your implementation being expanded:
(defn illegal-module [modules using used]
(first (filter (fn [module]
(and (and (str/starts-with? used (str module "."))
(not (contains? #{module (str module ".core")} used)))
(not (or (and (str/starts-with? using (str module "."))
(not (contains? #{module (str module ".core")} using)))
(contains? #{module (str module ".core")} using)))))
modules)))
While it's very compact, I see a few problems with the computational structure:
- Some allocating forms get executed multiple times. These can't be de-duplicated without changing the call structure.
- Redundant logic -
(not (contains? .. using))
could be skipped since its negative is checked in the surroundingor
. - All logic occurs within a loop, and is quite low level. Stepping through or tracing this would be really tough.
- This loop is less likely to short-circuit than the separate "allowed" versus "found" loops.
- Since we will loop over multiple multiple
used
variables for the same values ofusing
andmodules
, these problems get exacerbated.
Given that we run this linter across every namespace on every push, efficiency matters, and even though this is an unfair shoot out with non-optimized code it does seem like there is less low-hanging fruit. The ability to pre-build the detectors and then run all the calculations for the given using
only once across all imports are both huge boosts that fall out quite naturally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is more like what I was looking for. Let's shift focus to correctness because now I see a couple of issues with this
- Your
illegal-module
doesn't pass my tests above, there's a small bug inparent-module
.
(deftest illegal-module-test
(let [modules (sorted-set "A.B.C" "A.B" "A")]
(is (nil? (illegal-module modules "A.D" "A.B"))) ; A.B is internal to A
(is (= "A.B" (illegal-module modules "A.D" "A.B.C"))) ; A.B.C is internal to A.B but A.D is not and A.D is not the interface for A.B
(is (nil? (illegal-module modules "A.B" "A.B.C"))) ; A.B.C is an interface namespace for A.B.C
(is (= "A.B.C" (illegal-module modules "A.B" "A.B.C.D"))) ; A.B.C.D is internal to A.B.C but A.B is not and A.B is not the interface for A.B.C
(is (nil? (illegal-module modules "A.B" "A.B.C.core"))) ; A.B.C.core is an interface namespace for A.B.C
))
(illegal-module-test)
; FAIL in (illegal-module-test)
; expected: (nil? (illegal-module modules "A.B" "A.B.C.core"))
; actual: (not (nil? "A.B.C"))
nil
The problem is (parent-module modules "A.B.C.core")
returns "A.B.C", but it should return "A.B". The problem is fixed by using my internal-to?
function above:
(defn interface-for? [n m] (contains? #{m (str m ".core")} n))
(defn internal-to? [n m] (and (str/starts-with? n m) (not (interface-for? n m))))
(defn- parent-module [modules n] (first (filter #(internal-to? n %) (reverse modules))))
(illegal-module-test)
nil
- I initially thought your use of
parent-module
was just an optimisation trick, but now I see that it actually produces different semantics to how we want modules to work and your PR description. There's an important test case I missed, that your code fails:
(deftest illegal-module-test
(let [modules (sorted-set "A.B" "A")]
(is (nil? (illegal-module modules "A.B.C" "A.util")))))
(illegal-module-test)
; FAIL in (illegal-module-test)
; expected: (nil? (illegal-module modules "A.B.C" "A.util"))
; actual: (not (nil? "A"))
Your code goes one step further than "a module prevents any code outside that module from using its internals", adding an additional restriction: "a module prevents its internals from using any code outside, except top-level modules or namespaces". That effectively rules out modules sharing any code except through top-level modules, and will force us to move a lot more code to the top-level.
For next steps, I think we need a set of passing tests everyone can agree on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I ended up replying to this only in Slack, so putting a shortened reply here for posterity.
- Ah thanks for spotting. I wasn't thinking of
.core
namespaces being first class, so only supported the situations I actually saw in the code already. Easy to fix with this PR's approach too. - I don't see why that should pass - why should you be able to read the internals of your parent module? I don't agree with that working unless
A.util
is also made a public module, which then works with this code.
I'm surprised these submodule semantics were so controversial. Since we don't have a way to write tests for linters as documentation, that I know of, I guess it's just as well we're deferring them.
@crisptrutski no there is no issue for this and i think we can live without one |
d6e7c46
to
5f42769
Compare
Superseded by the CML5000 |
Description
This introduces a linter against importing a module's internals from outside that module.
Uses
metabase.upload
andmetabase.db
as test cases, which motivate the need to supporting nested and internal modules.