MLUtils seems quite heavy #155

Open
ablaom opened this issue May 11, 2023 · 5 comments

Comments

@ablaom

ablaom commented May 11, 2023

I am increasingly relying on the getobs/nobs interface in some quite low-level packages I am working on. It's nice to be able to work generically with tables and arrays. But I only need this basic API and simple things like eachobs, and I'm finding MLUtils.jl rather heavy for this purpose (46s precompile/load on Julia 1.9).
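
For illustration only, here is a rough sketch of how small the basic observation interface could be if it were factored out. It assumes array observations are indexed along the last dimension and uses the name numobs for the observation count; the definitions are hypothetical stand-ins written for this example, not MLUtils.jl's implementation, and they skip tables entirely.

```julia
# Hypothetical minimal stand-in for the observation interface.
# The names mirror MLUtils.jl, but this is NOT its implementation.

# Arrays: assume observations are indexed along the last dimension.
numobs(x::AbstractArray) = size(x, ndims(x))
getobs(x::AbstractArray, i) = x[ntuple(_ -> Colon(), ndims(x) - 1)..., i]

# Tuples and named tuples of containers: all fields must agree on the count.
function numobs(t::Union{Tuple,NamedTuple})
    n = numobs(first(t))
    all(x -> numobs(x) == n, t) ||
        throw(DimensionMismatch("fields differ in number of observations"))
    return n
end
getobs(t::Union{Tuple,NamedTuple}, i) = map(x -> getobs(x, i), t)

# Lazily iterate over single observations.
eachobs(data) = (getobs(data, i) for i in 1:numobs(data))

X = reshape(collect(1:6), 2, 3)   # 2 features, 3 observations
y = [10, 20, 30]

for (x, label) in eachobs((X, y))
    println(x, " => ", label)
end
```

A package that only needs this much would depend on nothing outside Base, which is the kind of footprint being asked about here.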

Are there any plans for factoring out base functionality or moving stuff out to weak dependencies?

I see that StaticArrays contributes a lot to load times. The dependency here is NNlib -> KernelAbstractions -> StaticArrays. What's in NNlib that's needed here? (Maybe KernelAbstractions only needs StaticArraysCore?)

julia> @time_imports using MLUtils
      1.1 ms  Statistics
      7.3 ms  ShowCases
      0.3 ms  Compat
      0.5 ms  Compat  CompatLinearAlgebraExt
      1.2 ms  ConstructionBase
     10.7 ms  InitialValues
      0.4 ms  Requires
      0.5 ms  DataValueInterfaces
      1.2 ms  DataAPI
      0.5 ms  IteratorInterfaceExtensions
      0.5 ms  TableTraits
     32.2 ms  Tables
     10.6 ms  MacroTools
     27.5 ms  ChainRulesCore
      0.9 ms  ZygoteRules
      3.7 ms  StaticArraysCore
     17.8 ms  Setfield
     17.0 ms  BangBang
      0.9 ms  ContextVariablesX
      0.5 ms  FLoopsBase
      1.1 ms  PrettyPrint
      0.5 ms  NameResolution
    126.0 ms  MLStyle
      3.0 ms  JuliaVariables
      0.4 ms  Adapt
      0.5 ms  ArgCheck
     14.1 ms  Baselet
      0.6 ms  CompositionsBase
      0.5 ms  DefineSingletons
      9.8 ms  MicroCollections
     14.6 ms  SplittablesBase
     34.1 ms  Transducers
      4.2 ms  FLoops
      1.1 ms  InverseFunctions
     18.8 ms  Accessors
     18.5 ms  FunctionWrappers
    235.6 ms  FoldsThreads 309.83% compilation time
     60.5 ms  DataStructures
      0.6 ms  SortingAlgorithms
      9.3 ms  Missings
      1.0 ms  DocStringExtensions
      4.7 ms  IrrationalConstants
      0.4 ms  LogExpFunctions
      0.6 ms  LogExpFunctions  LogExpFunctionsChainRulesCoreExt
      0.4 ms  LogExpFunctions  LogExpFunctionsInverseFunctionsExt
      0.4 ms  StatsAPI
     17.3 ms  StatsBase
      2.7 ms  SimpleTraits
      6.0 ms  UnsafeAtomics
     12.9 ms  Atomix
      2.2 ms  GPUArraysCore
     13.8 ms  Preferences
      0.4 ms  PrecompileTools
    435.4 ms  StaticArrays
      1.1 ms  ConstructionBase  ConstructionBaseStaticArraysExt
      0.5 ms  Adapt  AdaptStaticArraysExt
      0.5 ms  Accessors  AccessorsStaticArraysExt
      3.7 ms  CEnum
      0.4 ms  JLLWrappers
    242.0 ms  LLVMExtra_jll 98.67% compilation time (98% recompilation)
     42.7 ms  LLVM
      4.7 ms  UnsafeAtomicsLLVM
     27.9 ms  KernelAbstractions
     30.3 ms  NNlib 57.78% compilation time
      1.4 ms  DelimitedFiles
      7.0 ms  MLUtils
@ToucheSir
Contributor

NNlib is used in a couple of places in https://github.com/JuliaML/MLUtils.jl/blob/main/src/utils.jl, but I don't think it would be too difficult to change those or to vendor the functions used.

@CarloLucibello
Member

CarloLucibello commented May 19, 2023

Yes, it would be nice to excise the NNlib dependency. Its functionality is used in

  • chunk
  • rpad_constant
  • anything else?

so we could move those functions to NNlib.

@ablaom
Author

ablaom commented Nov 2, 2023

Anyone have some time to revisit this?

@ToucheSir
Contributor

The biggest blocker is still what to use in place of NNlib.scatter for

degrees = NNlib.scatter(+, ones_like(partition_idxs), partition_idxs, dstsize=(m,))

Vendoring scatter won't help since it depends on KernelAbstractions.
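
As a point of comparison, the degree count in that line can be written as a plain loop with no NNlib or KernelAbstractions dependency. This is only a sketch: count_degrees is a hypothetical helper name, it assumes partition_idxs is a vector of integer indices in 1:m living on the CPU, and it returns Int counts rather than matching the element type that ones_like would give. Whether a loop like this is acceptable for the other cases MLUtils cares about (for example GPU arrays) is not settled in this thread.

```julia
# Count how many times each index in 1:m occurs in `idxs`.
# Plain-loop stand-in for scatter(+, ones, idxs; dstsize=(m,)); CPU arrays only.
# `count_degrees` is a hypothetical name used for this sketch.
function count_degrees(idxs::AbstractVector{<:Integer}, m::Integer)
    degrees = zeros(Int, m)
    for i in idxs
        1 <= i <= m || throw(BoundsError(degrees, i))
        degrees[i] += 1
    end
    return degrees
end

partition_idxs = [1, 3, 3, 2, 3]   # made-up assignments for illustration
degrees = count_degrees(partition_idxs, 4)
println(degrees)
```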

It'd also be worth redoing the import timings since the JuliaFolds packages have changed ownership and received some bugfixes since this issue was originally opened.

@ablaom
Author

ablaom commented Nov 3, 2023

Related (duplication?): #90
