Ability to configure skipping of types and specs in doc generation #1882

Open
onno-vos-dev opened this issue Mar 28, 2024 · 10 comments

@onno-vos-dev

onno-vos-dev commented Mar 28, 2024

As a maintainer of aws-elixir and aws-erlang we've had some issues with doc generation due to the size of these two packages.

While initially resolved thanks to @wojtekmach (🙏), this again led to issues with the generation of types and specs for these two packages (aws-beam/aws-codegen#109), causing the docs size to grow up to 152MB for aws-erlang.

Since I do not believe they're THAT important in these two packages to have available on hexdocs.pm I'd like the ability in ex_doc to disable this.

Would you be open to such a PR where a configuration would allow one to skip the generation of docs for types/specs?

@josevalim
Member

Hi @onno-vos-dev! Thanks for your work on maintaining those.

Given those are generated, what do you think about having a flag on aws-elixir or aws-codegen itself that skips the types/specs? And then you can define the docs alias to do something like this:

aliases: [docs: [&codegen_without_typespecs/1, "docs"]]

This way you regenerate the code right before generating the docs or publishing the package. Then you run MIX_ENV=docs mix hex.publish.

The benefit of doing the above is that you get a bit more control. For example, you can skip the docs for some modules but leave them on for others. WDYT?
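
A minimal sketch of how such an alias could be wired into a mix.exs is shown here. The alias line itself comes from the comment above; everything else is an assumption made for illustration: the app name and version, the `scripts/regenerate.sh` script, and the `--skip-typespecs` flag are hypothetical and do not exist in aws-elixir or aws-codegen as far as this thread shows.

```elixir
# mix.exs (sketch). Names marked "hypothetical" are not from the thread.
defmodule AWS.MixProject do
  use Mix.Project

  def project do
    [
      app: :aws,
      version: "0.0.0",
      # ex_doc is only needed when running with MIX_ENV=docs.
      deps: [{:ex_doc, ">= 0.0.0", only: :docs, runtime: false}],
      # Run the regeneration step first, then the regular "docs" task.
      aliases: [docs: [&codegen_without_typespecs/1, "docs"]]
    ]
  end

  # Alias functions receive the command-line arguments passed to `mix docs`.
  defp codegen_without_typespecs(_args) do
    # Hypothetical: the script and flag stand in for whatever the code
    # generator would actually expose to skip types and specs.
    {_output, 0} = System.cmd("sh", ["scripts/regenerate.sh", "--skip-typespecs"])
    :ok
  end
end
```

Because the alias entry is an ordinary function, it can decide per module what to regenerate, which is the extra control mentioned above.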

@onno-vos-dev
Author

onno-vos-dev commented Mar 28, 2024

@josevalim Thank you for your reply and kind words! 🙇

While I'm not one to shy away from a challenge, I do think it's gonna be pretty non-trivial... 😄

At the time of publishing these packages, the code which generates the code (aws-codegen) is no longer available to that particular package, be it aws-elixir or aws-erlang. So it'd require an alias that goes through the dance of cloning aws-codegen and aws-sdk-go-v2 and then regenerating all the code based on some toggle. The latter is the least of my concerns and should be easy enough to do; it's the former part, cloning two other repos, that will be somewhat tricky from within an alias like that 😅 (although I guess I could shove that part into a shell script).

In aws-elixir, with the capability of adding functions to a mix.exs, I can see this being somewhat doable, but in aws-erlang with rebar3 I have a hard time seeing how I can pull it off. A custom rebar3 plugin maybe, and aliasing against that?

I only briefly looked through the ex_doc code, but it seems fairly doable to add the configuration that I'd need, and it'd likely be a lot more trivial than the dance described above. Not to mention that it may be useful to others in the future who do not have autogenerated code and hence do not have the option of an alias. I'd be more than happy to do the work 👍

@wojtekmach
Member

I'd be OK with making a size exception in Hex for this particular package, and I think Eric would be too. So my question is how having or not having typespecs and other things in the docs changes the developer experience of package consumers. If it doesn't matter, then I'd indeed like to pursue these ExDoc changes to get smaller tarballs.

@josevalim
Member

Maybe a simpler idea that we could apply to both Elixir and Erlang is to do a pre-pass on the .beam files and remove the spec/type attributes? That should be only a few LOC for each. I can share a snippet for Elixir tomorrow.

@wojtekmach
Member

here's a POC for Elixir. :)

#!/usr/bin/env mix run
Path.wildcard("#{Mix.Project.app_path()}/ebin/*.beam")
|> Enum.each(fn path ->
  {:ok, _module, chunks} = :beam_lib.all_chunks(String.to_charlist(path))
  {_, dbgi} = List.keyfind(chunks, ~c"Dbgi", 0)
  {:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}} = :erlang.binary_to_term(dbgi)

  # keep just callbacks, remove types and specs
  typespecs = Enum.filter(typespecs, &(elem(&1, 2) == :callback))

  dbgi = :erlang.term_to_binary({:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}})
  chunks = List.keyreplace(chunks, ~c"Dbgi", 0, {~c"Dbgi", dbgi})
  {:ok, binary} = :beam_lib.build_module(chunks)
  File.write!(path, binary)
end)

@josevalim
Member

Sweet! You can use a similar script for Erlang too. :) It can even be in Elixir. Let me know if that's acceptable @onno-vos-dev!
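
As a rough sketch of what the Erlang variant might look like, the module below filters `-spec` attributes out of a decoded Dbgi term. It assumes that modules compiled by erlc with debug_info store that term as `{:debug_info_v1, :erl_abstract_code, {forms, opts}}`, where `forms` is the list of abstract forms. The module name `StripErlangSpecs` is hypothetical, and the thread does not establish whether ex_doc accepts Erlang modules stripped this way.

```elixir
defmodule StripErlangSpecs do
  # Drops `-spec` attributes from a list of Erlang abstract forms.
  def strip_specs(forms) do
    Enum.reject(forms, &match?({:attribute, _anno, :spec, _value}, &1))
  end

  # Rewrites a decoded Dbgi term; any other shape is returned untouched.
  def strip_dbgi({:debug_info_v1, :erl_abstract_code, {forms, opts}}) when is_list(forms) do
    {:debug_info_v1, :erl_abstract_code, {strip_specs(forms), opts}}
  end

  def strip_dbgi(other), do: other
end

# Tiny in-memory example instead of a real .beam file.
forms = [
  {:attribute, 1, :module, :demo},
  {:attribute, 2, :spec, {{:id, 1}, []}},
  {:function, 3, :id, 1, []}
]

IO.inspect(StripErlangSpecs.strip_specs(forms))
```

In a real script, `strip_dbgi/1` would sit between `:erlang.binary_to_term/1` and `:erlang.term_to_binary/1`, in the same place where the Elixir POC filters `typespecs`.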

@onno-vos-dev
Author

onno-vos-dev commented Apr 21, 2024

Sorry for the delay on this, but I only just got around to dealing with it. Running the Elixir script above followed by mix hex.publish docs results in the error below. It seems that type information is required by ex_doc 😢

mix hex.publish docs
Generating docs...
** (MatchError) no match of right hand side value: nil
    (ex_doc 0.28.4) lib/ex_doc/language/elixir.ex:143: ExDoc.Language.Elixir.type_data/2
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:294: ExDoc.Retriever.get_type/3
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:284: anonymous fn/4 in ExDoc.Retriever.get_types/2
    (elixir 1.14.2) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:283: ExDoc.Retriever.get_types/2
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:121: ExDoc.Retriever.generate_node/3
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:65: ExDoc.Retriever.get_module/2
    (elixir 1.14.2) lib/enum.ex:4249: Enum.flat_map_list/2
    (ex_doc 0.28.4) lib/ex_doc/retriever.ex:44: ExDoc.Retriever.docs_from_modules/2
    (ex_doc 0.28.4) lib/ex_doc.ex:24: ExDoc.generate_docs/3
    (ex_doc 0.28.4) lib/mix/tasks/docs.ex:359: anonymous fn/7 in Mix.Tasks.Docs.run/3
    (elixir 1.14.2) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3

and with latest ex_doc:

mix hex.publish docs
Generating docs...
** (ArgumentError) errors were found at the given arguments:

 * 1st argument: not a nonempty list

   :erlang.hd([])
   (ex_doc 0.32.1) lib/ex_doc/language/source.ex:219: ExDoc.Language.Source.find_ast/3
   (ex_doc 0.32.1) lib/ex_doc/language/elixir.ex:187: ExDoc.Language.Elixir.type_data/2
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:388: ExDoc.Retriever.get_type/5
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:378: anonymous fn/6 in ExDoc.Retriever.get_types/4
   (elixir 1.14.2) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:377: ExDoc.Retriever.get_types/4
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:155: ExDoc.Retriever.generate_node/3
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:86: ExDoc.Retriever.get_module/2
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:54: anonymous fn/3 in ExDoc.Retriever.docs_from_modules/3
   (elixir 1.14.2) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3
   (ex_doc 0.32.1) lib/ex_doc/retriever.ex:23: ExDoc.Retriever.docs_from_dir/2

@josevalim
Member

Ah, we need to prune the doc entries for those types as well. Maybe just pruning specs is enough? You can try this version:

#!/usr/bin/env mix run
Path.wildcard("#{Mix.Project.app_path()}/ebin/*.beam")
|> Enum.each(fn path ->
  {:ok, _module, chunks} = :beam_lib.all_chunks(String.to_charlist(path))
  {_, dbgi} = List.keyfind(chunks, ~c"Dbgi", 0)
  {:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}} = :erlang.binary_to_term(dbgi)

  # remove specs
  typespecs = Enum.filter(typespecs, &(elem(&1, 2) != :spec))

  dbgi = :erlang.term_to_binary({:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}})
  chunks = List.keyreplace(chunks, ~c"Dbgi", 0, {~c"Dbgi", dbgi})
  {:ok, binary} = :beam_lib.build_module(chunks)
  File.write!(path, binary)
end)

@onno-vos-dev
Author

😢 Unfortunately it's not small enough. There are three kinds of entries inside the typespecs: %{type: 65, spec: 15, export_type: 65}. Removing all of them leads to the error I posted earlier, whereas removing any one of them is not enough... 😢
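
A breakdown like the one above can be produced by counting the entries by their attribute kind. The sketch below assumes each entry is a tuple whose third element is the kind (`:type`, `:spec`, `:export_type`), which is the same position the scripts in this thread filter on. The module name `TypespecKinds` and the sample data are made up for illustration.

```elixir
defmodule TypespecKinds do
  # Counts entries by the attribute kind stored in the third tuple position.
  def count(typespecs), do: Enum.frequencies_by(typespecs, &elem(&1, 2))
end

# Placeholder entries; real ones come from the decoded Dbgi chunk.
sample = [
  {:attribute, 1, :type, :placeholder},
  {:attribute, 1, :export_type, :placeholder},
  {:attribute, 2, :spec, :placeholder}
]

IO.inspect(TypespecKinds.count(sample))
```

Running the same count before and after a filtering step is a quick way to confirm which kinds were actually removed.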

@onno-vos-dev
Author

I managed to publish the docs now, using the script below to remove all types and specs. It's at least better than leaving the docs unpublished.

#!/usr/bin/env mix run
Path.wildcard("#{Mix.Project.app_path()}/ebin/*.beam")
|> Enum.each(fn path ->
  {:ok, _module, chunks} = :beam_lib.all_chunks(String.to_charlist(path))

  # Drop the doc entries for types, which contain the generated examples.
  # The Docs chunk is {:docs_v1, anno, language, format, module_doc, metadata, docs}.
  {_, docs_chunk} = List.keyfind(chunks, ~c"Docs", 0)
  {:docs_v1, anno, language, format, module_doc, metadata, docs} = :erlang.binary_to_term(docs_chunk)
  docs = Enum.reject(docs, fn doc -> elem(elem(doc, 0), 0) == :type end)

  docs_chunk = :erlang.term_to_binary({:docs_v1, anno, language, format, module_doc, metadata, docs})
  chunks = List.keyreplace(chunks, ~c"Docs", 0, {~c"Docs", docs_chunk})

  {_, dbgi} = List.keyfind(chunks, ~c"Dbgi", 0)
  {:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}} = :erlang.binary_to_term(dbgi)

  # Remove the spec attributes from the debug info.
  typespecs = Enum.reject(typespecs, &(elem(&1, 2) == :spec))

  dbgi = :erlang.term_to_binary({:debug_info_v1, :elixir_erl, {:elixir_v1, code, typespecs}})
  chunks = List.keyreplace(chunks, ~c"Dbgi", 0, {~c"Dbgi", dbgi})

  # Rebuild the .beam file from the modified chunks and overwrite it in place.
  {:ok, binary} = :beam_lib.build_module(chunks)
  File.write!(path, binary)
end)
