Support library source file archives #16348

Open
wants to merge 1 commit into master

Conversation

WalterBright
Member

Adds the compiler switch -srcar so the compiler can read a source archive file rather than look for individual files.

The code to write and read the source archives is in file_manager.d

cc @LightBender

It works on my system. But I don't have a test case here because I haven't spent the time learning how to do scripts in the test suite. The script would look like:

dmd -srcar=path/phobos/std
dmd hello.d
./hello

where path is wherever Phobos is on the test machine.

P.S. This is not a D language feature.

@dlang-bot
Contributor

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#16348"

Contributor

@thewilsonator thewilsonator left a comment

I don't want to bog this down with bikeshedding but I would suggest -sar= (matches the file extension), -src-ar= (separates out two reasonably well used abbreviations) or even -source-archive= (LDC will probably name this --output-sar)

This looks similar to .tar, why is that format not sufficient and requires wheel reinvention? (what about zipping?)

Comment on lines 126 to 128
{
if (fileManager.useSourceArchive)
Contributor

this should be just else if

Member Author

See line 144. else if doesn't quite work here.


To create a .sar file, such as one for Phobos' std:

dmd -srcar=/home/duser/dmd/src/phobos/std
Contributor

why is it not needed to specify all the source files that will end up in the archive?

Member Author

It recursively searches the directory for .d, .di, .c and .i files only. Manually enumerating the list would be a maintenance chore.
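
For illustration, a minimal sketch of that kind of recursive scan, written against Phobos rather than the compiler's own root library; the helper name collectSourceFiles is hypothetical and this is not the actual file_manager.d code.

import std.algorithm.comparison : among;
import std.file : dirEntries, SpanMode;
import std.path : extension;

// Hypothetical helper: recursively collect .d, .di, .c and .i files under `root`.
string[] collectSourceFiles(string root)
{
    string[] result;
    foreach (entry; dirEntries(root, SpanMode.depth))
    {
        if (entry.isFile && entry.name.extension.among(".d", ".di", ".c", ".i"))
            result ~= entry.name;
    }
    return result;
}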

Contributor

those should be specified as part of a release build anyway; also, dmd -sar **/*.d is a thing

Comment on lines +51 to +55
To use the std.sar file, nothing needs to be changed in the user's build system.
dmd will automatically prefer using any .sar file it finds. To disable using
Contributor

why not simply pass the .sar on the command line?

Member Author

That's certainly possible as an additional behavior. One of the reasons for the existing behavior is so that one would not have to change one's makefile to make use of the .sar files.

I tried to make it as easy to use as possible. It's likely that after some experience with it, we will want to change the user interface to it.

Member

Makefiles are supposed to take this kind of strain, though. Especially given that I want to know when to rebuild it (so it either needs to be explicit or mentioned in a depfile)

@WalterBright
Member Author

I would suggest -sar= (matches the file extension), -src-ar= (separates out two reasonably well used abbreviations) or even -source-archive= (LDC will probably name this --output-sar)

I'm fine with any of those.

This looks similar to .tar, why is that format not sufficient and requires wheel reinvention?

The same reason .tar has been reinvented many times. It started out as a Tape ARchival format. That means data can only be appended to the tar file, i.e. no table of contents at the beginning. Maybe tar has been enhanced with a ToC, but I don't care enough about it.

(what about zipping?)

Zip is much more complex, with lots of unnecessary features I don't want to bother with.

There's also the issue that the source files have to end with four 0 bytes. If an existing format doesn't allow for that, that means dmd would have to copy the data into another buffer then append the 0 bytes, which would be an unacceptable slowdown.

The .sar format is stupid simple, and only requires a trivial amount of code to implement. Compare it to the compiler's libmach.d, libomf.d, etc., for overly complex formats (and unportable ones, at that).

With our own format, we're not going to be bothered by changing versions, zip bombs, OPB (Other Peoples' Bugs), etc.
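
For readers following along, a rough sketch of the kind of layout being described: a header with a magic number, a table of contents up front, then the raw file contents, each followed by four 0 bytes. The field names below are illustrative assumptions, not the actual format in file_manager.d; only importOffset and importLength appear in the PR's code, and multi-byte integers would be little-endian per the later discussion.

// Illustrative only; names and exact field sizes are assumptions, not the real spec.
struct SarHeader
{
    char[4] magic;      // magic number identifying a .sar file
    uint fileCount;     // number of source files in the archive
    // ...followed by `fileCount` table-of-contents entries
}

struct SarTocEntry
{
    uint nameOffset;    // offset of the module/file name within the archive
    uint importOffset;  // offset of the file contents within the archive
    uint importLength;  // length of the contents, excluding the trailing zeros
    // each file's contents are followed by four 0 bytes so the lexer's
    // sentinel is already in place, even for UTF-32 sources
}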

@dkorpel
Contributor

dkorpel commented Mar 31, 2024

What problem does this solve? And can it really not be solved without inventing yet another file format?

@Herringway
Contributor

What problem does this solve? And can it really not be solved without inventing yet another file format?

I was wondering the same thing. I see documentation (though it is a little vague...) on the format and on how to use it, but the closest thing I see to a motivation is a statement that it doesn't provide any performance improvements.

I also notice that there is no endianness defined, nor any conversion to/from native endianness being performed. That is important if we should expect these archives to be portable between different architectures.

@WalterBright
Member Author

I also notice that there is no endianness defined, nor any conversion to/from native endianness being performed. That is important if we should expect these archives to be portable between different architectures.

Big endian machines are obsolete. But, yeah, the spec should say the format is little endian.
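
If the spec pins the format to little-endian, the writer and reader can stay portable by serializing integers byte by byte instead of copying native words; a minimal sketch, not the PR's actual write32/read code:

// Append a 32-bit value in little-endian byte order, regardless of host endianness.
void write32LE(ref ubyte[] buf, uint v)
{
    buf ~= cast(ubyte)v;
    buf ~= cast(ubyte)(v >> 8);
    buf ~= cast(ubyte)(v >> 16);
    buf ~= cast(ubyte)(v >> 24);
}

// Read it back the same way.
uint read32LE(const(ubyte)[] buf, size_t off)
{
    return buf[off]
        | (cast(uint)buf[off + 1] << 8)
        | (cast(uint)buf[off + 2] << 16)
        | (cast(uint)buf[off + 3] << 24);
}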

@WalterBright
Member Author

What problem does this solve?

I added a "Rationale" section to the changelog.

And can it really not be solved without inventing yet another file format?

Yes, it can be done using an existing file format. Implementing a clone of the zippers will take a lot more time and code, though, because zip files have a lot of other junk in them we don't need. Zip archivers are also vulnerable to "zip bombs". Using open source archivers can also have problems with their licenses. tar files don't have a table of contents at the beginning. The object code library formats are not portable between machines. And, a unique extension means it won't be confused with other archivers.

P.S. I wanted for decades to implement this, but never got around to it.


A standalone archiver program can be easily created from the implementation in DMD.

## Rationale
Contributor

This rationale screams of 'a solution looking for problems', instead of actual problems looking for a solution.

Member Author

Why not try it out first?

Contributor

I'm in favor of trying it out before pushing it into a release.

## Rationale

1. All the source files in a project or library can be represented as a single file,
making it easy to deal with.
Contributor

This could simply be done by allowing multiple module x; declarations in a single source file, so you can simply find -name '*.d' | xargs cat to create a single file. A binary file is more annoying to edit or use version control on.

Member Author

This could simply be done by allowing multiple module x; declarations in a single source file

It's not simple at all. The entire setup is designed with a 1:1 correspondence between module name and file name.

you can simply find -name '*.d' | xargs cat to create a single file.

Then the whole file has to be read to find the modules in it.

A binary file is more annoying to edit or use version control on.

True. I'm not suggesting doing that. If you're actively developing code, you'd be editing the individual files.

making it easy to deal with.

2. To compile all the source files at once with DMD, the command line can get
extremely long, and certainly unwieldy. With .sar files, you may not even need
Contributor

Solved by dmd -i or dub

Member Author

The -i system is a bit wonky, with its exclusive and inclusive patterns, and various rules. The .sar is much simpler. If it's on the command line, it's to be compiled in. If it's on the -I command, it's not compiled in.

Contributor

How dare you slander the holy -i option! dmd -i works like a charm, exclusion and inclusion patterns are optional. Please file a bugzilla issue if you find a problem with it. We can't fix every 'wonky' feature by adding a new feature. The new feature will inevitably accumulate its own 'wonkiness'.

In std.algorithm, for example, the individual algorithms could be placed into multiple
files, since they don't refer to each other. This could also help the people who
don't want the automatic "friend" status of declarations within a single module.
.sar files can make much more granular modules more attractive.
Contributor

.sar is only a distribution file. When developing code, you're not hex editing the .sar file, you're dealing with the separate small files all the same, so it's not more convenient or attractive.

Member Author

I never edit the files in /usr/include or any utilities I download and compile from source. I only edit the project I am actively working on. I doubt many users edit dub projects other than the maintainers.


I constantly jump in the editor to Phobos functions, to check unittests or the implementation. The same for source code of dependent libraries. I don't see how having plain source code hidden in a custom archive is more convenient while developing.

Contributor

I constantly jump in the editor to Phobos functions, to check unittests or the implementation. The same for source code of dependent libraries. I don't see how having plain source code hidden in a custom archive is more convenient while developing.

IDEs are quite capable of allowing you to view source files in an archive. They do this for zip all the time. Especially in Java land.

If yours doesn't, then yes you will need to extract it out.


This is not a zip, it's a custom format, not supported by any editor out there. So of course we need to extract it every time.


Of course this needs to be implemented at the plugin level. And this is a very simple format; even my own engine has it implemented.

Contributor

Yeah, I wouldn't overthink it. IDEs have the facilities to support it.

We would need to make sure the big D plugins are updated to use it before we ship anything with it.

Member

Dustmite is another one that comes to mind. @CyberShadow


4. A directory in a project can be anything. But multiple .sar files in a project means
that's where the code is. Multiple versions of the project can exist in the same directory
by using .sar files with different names.
Contributor

@dkorpel dkorpel Mar 31, 2024

There's no problem being described here. Multiple versions can also exist in the same directory with import paths.

Member Author

Just think how much handier libphobos.a is than a directory full of .o files.

Member

The latter is kinder to incremental compilation

Member

There's no problem being described here. Multiple versions can also exist in the same directory with import paths.

Yes. This is inventing a solution to a problem that was solved 40 years ago by make, or any other build system.

Also, it was solved by dmd -i.

Member Author

The latter is kinder to incremental compilation

I don't see how.

5. Experiments with small programs (hello world) show a negligible change in compile
speed. Much larger programs may show significant compile speed increases with .sar
files, simply because there are fewer file operations. For slow file systems, such as
SD cards, or network file systems, the speedup could be substantial.
Contributor

@dkorpel dkorpel Mar 31, 2024

So this seems to be what it all actually comes down to: performance. But can we please let profiling guide us on what to optimize, instead of making speculative optimizations? This really begs for a test of what time this actually saves in real-world scenarios.

Member Author

Back when I developed Warp, I did a great deal of profiling. Caching file lookups (doing stat on files) was a huge win. Unfortunately, I don't have a large D project to try this on. But one reason the:

srcar=on|off

is there is to make it easy to discern the difference. It's not possible to test it without building it first.

Contributor

Warp was a different tool for a different language and from a different decade. This PR looks like it's in a state that it can be tested now.

Member Author

The file lookup for source files is the same in D as it is for C .h files. But yeah, a different decade.

@WalterBright
Member Author

@thewilsonator I took your advice and changed the switch to -sar. Thanks!

@dkorpel
Contributor

dkorpel commented Apr 1, 2024

So I tried this out by copying Phobos' source tree, renaming std to mystd so it wouldn't accidentally import pre-compiled std, and compiling the "Print hex dump" example from the homepage with import std replaced by import mystd. I created an 11.3 MB mystd.sar with dmd -sar=mystd.

The first thing I noticed was that a simple dmd -sar=on app.d resulted in linker errors, because mystd wasn't being compiled in. I tried listing mystd.sar on the command line, but:

Error: unrecognized file extension sar

So the problem of long command lines listing many source files isn't solved by sar files. dmd -i to the rescue 🙂
I compared the performance between -sar=on and -sar=off with hyperfine:

Benchmark 1: dmd -i -sar=off app.d
  Time (mean ± σ):      1.267 s ±  0.015 s    [User: 0.971 s, System: 0.290 s]
  Range (min … max):    1.249 s …  1.301 s    10 runs
 
Benchmark 2: dmd -i -sar=on app.d
  Time (mean ± σ):      1.269 s ±  0.025 s    [User: 0.970 s, System: 0.295 s]
  Range (min … max):    1.241 s …  1.319 s    10 runs
 
Summary
  dmd -i -sar=off app.d ran
    1.00 ± 0.02 times faster than dmd -i -sar=on app.d

I copied the files over to a MicroSD card and repeated the test:

Benchmark 1: dmd -i -sar=off app.d
  Time (mean ± σ):      1.254 s ±  0.006 s    [User: 0.967 s, System: 0.284 s]
  Range (min … max):    1.243 s …  1.263 s    10 runs
 
Benchmark 2: dmd -i -sar=on app.d
  Time (mean ± σ):      1.270 s ±  0.027 s    [User: 0.958 s, System: 0.301 s]
  Range (min … max):    1.250 s …  1.321 s    10 runs
 
Summary
  dmd -i -sar=off app.d ran
    1.01 ± 0.02 times faster than dmd -i -sar=on app.d

So I can't find performance benefits.

Comment on lines 253 to 337
if (FileName.exists(si) == 1)
if (isFileExisting(si))
Member

Could changes like this be introduced in a separate refactoring PR?

Member Author

It's a local nested function that encapsulates some repeated code.
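
A rough illustration of the kind of local nested helper meant here; this is an assumption, not the exact dmd code, and it simply folds the repeated comparison from the diff above into one place:

// Nested inside the enclosing function; wraps the repeated existence check
// so call sites read as a simple boolean test.
bool isFileExisting(const(char)[] si)
{
    return FileName.exists(si) == 1; // the same check shown in the diff above
}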

Comment on lines +538 to +608
File.read(fileNames[i][0 .. strlen(fileNames[i])], fb); // read file into fb
content.importOffset = cast(uint)ar.length;
content.importLength = cast(uint)fb.length; // don't include terminating 0
ar.write(fb.peekSlice()); // append file contents to ar
Member

Writing only the file contents verbatim seems to be missing a trick. Why not serialize the Lexer pass? Or the generated .di of the module?

Member Author

The error message handling that prints out source lines would need the source code. That need is the original reason why the file_manager caching system was created.

content.importOffset = cast(uint)ar.length;
content.importLength = cast(uint)fb.length; // don't include terminating 0
ar.write(fb.peekSlice()); // append file contents to ar
ar.write32(0); // append 4 bytes of 0
Member

This is accounting for 2 trailing nulls for sentinel and 2 for lexer?

Member Author

It's accounting for a null termination if it is a UTF-32 source file (a UTF-32 code unit is 4 bytes, so a single terminating null takes all four 0 bytes).

@@ -23,6 +28,7 @@ import dmd.location;

enum package_d = "package." ~ mars_ext;
enum package_di = "package." ~ hdr_ext;
enum ext_sar = "sar"; // file extension for source archive
Member

Why doesn't this begin with .d?

Member Author

??

Member

It's being proposed as a bespoke D package format.

Member Author

.dar files are already Disk ARchive files, and .sar is already used by a variety of other formats.

Possible conflation with other formats is why the header has a magic number and numerous checks for goodness.

@WalterBright
Member Author

So the problem of long command lines listing many source files isn't solved by sar files.

I know. I had that idea after I submitted this PR. That part isn't implemented yet. This PR only covers imports covered by -I.

@WalterBright
Member Author

So I can't find performance benefits.

I can't either with a small application. On a microSD card, I expect the operating system caches files, so repeated runs will be reading the files out of memory, not the card. If there was a difference, I'd expect it only on a cold system.

@ibuclaw
Member

ibuclaw commented Apr 1, 2024

I also notice that there is no endianness defined, nor any conversion to/from native endianness being performed. That is important if we should expect these archives to be portable between different architectures.

Big endian machines are obsolete. But, yeah, the spec should say the format is little endian.

Big Endian is alive and well.

@Herringway
Contributor

I also notice that there is no endianness defined, nor any conversion to/from native endianness being performed. That is important if we should expect these archives to be portable between different architectures.

Big endian machines are obsolete. But, yeah, the spec should say the format is little endian.

Big Endian is alive and well.

Indeed. It's been pronounced dead quite a few times, but it always manages to rise from its grave...

Comment on lines 96 to 107
None of these make it a slam-dunk, after all, no other compiler does this that I'm
aware of. Even so, some surprising uses can be expected of .sar files.
Member

I think Go, Rust, and C++ (cxx98 templates at a stretch, but also cxx20 modules) could be named as compilers with some sort of packaging format for speed/caching.

Member

@ibuclaw ibuclaw Apr 1, 2024

  • C++98's -frepo produced an .rpo for all exported templates in a given build, imported for linking.
  • C++20's modules generate .gcm files for compiled module interfaces (which can contain multiple modules, IIUC), imported into the compilation on import.
  • Go exports types and definitions to a .gox file, imported when the package is imported.
  • Rust crates export metadata and optionally encoded MIR to a .rmeta (or .rox for gccrs), imported on use.
  • I've just remembered C has precompiled headers.

Contributor

To the point, none of those are full source archives.

Member

And yet, all of them (except frepo) can/are used to replace source files for speeding up compilation.

Member Author

In that vein, you could include the precompiled headers generated and used by my Symantec C++ compiler. They did substantially improve compile times. But it's not a system I wished to propagate to D. C++ made a mistake going with their binary import system.

Member

all of them (except frepo) can/are used to replace source files for speeding up compilation.

This doesn't replace source files, which is quite an important difference. E.g. let's say file system accesses are just so catastrophically expensive that this is worthwhile; it'd still be parsing them and running semantic analysis every single time (long story short, this is where the actual pain is, and more importantly the memory consumption. I'd take a 10% longer build if it didn't use 70GB of RAM to do a naive build of our code).

Contributor

Isn't a JAR file exactly this (and more)?

@LightBender
Contributor

LightBender commented Apr 1, 2024

Personally, I see any performance increases as a far-fringe benefit. Modern NVMe drives have largely solved the seek-time problem, so the biggest improvements are decreased sector wastage and improved sequential reads (which NVMe still benefits from); however, sequential reads only help if this is not implemented as a memory-mapped file and instead loads everything into RAM once.

What everybody is missing here is how this solves a major problem in D library distributions. Right now DUB "packages" are git repos. While it is a simple solution, it comes with some severe penalties and limitations. Consider the following examples:

  1. Download and storage of unnecessary files in each repo. We don't need packages to include test code or the other git-flotsam that appears in repos.
  2. Multiple versions of the same repo compound the storage problem. They also compound the sector bloat mentioned above.
  3. Difficulty with BLOBs (such as library files), to the point where it's better to simply not bother with BLOBs at all.
  4. Git repo libraries cannot be broken up into smaller packages. Believe it or not, this is what ends up killing the whole concept of "Phobos on DUB". Distributing Phobos via DUB right now would require breaking Phobos into something like 50 different repos, with the maintenance nightmare that incurs. Imagine having to stage 10 different PRs in perfect order across the different packages that a change hits. It's madness.

With this change we can finally make all of that go away.

To my mind this change is aimed entirely at improving our tooling story. Code-D/Serve-D is obviously the most immediate beneficiary of these archives. But the tooling possibilities extend far beyond that and are worth serious discussions that far exceed the scope of a PR comment.

This PR has my unequivocal endorsement, and I'd be happy to push the merge button on it.

@LightBender
Contributor

I can't either with a small application. On a microSD card, I expect the operating system caches files, so repeated runs will be reading the files out of memory, not the card. If there was a difference, I'd expect it only on a cold system.

Can confirm that Windows caches file data. Currently about 1/4th of my total RAM is being used by Windows as a Data+Code cache. You can see it in the Task Manager starting with Windows 10.

@Herringway
Contributor

Personally, I see any performance increases as a far-fringe benefit. Modern NVMe drives have largely solved the seek-time problem, so the biggest improvements are decreased sector wastage and improved sequential reads (which NVMe still benefits from); however, sequential reads only help if this is not implemented as a memory-mapped file and instead loads everything into RAM once.

What everybody is missing here is how this solves a major problem in D library distributions. Right now DUB "packages" are git repos. While it is a simple solution, it comes with some severe penalties and limitations. Consider the following examples:

  1. Download and storage of unnecessary files in each repo. We don't need packages to include test code or the other git-flotsam that appears in repos.
  2. Multiple versions of the same repo compound the storage problem. They also compound the sector bloat mentioned above.
  3. Difficulty with BLOBs (such as library files), to the point where it's better to simply not bother with BLOBs at all.
  4. Git repo libraries cannot be broken up into smaller packages. Believe it or not, this is what ends up killing the whole concept of "Phobos on DUB". Distributing Phobos via DUB right now would require breaking Phobos into something like 50 different repos, with the maintenance nightmare that incurs. Imagine having to stage 10 different PRs in perfect order across the different packages that a change hits. It's madness.

With this change we can finally make all of that go away.

To my mind this change is aimed entirely at improving our tooling story. Code-D/Serve-D is obviously the most immediate beneficiary of these archives. But the tooling possibilities extend far beyond that and are worth serious discussions that far exceed the scope of a PR comment.

This PR has my unequivocal endorsement, and I'd be happy to push the merge button on it.

Last I checked, dub used .zip files to distribute packages. But even if it didn't, what's stopping dub from solving these problems that inventing a new format is going to eliminate?

Why doesn't dub just eliminate the "git-flotsam" on its own?

@Herringway
Contributor

C:\Tools\D\ldc2-1.34.0-beta1-windows-multilib\bin\..\import.sar\std\conv.d(2292,5): Error: found `static` when expecting `;` following statement `d` on line C:\Tools\D\ldc2-1.34.0-beta1-windows-multilib\bin\..\import\std\conv.d(2291,1)

I assume you meant

C:\Tools\D\ldc2-1.34.0-beta1-windows-multilib\bin\..\import.sar\std\conv.d(2292,5): Error: found `static` when expecting `;` following statement `d` on line C:\Tools\D\ldc2-1.34.0-beta1-windows-multilib\bin\..\import.sar\std\conv.d(2291,1)

? (you didn't adjust the second path)

@rikkimax
Contributor

rikkimax commented Apr 5, 2024

Yes both should be modified.

@CyberShadow
Member

This is not a language change, but I still think it should go through a DIP. It is a big change pushed onto the community by the DFL.

The same should go for any high-impact change in any component owned by the DFL, whether it's in the language or not.

In case it's not clear: the feature, especially if the DFL starts shipping such .sar files, would push work onto the community to add support for them in:

  • text editors and IDEs
  • debuggers
  • other compilers
  • linters
  • build tools
  • code manipulation tools (DustMite) etc.

@rikkimax
Contributor

rikkimax commented Apr 7, 2024

A few more: Windows shell, fuse (posix), 7zip.

Using something more standard like zip doesn't seem like such a bad idea against such a list ;)

Note: I already sent Adam Wilson a tutorial on this prior to CyberShadow's comment for Windows shell.

Oh, and it gets worse: using the original spec, it will need to be entirely reengineered before anything is released using it, once it becomes obvious that moving to a block-based design with CLI argument support would be the bare minimum for it to scale long-term.

I am not arguing for or against this, but I am arguing against not doing it right and causing massive issues down the line (like needing a reengineering in multiple different codebases that we may not control).

@WalterBright
Member Author

I went through it again and fixed it so it doesn't do unnecessary file stats. It gives a small but consistent edge in compile performance on helloworld. But not enough to be a selling point in and of itself.

I've talked with @LightBender about changing Phobos from a small number of files with collected routines in them, to a much more granular approach, which means several times as many files. It may show a more significant compile speed advantage with large numbers of files. (Phobos doesn't actually have many files in it.)

@WalterBright
Member Author

@CyberShadow text editors and IDEs, debuggers, other compilers, linters, build tools, code manipulation tools (DustMite) etc.

All that is actually needed for those is to be able to extract a .sar file into its constituents. I haven't bothered yet to write a utility to do that, but the code to implement such a utility can just be extracted from the compiler source. It would be convenient for those tools to build it in, but it isn't necessary.
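
For tool authors wondering what such an extraction utility would involve, a rough sketch of the extraction loop, assuming a table of contents of (name, offset, length) entries has already been parsed from the archive header; none of these names are the PR's actual API.

import std.file : mkdirRecurse, write;
import std.path : buildPath, dirName;

// Hypothetical ToC entry; mirrors the importOffset/importLength fields quoted above,
// plus a file name.
struct Entry
{
    string name;  // relative path of the source file
    uint offset;  // where its contents start in the archive
    uint length;  // content length, excluding the four trailing 0 bytes
}

// Write each archived file back out as an individual source file.
void extractAll(const(ubyte)[] archive, const Entry[] toc, string destDir)
{
    foreach (e; toc)
    {
        const path = buildPath(destDir, e.name);
        mkdirRecurse(dirName(path)); // ensure parent directories exist
        write(path, archive[e.offset .. e.offset + e.length]); // raw bytes, without the sentinel
    }
}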
