Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apache Arrow 15 support #345

Open
rajsite opened this issue Feb 21, 2024 · 9 comments · May be fixed by #347
Open

Apache Arrow 15 support #345

rajsite opened this issue Feb 21, 2024 · 9 comments · May be fixed by #347

Comments

@rajsite
Copy link

rajsite commented Feb 21, 2024

Could dependencies be updated to support apache-arrow 15?

@dioptre
Copy link

dioptre commented Mar 26, 2024

This breaks compatibilty with arrow 15 (using table / toIPC breaks)

@domoritz
Copy link
Member

Which methods specifically? Arrow 14 and 15 use the same IPC format and the APIs are very similar.

domoritz added a commit to domoritz/arquero that referenced this issue Mar 27, 2024
@domoritz domoritz linked a pull request Mar 27, 2024 that will close this issue
@domoritz
Copy link
Member

I see what you mean. The binary format somehow changes: #347.

@dioptre
Copy link

dioptre commented Mar 27, 2024

Haha nice to see you over here. Small world.

@domoritz
Copy link
Member

I sent a pull request to change the default nullable for typed arrays back to not nullable (and also fixed a bug in null count): apache/arrow#40852. I think we could either update to arrow 15 now with my fix to the tests and then later revert, or wait for arrow 16. Either way, arquero should work either version.

@rajsite
Copy link
Author

rajsite commented Mar 27, 2024

I could make a separate issue for it, but it would also be helpful if arrow was a peer dependency instead of a direct dependency (helps make sure an app only ends up with a single copy of arrow shared across dependencies).

  "peerDependencies": {
    "apache-arrow": "^15.0.0"
  },

If arrow is good at backwards compatibility (although maybe the issues you ran into are reason enough to not assume that and need manual update across major versions) could consider a broader range specifier.

  "peerDependencies": {
    "apache-arrow": ">=15.0.0"
  },

That would be similar to the pattern of libraries like @geoarrow/geoarrow-js.

@domoritz
Copy link
Member

domoritz commented Mar 27, 2024

The arrow ipc is backwards compatible. The arrow library is best effort backwards compatible. But yes, that's a separate issue for arquero. Why do we need a peer dependency? Couldn't we specify a broader range for the dependency so that npm can dedupe?

@rajsite
Copy link
Author

rajsite commented Mar 27, 2024

It's handy for libraries that share constructed objects between each other. Example:

LibA {"dependencies": "arrow@^14"}
LibB {"dependencies": "arrow@^15"}
App depends on LibA and LibB
npm install: no warnings etc
result in:
node_modules/
   LibA
      node_modules/arrow@14.0
   LibB
      node_modules/arrow@15.0

Becomes an issue when in the app:

import {createArrow} from 'LibA';
import {useArrow} from 'LibB';

const myArrow = createArrow(); // constructed with v14 internally
useArrow(myArrow); // the object will be operated on by a library leveraging v15 internally

Versus with peerDependencies:

LibA {"peerDependencies": "arrow@^14"}
LibB {"peerDependencies": "arrow@^15"}
App depends on LibA and LibB
npm install: gives an error saying that peer dependencies cannot be satisfied among my used libraries

The app must either:

  1. Ask nicely that the libs update their support arrow range (which hopefully comes with testing from the libraries that the new major version doesn't break existing behavior)
  2. Manually define overrides in the App package.json to assume the responsibility of forcing the libs to use a shared version of arrow 14 or 15 or manually specify that they each use a separate version.

@rajsite
Copy link
Author

rajsite commented Mar 27, 2024

Couldn't we specify a broader range for the dependency so that npm can dedupe?

Critical difference is, is it okay if npm sometimes doesn't dedupe or is it potentially an issue if multiple copies of the library exist in the app at different versions. For arrow as a dependency I think it's safer for end users to indicate that an app should really only resolve to a single shared copy of arrow and peerDependencies let you specify that.

Particularly because arquero does have APIs for sharing constructed instances of the shared arrow library (if it was completely internal then would make sense as just a dependency).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants