Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up ripgrep for compilation on non-unix, non-windows platforms #2787

Merged
merged 7 commits into from Apr 23, 2024

Conversation

holzschu
Copy link
Contributor

The code of ripgrep compiles on almost any kind of architecture, including WebAssembly, except in two tiny places related to hyperlinks:

  • when it searches for the local hostname in hostname.rs,
  • when it does path canonalization in hyperlink.rs.

In both cases, there is a path with #[cfg(unix)] and a path with #[cfg(windows)]. WebAssembly being neither unix nor windows, compilation fails. This PR adds a third path:

  • for localhost, it returns "localhost" instead of calling gethostname(), for all environments that are neither unix nor windows.
  • for path canonalization, it uses std::os::wasi::ffi::OsStrExt instead of std::os::unix::ffi::OsStrExt, which is not available; this part is specific for wasm32-wasi. It could be possible to deactivate path canonalization entirely for architectures that are neither unix nor windows, but I aimed for the smallest working change.

With these two changes, it's possible to compile ripgrep with cargo build --target wasm32-wasi (and, more importantly, it works in wasm32-wasi environments).

@ltrzesniewski
Copy link
Contributor

It could be possible to deactivate path canonalization entirely for architectures that are neither unix nor windows

std::path::absolute should be stabilized soon, and I suppose it will replace canonicalization in ripgrep, which will both simplify and optimize this part of the code.

@@ -853,6 +853,57 @@ impl HyperlinkPath {
}
HyperlinkPath(result)
}
#[cfg(target_os = "wasi")]
pub(crate) fn from_path(original_path: &Path) -> Option<HyperlinkPath> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WASI spec seems to ban absolute paths and the stdlib doesn't implement canonicalize. I don't know how black-and-white this is (especially in the long term) but right now this might as well be stubbed out with None.

When running on top of Windows I imagine you'd want to return a Windows path here, even though WASI is more Unix-like internally.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how std::path::absolute will work here. It seems to depend on what std::env::current_dir will return, and that does seem to have an implementation on WASI: https://github.com/rust-lang/rust/blob/eff958c59e8c07ba0515e164b825c9001b242294/library/std/src/sys/pal/wasi/os.rs#L71-L95

But I agree, for now, this should just always return None and emit a DEBUG log that hyperlinks aren't supported on WASI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT it works like this:

  • WASI syscalls take a directory descriptor and a path relative to that directory descriptor.
  • The WASI runtime gives the WASM binary a list of directory descriptors plus their actual original paths (preopens). So the runtime grants selective access. It could easily pretend the filesystem is smaller than it really is, chroot-style, but it doesn't have to: it can grant a /home/user descriptor with the /home/user path.
  • wasi-libc takes absolute paths and resolves them to a directory descriptor + relative path based on the longest matching prefix from the preopens, if any. (This is inspired by libpreopen.)
  • wasi-libc also emulates a working directory in WASM-space.

So it's not as bad as I thought. Thanks to the emulation absolute paths should for the most part just work as long as access to that part of the filesystem is granted, particularly on Unix. (Except for missing syscalls I guess.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. So std::env::current_dir() works, and thus std::path::absolute might in turn work here as well.

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here seems good, but as I mentioned in the comments, it seems like hyperlinks and WASI are fundamentally incompatible.

Also, can we add CI support WASI so that we don't regress here? You should be able to lift this nearly directly: https://github.com/BurntSushi/memchr/blob/20ef11fa92d8b393735f10906c436a8ce6e792a4/.github/workflows/ci.yml#L133-L163

If tests don't work, that's okay. But at least having a cargo build --verbose in CI would be nice.

@@ -853,6 +853,57 @@ impl HyperlinkPath {
}
HyperlinkPath(result)
}
#[cfg(target_os = "wasi")]
pub(crate) fn from_path(original_path: &Path) -> Option<HyperlinkPath> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how std::path::absolute will work here. It seems to depend on what std::env::current_dir will return, and that does seem to have an implementation on WASI: https://github.com/rust-lang/rust/blob/eff958c59e8c07ba0515e164b825c9001b242294/library/std/src/sys/pal/wasi/os.rs#L71-L95

But I agree, for now, this should just always return None and emit a DEBUG log that hyperlinks aren't supported on WASI.

"hostname could not be found on unsupported platform",
)
// backup solution for systems that do not have gethostname():
Ok(OsString::from("localhost"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of giving wrong answers like this. And it seems like WASI can't really support hyperlinks anyway, due to the ban on absolute paths. So I think this change should probably be reverted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I was trying to fix here, and that's also a question I have about setting up CI, is that the current version of the code causes a compilation error:

Compiling grep-cli v0.1.10 (/Users/holzschu/src/Xcode_iPad/ripgrep_clone/crates/cli)
error[E0308]: mismatched types
  --> crates/cli/src/hostname.rs:28:9
   |
16 |   pub fn hostname() -> io::Result<OsString> {
   |                        -------------------- expected `Result<OsString, std::io::Error>` because of return type
...
28 | /         io::Error::new(
29 | |             io::ErrorKind::Other,
30 | |             "hostname could not be found on unsupported platform",
31 | |         )
   | |_________^ expected `Result<OsString, Error>`, found `Error`
   |
   = note: expected enum `Result<OsString, std::io::Error>`
            found struct `std::io::Error`
help: try wrapping the expression in `Err`
   |
28 ~         Err(io::Error::new(
29 |             io::ErrorKind::Other,
30 |             "hostname could not be found on unsupported platform",
31 ~         ))
   |

For more information about this error, try `rustc --explain E0308`.

So if the goal of setting up CI is to get the same result after the PR as before, that's not going to happen.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I think you just want Err(io::Error::new(...)), as suggested in the error message you're showing. This particular code path is probably entirely untested in CI at present, and thus the fact that it is wrong now and has always been wrong was never noticed.

My main objection was to silently using a potentially incorrect value. I'm open to doing that in the future to get hyperlinks working in WASI after we've considered the possible alternatives. But since std::fs::canonicalize isn't going to work on WASI right now anyway, we should just avoid changing the semantic intent of this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the changes for hostname.rs. But then, for symmetry, I think I should have a branch #[cfg(not(any(windows, unix)))] instead of #[cfg(target_os = "wasi")] in hyperlink.rs. But what should from_path() return in that case?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a DEBUG log message. Mentioned here: #2787 (comment)

@holzschu holzschu changed the title Set up ripgrep for compilation on WebAssembly (wasm32-wasi) Set up ripgrep for compilation on non-unix, non-windows platforms Apr 23, 2024
@holzschu
Copy link
Contributor Author

I've changed the files and the title of the PR.
Now, the goal is for ripgrep to compile without error on platforms that are neither unix nor windows (that includes WebAssembly, but there might be other platforms as well.

It will emit an error in hostname.rs and a debug message in hyperlinks.rs.

- name: Add wasm32-wasi target
run: rustup target add wasm32-wasi
- name: Basic build
run: cargo build --verbose
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did running tests via wasmtime fail? If so this is fine, but if they worked, then we should just include them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, running tests via wasmtime does fail (but compilation works):

error[E0433]: failed to resolve: could not find `unix` in `os`
   --> tests/util.rs:197:22
    |
197 |         use std::os::unix::fs::symlink;
    |                      ^^^^ could not find `unix` in `os`
    |

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah fair enough. We can save that for another day.

// For other platforms (not windows, not unix), return None and log a debug message.
#[cfg(not(any(windows, unix)))]
pub(crate) fn from_path(original_path: &Path) -> Option<HyperlinkPath> {
log::debug!("Hyperlinks are not supported on this platform");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with other log messages, could you please use "hyperlinks" instead of "Hyperlinks"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done.

@@ -853,6 +853,13 @@ impl HyperlinkPath {
}
HyperlinkPath(result)
}

// For other platforms (not windows, not unix), return None and log a debug message.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be using ///.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please move this to be directly below the other from_path constructors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes have been incorporated.

Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@holzschu
Copy link
Contributor Author

Thank you for your patience and pedagogy!

@BurntSushi BurntSushi merged commit bb8601b into BurntSushi:master Apr 23, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants