Validate filesystem URLs at planning time #251

Open
mwylde opened this issue Aug 15, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers ux

Comments

mwylde (Member) commented Aug 15, 2023

Filesystem sources created via SQL are not validated during SQL planning. As a result, an invalid URL causes a panic at runtime, when the source is instantiated on the worker:

2023-08-15T03:28:44.132894Z ERROR arroyo_server_common: panicked at 'called `Result::unwrap()` 
on an `Err` value: RelativeUrlWithoutBase', /opt/arroyo/src/arroyo-worker/src/connectors/filesystem/mod.rs:119:65 
panic.file="/opt/arroyo/src/arroyo-worker/src/connectors/filesystem/mod.rs" panic.line=119 panic.column=65
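
One way to surface this during planning is to check the URL before the worker ever sees it and return an error instead of unwrapping. The following is only a minimal sketch using the Rust standard library. `validate_filesystem_url`, `FilesystemUrlError`, and the list of accepted schemes are hypothetical names chosen for illustration, not Arroyo's actual API or supported set; a real fix would presumably reuse whatever URL parser the worker already calls.

```rust
use std::fmt;

/// Hypothetical planning-time error for a filesystem source path.
#[derive(Debug, PartialEq)]
pub enum FilesystemUrlError {
    /// No `scheme://` prefix, e.g. a bare relative path such as `data/input`.
    MissingScheme(String),
    /// A scheme is present but is not in the assumed supported list.
    UnsupportedScheme(String),
}

impl fmt::Display for FilesystemUrlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingScheme(p) => write!(f, "path '{}' has no URL scheme", p),
            Self::UnsupportedScheme(s) => write!(f, "unsupported URL scheme '{}'", s),
        }
    }
}

// Assumed set of schemes, for illustration only.
const SUPPORTED_SCHEMES: &[&str] = &["file", "s3", "https"];

/// Returns an error for the planner to report, rather than panicking later.
pub fn validate_filesystem_url(path: &str) -> Result<(), FilesystemUrlError> {
    let scheme = match path.split_once("://") {
        Some((scheme, _rest)) => scheme,
        None => return Err(FilesystemUrlError::MissingScheme(path.to_string())),
    };
    if SUPPORTED_SCHEMES.iter().any(|s| *s == scheme) {
        Ok(())
    } else {
        Err(FilesystemUrlError::UnsupportedScheme(scheme.to_string()))
    }
}
```

With a check like this, a path such as `data/input` would be rejected with a readable planning error rather than reaching the `unwrap()` on the worker.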

mwylde added the enhancement, good first issue, and ux labels Aug 15, 2023
hilmialf commented Oct 10, 2023

Hi @mwylde,
I would also like to give this one a shot. Could you guide me through the details of how to start?
Could you perhaps tell me where the SQL planning is done?
IIUC, the planning is delegated to DataFusion?

rohitrastogi commented Apr 24, 2024

@mwylde I'm not able to reproduce this specific panic (RelativeUrlWithoutBase) exactly, but I did notice a few related issues when trying to reproduce it using ghcr.io/arroyosystems/arroyo-single:0.10-dev:

  1. Pipelines/previews succeed even if the path for a filesystem source created via SQL does not exist. I'd expect some sort of failure if the path does not exist.
  2. The path "file:///" for a filesystem source created via SQL panics during query execution with: ERROR arroyo_server_common: panicked at crates/arroyo-connectors/src/filesystem/source.rs:69:17: could not get next path: Generic LocalFileSystem error: Unable to walk dir: File system loop found: /sys/class/vtconsole/vtcon0/subsystem points to an ancestor /sys/class/vtconsole panic.file="crates/arroyo-connectors/src/filesystem/source.rs" panic.line=69 panic.column=17
  3. An S3 path without valid S3 creds for a filesystem source created via SQL panics during query execution with: panicked at crates/arroyo-connectors/src/filesystem/source.rs:69:17: could not get next path: Generic s3 error: Couldn't find AWS credentials in environment, credentials file, or IAM role. panic.file="crates/arroyo-connectors/src/filesystem/source.rs" panic.line=69 panic.colum
  4. Creating filesystem sources in the UI always succeeds, even if the entered path is malformed. See:
    tokio::task::spawn(async move {
        let message = TestSourceMessage {
            error: false,
            done: true,
            message: "Successfully validated connection".to_string(),
        };
        tx.send(message).await.unwrap();
    });
  5. Kafka sources created via SQL panic if the topic does not exist: panicked at crates/arroyo-worker/src/lib.rs:622:14: called `Result::unwrap()` on an `Err` value: SendError { .. } panic.file="crates/arroyo-worker/src/lib.rs" panic.line=622 panic.column=14

What do you think about running the same connection test() logic that is used when creating connectors in the UI while planning sources during the scheduling phase? If each connector implements test() properly, that should solve all of the problems above.
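
The idea could be sketched roughly as follows: each connector exposes a fallible test(), and the planner calls it for every source before scheduling, so a bad source becomes a planning error instead of a worker panic. `Connector`, `FileSystemSource`, and `plan_sources` are hypothetical names for illustration and are not Arroyo's real connector interface. The snippet quoted above reports its result over a channel from a spawned task; this sketch is synchronous for brevity, only checks local `file://` paths, and the refusal to scan the filesystem root is an assumed policy, not something decided in this thread.

```rust
use std::path::Path;

/// Hypothetical, simplified connector interface (synchronous for brevity).
pub trait Connector {
    fn name(&self) -> &str;
    /// Returns Err with a human-readable reason if the source is unusable.
    fn test(&self) -> Result<(), String>;
}

/// Hypothetical filesystem source holding the user-supplied path.
pub struct FileSystemSource {
    pub path: String,
}

impl Connector for FileSystemSource {
    fn name(&self) -> &str {
        "filesystem"
    }

    fn test(&self) -> Result<(), String> {
        // Only local paths are handled here; a remote store such as S3
        // would need its own credential and reachability checks.
        let local = self
            .path
            .strip_prefix("file://")
            .ok_or_else(|| format!("missing or unsupported scheme in '{}'", self.path))?;
        // Assumed policy: do not walk the whole filesystem.
        if local.is_empty() || local == "/" {
            return Err("refusing to scan the filesystem root".to_string());
        }
        if !Path::new(local).is_dir() {
            return Err(format!("directory '{}' does not exist", local));
        }
        Ok(())
    }
}

/// Planning step: test every source and collect all failures at once.
pub fn plan_sources(sources: &[Box<dyn Connector>]) -> Result<(), Vec<String>> {
    let errors: Vec<String> = sources
        .iter()
        .filter_map(|s| s.test().err().map(|e| format!("{}: {}", s.name(), e)))
        .collect();
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Collecting every failure, rather than stopping at the first, lets a user fix all misconfigured sources in one pass; whether that is preferable to failing fast is a design choice left open here.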

Projects
None yet
Development

No branches or pull requests

3 participants