
bad address error for files larger than ~122 MB #5

Open
cre4ture opened this issue May 12, 2024 · 2 comments

Labels
bug Something isn't working

Comments

@cre4ture
Contributor

cre4ture commented May 12, 2024

Hello,

I'm interested in distributed and redundant data storage systems, and I'm currently working on writing Rust code. That's why I stumbled across tifs. For testing purposes, I set up a TiKV storage cluster on my two NAS devices at home and mounted a tifs filesystem on it.

This is how I mounted it after finally getting the cluster running:

uli@hp13-ulix:~/homebuild/tifs/target/release$ sudo ./tifs --foreground tifs:192.168.178.23:9379 ~/nosync-data/mnt_tifs/
[sudo] password for uli: 
May 12 22:06:50.274 INFO connect to tikv endpoint: "192.168.178.23:20160"

After mounting it, I tried to copy one large (13 GB) file into it.
Sadly, this did not work out as expected.
I tried this multiple times, and I got an error like this each time:

uli@hp13-ulix:~/nosync-data$ cp Avatar\ -\ Aufbruch\ nach\ Pandora.mkv mnt_tifs/vid3.mkv
cp: error writing 'mnt_tifs/vid3.mkv': Bad address

The exact size of the partially copied file left at the destination varies between ~122 MB and ~125 MB. This is why I think it is definitely a bug rather than an expected filesystem limitation.

Can someone reproduce this?

UPDATE: Further testing showed that the file size "limit" actually changes: once I could copy ~250 MB, and directly afterwards only ~12 MB.
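
For reference, here is a minimal standalone reproduction sketch (plain std Rust, nothing tifs-specific; the target path is a placeholder for my tifs mount) that writes a large file in 1 MiB chunks until a write fails:

use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Placeholder path on the tifs mount; adjust to your setup.
    let mut out = File::create("mnt_tifs/large_test.bin")?;
    let chunk = vec![0u8; 1 << 20]; // write in 1 MiB chunks
    let total: u64 = 13 * 1024 * 1024 * 1024; // roughly the size of the 13 GB file
    let mut written: u64 = 0;
    while written < total {
        if let Err(e) = out.write_all(&chunk) {
            // On my setup this fails with "Bad address" (EFAULT) after ~122 MB.
            eprintln!("write failed after {} bytes: {}", written, e);
            return Err(e);
        }
        written += chunk.len() as u64;
    }
    out.sync_all()
}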

@Hexilee added the bug (Something isn't working) label on May 13, 2024
@cre4ture
Contributor Author

I started to debug the issue myself. As I said, I'm keen on doing some Rust programming.

...
...
...
[src/fs/tikv_fs.rs:415] ino = 18
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 123731968
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/tikv_fs.rs:419] &len = Ok(
    1048576,
)
[src/fs/tikv_fs.rs:415] ino = 18
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 124780544
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/tikv_fs.rs:419] &len = Err(
    UnknownError(
        "gRPC error: RpcFailure: 4-DEADLINE_EXCEEDED Deadline Exceeded",
    ),
)

It seems to be an issue in the interaction with the server. I will check whether it is possible to extend the timeout/deadline, or whether a retry might help.
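
One option would be a small retry wrapper around the failing operation. This is only a sketch under the assumption that DEADLINE_EXCEEDED is transient; the attempt count, backoff, and the is_retryable predicate are my own placeholders (using tokio, which tifs runs on), not existing tifs or tikv-client API:

use std::future::Future;
use std::time::Duration;

// Retry an async operation a few times with a simple linear backoff.
// `op` would be a closure driving the failing TiKV request; `is_retryable`
// decides which errors are worth another attempt.
async fn with_retry<T, E, F, Fut>(
    mut op: F,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    const MAX_ATTEMPTS: u32 = 3;
    let mut attempt = 1;
    loop {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) if attempt < MAX_ATTEMPTS && is_retryable(&e) => {
                tokio::time::sleep(Duration::from_millis(200 * attempt as u64)).await;
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

Whether the tikv-client itself allows raising the gRPC deadline is something I still need to check in its configuration.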

@cre4ture
Contributor Author

"Failed to resolve lock" is apparently also an issue

[src/fs/tikv_fs.rs:415] ino = 26
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 95420416
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/transaction.rs:109] "read_fh:" = "read_fh:"
[src/fs/transaction.rs:109] &handler = Ok(
    FileHandler {
        cursor: 0,
    },
)
Unknown Error 2: ResolveLockError, backtrace:
   0: <tifs::fs::error::FsError as core::convert::From<tikv_client_common::errors::Error>>::from
   1: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   8: tokio::runtime::task::core::CoreStage<T>::poll
   9: tokio::runtime::task::harness::Harness<T,S>::poll
  10: tokio::runtime::thread_pool::worker::Context::run_task
  11: tokio::runtime::thread_pool::worker::Context::run
  12: tokio::macros::scoped_tls::ScopedKey<T>::set
  13: tokio::runtime::thread_pool::worker::run
  14: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  15: tokio::runtime::task::harness::Harness<T,S>::poll
  16: tokio::runtime::blocking::pool::Inner::run
  17: std::sys_common::backtrace::__rust_begin_short_backtrace
  18: core::ops::function::FnOnce::call_once{{vtable.shim}}
  19: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/alloc/src/boxed.rs:1575:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/alloc/src/boxed.rs:1575:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/std/src/sys/unix/thread.rs:71:17
  20: <unknown>
  21: <unknown>

[src/fs/transaction.rs:116] "write_data result:" = "write_data result:"
[src/fs/transaction.rs:116] &result = Err(
    UnknownError(
        "Failed to resolve lock",
    ),
)
[src/fs/tikv_fs.rs:419] &len = Err(
    UnknownError(
        "Failed to resolve lock",
    ),
)
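
For what it's worth, the same retry sketch from my previous comment could treat both failures as transient. Matching on the error's display text is just a heuristic tied to the exact messages above, not a stable tifs or tikv-client API:

use std::fmt::Display;

// Classify the errors seen in the logs above as transient and worth retrying.
// A more robust version would match on tikv-client error variants instead of strings.
fn is_transient<E: Display>(err: &E) -> bool {
    let msg = err.to_string();
    msg.contains("DEADLINE_EXCEEDED") || msg.contains("Failed to resolve lock")
}

This would plug into the with_retry sketch above as the is_retryable predicate.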
