
bad address error for files larger than ~122 MB #5

Open
cre4ture opened this issue May 12, 2024 · 2 comments

Labels
bug Something isn't working

Comments

@cre4ture
Contributor

cre4ture commented May 12, 2024

Hello,

I'm interested in distributed and redundant data storage systems, and I'm currently working on writing Rust code. That's why I stumbled across tifs. For testing purposes, I set up a TiKV storage cluster on my two NAS devices at home and mounted a tifs filesystem on it.

This is how I mounted it after finally getting the cluster running:

uli@hp13-ulix:~/homebuild/tifs/target/release$ sudo ./tifs --foreground tifs:192.168.178.23:9379 ~/nosync-data/mnt_tifs/
[sudo] password for uli: 
May 12 22:06:50.274 INFO connect to tikv endpoint: "192.168.178.23:20160"

After mounting it, I tried to copy one large (13 GB) file into it.
Sadly, this did not work out as expected.
I tried this multiple times, and I got an error like this each time:

uli@hp13-ulix:~/nosync-data$ cp Avatar\ -\ Aufbruch\ nach\ Pandora.mkv mnt_tifs/vid3.mkv
cp: error writing 'mnt_tifs/vid3.mkv': Bad address

The exact size of the partially copied file left at the destination varies between ~122 MB and ~125 MB. This is why I think it is definitely a bug rather than an expected filesystem limitation.

Can someone reproduce this?

UPDATE: Further testing showed that the file size "limit" actually changes: once I could copy ~250 MB, and directly afterwards only ~12 MB.
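
For reference, here is a minimal standalone reproduction sketch (plain std Rust, nothing tifs-specific; the target path is a placeholder for my tifs mount) that writes a large file in 1 MiB chunks until a write fails:

use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Placeholder path on the tifs mount; adjust to your setup.
    let mut out = File::create("mnt_tifs/large_test.bin")?;
    let chunk = vec![0u8; 1 << 20]; // write in 1 MiB chunks
    let total: u64 = 13 * 1024 * 1024 * 1024; // roughly the size of the 13 GB file
    let mut written: u64 = 0;
    while written < total {
        if let Err(e) = out.write_all(&chunk) {
            // On my setup this fails with "Bad address" (EFAULT) after ~122 MB.
            eprintln!("write failed after {} bytes: {}", written, e);
            return Err(e);
        }
        written += chunk.len() as u64;
    }
    out.sync_all()
}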

@Hexilee added the bug (Something isn't working) label on May 13, 2024
@cre4ture
Contributor Author

I started to debug the issue myself. As I said, I'm keen on doing some Rust programming.

...
...
...
[src/fs/tikv_fs.rs:415] ino = 18
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 123731968
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/tikv_fs.rs:419] &len = Ok(
    1048576,
)
[src/fs/tikv_fs.rs:415] ino = 18
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 124780544
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/tikv_fs.rs:419] &len = Err(
    UnknownError(
        "gRPC error: RpcFailure: 4-DEADLINE_EXCEEDED Deadline Exceeded",
    ),
)

It seems to be an issue in the interaction with the server. I will check whether it is possible to extend the timeout/deadline, or whether a retry might help.
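
One option would be a small retry wrapper around the failing operation. This is only a sketch under the assumption that DEADLINE_EXCEEDED is transient; the attempt count, backoff, and the is_retryable predicate are my own placeholders (using tokio, which tifs runs on), not existing tifs or tikv-client API:

use std::future::Future;
use std::time::Duration;

// Retry an async operation a few times with a simple linear backoff.
// `op` would be a closure driving the failing TiKV request; `is_retryable`
// decides which errors are worth another attempt.
async fn with_retry<T, E, F, Fut>(
    mut op: F,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    const MAX_ATTEMPTS: u32 = 3;
    let mut attempt = 1;
    loop {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) if attempt < MAX_ATTEMPTS && is_retryable(&e) => {
                tokio::time::sleep(Duration::from_millis(200 * attempt as u64)).await;
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

Whether the tikv-client itself allows raising the gRPC deadline is something I still need to check in its configuration.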

@cre4ture
Contributor Author

"Failed to resolve lock" is apparently also an issue

[src/fs/tikv_fs.rs:415] ino = 26
[src/fs/tikv_fs.rs:415] fh = 0
[src/fs/tikv_fs.rs:415] offset = 95420416
[src/fs/tikv_fs.rs:415] data.len() = 1048576
[src/fs/tikv_fs.rs:415] _write_flags = 0
[src/fs/tikv_fs.rs:415] _flags = 32769
[src/fs/tikv_fs.rs:415] _lock_owner = None
[src/fs/transaction.rs:109] "read_fh:" = "read_fh:"
[src/fs/transaction.rs:109] &handler = Ok(
    FileHandler {
        cursor: 0,
    },
)
Unknown Error 2: ResolveLockError, backtrace:
   0: <tifs::fs::error::FsError as core::convert::From<tikv_client_common::errors::Error>>::from
   1: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   8: tokio::runtime::task::core::CoreStage<T>::poll
   9: tokio::runtime::task::harness::Harness<T,S>::poll
  10: tokio::runtime::thread_pool::worker::Context::run_task
  11: tokio::runtime::thread_pool::worker::Context::run
  12: tokio::macros::scoped_tls::ScopedKey<T>::set
  13: tokio::runtime::thread_pool::worker::run
  14: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  15: tokio::runtime::task::harness::Harness<T,S>::poll
  16: tokio::runtime::blocking::pool::Inner::run
  17: std::sys_common::backtrace::__rust_begin_short_backtrace
  18: core::ops::function::FnOnce::call_once{{vtable.shim}}
  19: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/alloc/src/boxed.rs:1575:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/alloc/src/boxed.rs:1575:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/std/src/sys/unix/thread.rs:71:17
  20: <unknown>
  21: <unknown>

[src/fs/transaction.rs:116] "write_data result:" = "write_data result:"
[src/fs/transaction.rs:116] &result = Err(
    UnknownError(
        "Failed to resolve lock",
    ),
)
[src/fs/tikv_fs.rs:419] &len = Err(
    UnknownError(
        "Failed to resolve lock",
    ),
)
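
For what it's worth, the same retry sketch from my previous comment could treat both failures as transient. Matching on the error's display text is just a heuristic tied to the exact messages above, not a stable tifs or tikv-client API:

use std::fmt::Display;

// Classify the errors seen in the logs above as transient and worth retrying.
// A more robust version would match on tikv-client error variants instead of strings.
fn is_transient<E: Display>(err: &E) -> bool {
    let msg = err.to_string();
    msg.contains("DEADLINE_EXCEEDED") || msg.contains("Failed to resolve lock")
}

This would plug into the with_retry sketch above as the is_retryable predicate.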
