Writing fails because of a stat when doing a cp #66

Open
calebwin opened this issue Oct 21, 2021 · 0 comments
Labels
banyan-data-frames-jl (Concerning BanyanDataFrames.jl), bug (Something isn't working)

Comments

@calebwin
Contributor

Some logs:

Sending EVALUATION_END for 2021-10-19-1345239a639451e0afd719059d54319a7d78c2
Finished sending EVALUATION_END message in 1 chunks
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => :sepal_length)
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => :sepal_length)
In CopyFrom with loc_name=Client and loc_params=Dict("value_id" => "22")
In CopyFrom with loc_name=Client and loc_params=Dict("value_id" => "22")
In CopyFrom Client
received = (4,)
In CopyFrom Client
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9 from Disk with batch_idx=1
received = (4,)
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9 from Disk with batch_idx=1
Still in reading a block
Still _still_ in reading a block
Still in reading a block
Still _still_ in reading a block
Still _still_  *still* in reading a block
Still _still_  *still* in reading a block
loc_params = Dict{String, Any}("nrows" => 4, "files" => Any[Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow"), Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")], "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9")
loc_params = Dict{String, Any}("nrows" => 4, "files" => Any[Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow"), Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")], "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9")
Considering Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow")
Reading from efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow on batch 1
Considering Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow")
Considering Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")
Reading from efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow on batch 1
Still _still_  *STILL* in reading a block but after having 1 dfs
In CopyFrom with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8")
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8 from Disk with batch_idx=1
Considering Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")
Still _still_  *STILL* in reading a block but after having 1 dfs
In CopyFrom with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8")
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8 from Disk with batch_idx=1
Still in reading a block
Still in reading a block
At end of CopyFrom
In CopyFrom with loc_name=Value and loc_params=Dict("value" => (+))
At end of CopyFrom
In CopyFrom with loc_name=Value and loc_params=Dict("value" => (+))
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Base.Iterators.Pairs())
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Base.Iterators.Pairs())
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexIn CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexIn CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexStart write
In Write on worker 2 on batch 1
Writing data frame with 2 batches on batch 1
Before second barrier in write
In getindexStart write
In Write on worker 1 on batch 1
Writing data frame with 2 batches on batch 1
Before second barrier in write
After second barrier in write
Going to write to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part1_nrows=1.arrow
After second barrier in write
Going to write to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part3_nrows=1.arrow
Wrote to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part3_nrows=1.arrow
Wrote to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part1_nrows=1.arrow
Finished writing data frame
Finished writing data frame
In CopyTo with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8") and worker_idx=1 and batch_idx=1
In CopyTo with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8") and worker_idx=2 and batch_idx=1
Start write
In Write on worker 1 on batch 1
In CopyTo with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_22") and worker_idx=1 and batch_idx=1
In CopyTo with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_22") and worker_idx=2 and batch_idx=1
Start write
In Write on worker 1 on batch 1
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => :sepal_length)
In CopyFrom with loc_name=Value and loc_params=Dict("value" => :sepal_length)
In CopyFrom with loc_name=Client and loc_params=Dict("value_id" => "22")
In CopyFrom with loc_name=Client and loc_params=Dict("value_id" => "22")
In CopyFrom Client
received = (4,)
In CopyFrom Client
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9 from Disk with batch_idx=2
received = (4,)
Still in reading a block
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9 from Disk with batch_idx=2
Still _still_ in reading a block
Still _still_  *still* in reading a block
loc_params = Dict{String, Any}("nrows" => 4, "files" => Any[Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow"), Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")], "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9")
Still in reading a block
Considering Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow")
Considering Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")
Reading from efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow on batch 2
Still _still_ in reading a block
Still _still_  *still* in reading a block
Still _still_  *STILL* in reading a block but after having 1 dfs
In CopyFrom with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8")
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8 from Disk with batch_idx=1
loc_params = Dict{String, Any}("nrows" => 4, "files" => Any[Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow"), Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")], "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9")
Considering Dict{String, Any}("nrows" => 1, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow")
Considering Dict{String, Any}("nrows" => 3, "path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow")
Reading from efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part4_nrows=3.arrow on batch 2
Still in reading a block
Still _still_  *STILL* in reading a block but after having 1 dfs
In CopyFrom with loc_name=Disk and loc_params=Dict("path" => "job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8")
Reading a block with efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_8 from Disk with batch_idx=1
At end of CopyFrom
In CopyFrom with loc_name=Value and loc_params=Dict("value" => (+))
Still in reading a block
At end of CopyFrom
In CopyFrom with loc_name=Value and loc_params=Dict("value" => (+))
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Colon())
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Base.Iterators.Pairs())
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Value and loc_params=Dict("value" => Base.Iterators.Pairs())
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexIn CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexIn CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexStart write
In Write on worker 2 on batch 2
Writing data frame with 2 batches on batch 2
Before first barrier in write
In CopyFrom with loc_name=Memory and loc_params=Dict{Any, Any}()
In getindexStart write
In Write on worker 1 on batch 2
Writing data frame with 2 batches on batch 2
Before first barrier in write
After first barrier in write
Before second barrier in write
After first barrier in write
Before second barrier in write
After second barrier in write
After second barrier in write
Going to write to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part4_nrows=1.arrow
Going to write to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part2_nrows=1.arrow
Wrote to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part2_nrows=1.arrow
Wrote to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part4_nrows=1.arrow
Created efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9
Created efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9
Copied efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part1_nrows=1.arrow to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part1_nrows=1.arrow
Copied efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9_tmp/part2_nrows=1.arrow to efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part2_nrows=1.arrow
ERROR: LoadError: IOError: stat("efs/job_2021-10-19-1345239a639451e0afd719059d54319a7d78c2_val_9/part3_nrows=1.arrow"): Unknown system error -116 (Unknown system error -116)
Stacktrace:
 [1] uv_error
   @ ./libuv.jl:97 [inlined]
 [2] stat(path::String)
   @ Base.Filesystem ./stat.jl:69
 [3] ispath
   @ ./stat.jl:311 [inlined]
 [4] checkfor_mv_cp_cptree(src::String, dst::String, txt::String; force::Bool)
   @ Base.Filesystem ./file.jl:298
 [5] cp(src::String, dst::String; force::Bool, follow_symlinks::Bool)
   @ Base.Filesystem ./file.jl:349
 [6] cp
   @ ./file.jl:349 [inlined]
 [7] Write(src::Nothing, part::DataFrames.DataFrame, params::Dict{String, Any}, batch_idx::Int64, nbatches::Int64, comm::MPI.Comm, loc_name::String, loc_params::Dict{String, String})
   @ Banyan ~/f695258daeb2b50796e2916f17970bbc178c1c3da666c1ef8ab4c26398dde990/banyan-julia/Banyan/src/pfs.jl:622
 [8] exec_code(banyan_data::Dict{Any, Any})
   @ Main ./string:146
 [9] top-level scope
   @ ~/executor.jl:100
in expression starting at /home/ec2-user/executor.jl:69
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected.  This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).

Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate.  For
example, there may be a core file that you can examine.  More
generally: such peer hangups are frequently caused by application bugs
or other external events.

  Local host: compute-dy-t3large-1
  Local PID:  22905
  Peer host:  compute-dy-t3large-2
--------------------------------------------------------------------------
srun: error: compute-dy-t3large-2: task 1: Exited with exit code 1
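
A note on the error code: on Linux, errno 116 is ESTALE (stale file handle). It does not appear to have a named libuv error, which would explain why Julia prints it as "Unknown system error -116". The `efs/` path is presumably an EFS (NFS) mount, and both workers log `Created` for the same `..._val_9` directory just before the failure, so one possible cause is that the directory was recreated while the other worker still held a handle to the old one. This is a guess from the logs, not a confirmed diagnosis. As a stopgap, retrying the `cp` on that error could be sketched as below. `retry_on_estale` and `ESTALE_CODE` are hypothetical names, not part of Banyan, and the attempt count and delay are arbitrary.

```julia
# Hypothetical helper, not part of Banyan.
# Assumption: Linux, where ESTALE is errno 116 and libuv reports it negated.
const ESTALE_CODE = -116

# Run `f()` and retry it when it throws an IOError carrying ESTALE_CODE.
# Any other exception, or a stale-handle error on the last attempt, is rethrown.
function retry_on_estale(f; attempts::Int = 5, delay::Real = 0.1)
    for attempt in 1:attempts
        try
            return f()
        catch e
            is_stale = e isa Base.IOError && e.code == ESTALE_CODE
            if !is_stale || attempt == attempts
                rethrow()
            end
            # Linear backoff before trying again.
            sleep(delay * attempt)
        end
    end
end
```

With this in place, the copy in `Write` could be wrapped as `retry_on_estale(() -> cp(src, dst; force=true))`, where `src` and `dst` stand in for the paths being copied. A retry only hides the symptom. If the two workers really are racing to create the destination directory, having a single worker create it before the barrier would be the more direct fix.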
@calebwin added the banyan-data-frames-jl and bug labels Oct 21, 2021
@calebwin changed the title from "Writing fails because of a stat" to "Writing fails because of a stat when doing a cp" Dec 1, 2021