
ERROR: TBasket not defined #14

Open
mmikhasenko opened this issue Dec 10, 2020 · 15 comments
Labels
bug Something isn't working

Comments

@mmikhasenko
Member

mmikhasenko commented Dec 10, 2020

The file contains two trees that look the same

f["Scaled"] # works
f["DecayTree"] # ERROR: zlib error: incorrect header check (code: -3)

Here is the head of the stacktrace:

changemode!(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Symbol) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:717
callprocess(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::TranscodingStreams.Buffer, ::TranscodingStreams.Buffer) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:649
fillbuffer(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}; eager::Bool) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:577
fillbuffer at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:564
eof(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:188
readbytes!(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Array{UInt8,1}, ::Int32) at .julia\packages\TranscodingStreams\MsN8d\src\stream.jl:371
read(::TranscodingStreams.TranscodingStream{CodecZlib.ZlibDecompressor,IOStream}, ::Int32) at .\io.jl:941
datastream(::IOStream, ::UnROOT.TKey32) at .julia\packages\UnROOT\T4A6o\src\types.jl:108
UnROOT.TTree(::IOStream, ::UnROOT.TKey32, ::Dict{Int32,Any}) at .julia\packages\UnROOT\T4A6o\src\bootstrap.jl:679
getindex(::ROOTFile, ::SubString{String}) at .julia\packages\UnROOT\T4A6o\src\root.jl:98
getindex(::ROOTFile, ::String) at .julia\packages\UnROOT\T4A6o\src\root.jl:93
array(::ROOTFile, ::String; raw::Bool) at .julia\packages\UnROOT\T4A6o\src\root.jl:142
array at .julia\packages\UnROOT\T4A6o\src\root.jl:139
binlineshape(::String, ::String, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at c:\Users\mikha.julia\dev\OmegacDecay\script\feeddown\lineshape_from_saras_files.jl:14

I can read them with uproot.py.

What could it be?

@tamasgal
Member

That's weird; it seems that the basket reading is somehow messed up. Is it possible to upload the file somewhere? I'll try to find some time to investigate...

@mmikhasenko
Member Author

It's 600 MB; are you willing to download it? :)

@tamasgal
Member

Yep sure ;)

@tamasgal added the bug (Something isn't working) label on Dec 10, 2020
@mmikhasenko
Member Author

mmikhasenko commented Dec 10, 2020

https://cernbox.cern.ch/index.php/s/AzrPeo78d0bPGMA

f = ROOTFile(joinpath(pathto_folder, "Ob2XicKK_tree.root"))
f["DecayTree"] # ERROR: zlib error: incorrect header check (code: -3)
#
f = ROOTFile(joinpath(pathto_folder, "Ob2XicpKpi_tree.root"))
f["DecayTree"] # ERROR: UndefVarError: TBasket not defined

thanks!

@tamasgal
Member

Thanks! Do you happen to have also the other file with the zlib-error?

@mmikhasenko
Member Author

mmikhasenko commented Dec 10, 2020

I misread the message at first.
No, I only started using UnROOT today. I will let you know if I notice it with other files.

@tamasgal
Member

OK I see, so for now I will have a look at this TBasket thing ;)

UnROOT is currently an experimental package and I made it work with files from our own experiment (with custom streamers), so don't expect too much. I also have little time currently, but I will try my best to fix trivial issues!

@tamasgal
Member

Now I see the second file "Ob2XicKK_tree.root" in your CERN box, it was not there before ;)

@mmikhasenko
Member Author

Ah, perhaps CERNBox took some time to upload it.

@tamasgal
Member

I have no real progress yet, I suspect that it might be something related to the compression library I use. I will do a side-by-side comparison with uproot.

@Moelf
Member

Moelf commented Jul 1, 2021

The TKey says the uncompressed data is slightly longer than 2^24 bytes; reading the first 2^22 bytes seems okay. I wonder if it's because of some kind of default chunking in zlib. Maybe we just need to find that keyword argument.

@Moelf
Member

Moelf commented Jul 1, 2021

I know what's going on. The maximum value of uncompressedbytes is 0xffffff, which is smaller than what TKey.fObjlen reports. That probably means we need to decompress automatically in multiple chunks.

Using the compressedbytes and uncompressedbytes defined here, I'm able to decompress without crashing (see the PR below).

With that change, both of those files now run into:

TBasket not defined
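The following is a minimal Python sketch of the multi-block idea. Python is used only because the uproot code linked further down is Python; this is not UnROOT's implementation. The sketch rests on an assumption about how ROOT stores a compressed object: as consecutive blocks, each consisting of a 9-byte header followed by one complete zlib stream. The header holds the tag `ZL`, a method byte, then 3-byte little-endian compressed and uncompressed sizes.

A 3-byte size field tops out at 0xffffff, so an object with a larger fObjlen needs more than one block. If everything is fed to a single zlib stream, the decoder would plausibly run into the next block's `ZL` header, which zlib reports as "incorrect header check". `compress_blocks` and `decompress_blocks` are hypothetical names, and the method byte value 8 is an assumption.

```python
import struct
import zlib

# Assumed layout of the 9-byte header in front of every ROOT compression block:
# 2-byte algorithm tag (b"ZL" = zlib), 1 method byte,
# 3-byte little-endian compressed size, 3-byte little-endian uncompressed size.
HEADER = struct.Struct("<2sB3s3s")
MAX_BLOCK = 0xFFFFFF  # largest size a 3-byte field can express


def u24(raw: bytes) -> int:
    """Decode a 3-byte little-endian unsigned integer."""
    return int.from_bytes(raw, "little")


def compress_blocks(data: bytes) -> bytes:
    """Hypothetical writer, only used to build test input: one zlib stream per block."""
    out = bytearray()
    for start in range(0, len(data), MAX_BLOCK):
        chunk = data[start:start + MAX_BLOCK]
        payload = zlib.compress(chunk)
        out += HEADER.pack(b"ZL", 8,  # 8 = deflate (assumed method byte)
                           len(payload).to_bytes(3, "little"),
                           len(chunk).to_bytes(3, "little"))
        out += payload
    return bytes(out)


def decompress_blocks(buf: bytes, objlen: int) -> bytes:
    """Read header + zlib stream pairs until objlen (TKey.fObjlen) bytes are produced."""
    out = bytearray()
    pos = 0
    while len(out) < objlen:
        tag, _method, csize, usize = HEADER.unpack_from(buf, pos)
        if tag != b"ZL":
            raise ValueError(f"unsupported compression tag {tag!r} at offset {pos}")
        pos += HEADER.size
        block = zlib.decompress(buf[pos:pos + u24(csize)])
        if len(block) != u24(usize):
            raise ValueError("block decompressed to an unexpected size")
        out += block
        pos += u24(csize)
    return bytes(out)


# Usage: an object 4 bytes larger than one block can hold ends up in two blocks.
data = b"a" * MAX_BLOCK + b"tail"
assert decompress_blocks(compress_blocks(data), len(data)) == data
```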


Upon further investigation, uproot unpacks a TBasket in place when it runs into one, so there's not really a type/struct we need to define. Will investigate soon.

@Moelf changed the title from "ERROR: zlib error: incorrect header check (code: -3)" to "ERROR: TBasket not defined" on Jul 5, 2021
@Moelf
Member

Moelf commented Jul 16, 2021

update:

julia> r["DecayTree"]
position(io) = 627
ERROR: UndefVarError: TBasket not defined
Stacktrace:

julia> ds = UnROOT.datastream(i, tkey)

julia> seek(ds, 627)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=30304363, maxsize=Inf, ptr=628, mark=-1)

julia> UnROOT.unpack(ds, UnROOT.TBasketKey)
UnROOT.TBasketKey
  fNbytes: Int32 0
  fVersion: Int16 2
  fObjlen: Int32 255922
  fDatime: UInt32 0x9c421000
  fKeylen: Int16 78
  fCycle: Int16 0
  fSeekKey: Int64 0
  fSeekPdir: Int64 0
  fClassName: String "TBasket"
  fName: String "nEvent"
  fTitle: String "DecayTree"
  fBufferSize: Int32 256000
  fNevBufSize: Int32 4
  fNevBuf: Int32 42640
  fLast: Int32 170638

It is indeed just a basket in the middle of nowhere...
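For reference, here is a Python sketch of reading a basket key like the one dumped above. The field order follows the TKey header plus the TBasket fields shown in the dump. It also relies on two assumptions, both as I understand uproot's reader:

- fKeylen runs to the end of the basket header, including a trailing 1-byte flag.
- 64-bit seeks are used only when the key version exceeds 1000.

`read_basket_key`, `build_basket_header`, and `looks_embedded` are hypothetical. The synthetic header uses 32-bit seeks, so its fKeylen differs from the 78 in the dump. The check for zero fNbytes/fSeekKey/fSeekPdir is only a heuristic suggested by this dump, not a documented rule.

```python
import struct
from dataclasses import dataclass

KEY_HEAD = struct.Struct(">ihiIhh")    # fNbytes, fVersion, fObjlen, fDatime, fKeylen, fCycle
BASKET_HEAD = struct.Struct(">hiiii")  # basket version, fBufferSize, fNevBufSize, fNevBuf, fLast


def read_tstring(buf: bytes, pos: int):
    """ROOT TString: 1 length byte, or 255 followed by a 4-byte big-endian length."""
    n = buf[pos]
    pos += 1
    if n == 255:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
    return buf[pos:pos + n].decode("ascii"), pos + n


def write_tstring(s: str) -> bytes:
    raw = s.encode("ascii")
    if len(raw) >= 255:
        raise ValueError("long-form TStrings are not needed for this sketch")
    return bytes([len(raw)]) + raw


@dataclass
class BasketKey:
    fNbytes: int
    fVersion: int
    fObjlen: int
    fDatime: int
    fKeylen: int
    fCycle: int
    fSeekKey: int
    fSeekPdir: int
    fClassName: str
    fName: str
    fTitle: str
    fBufferSize: int
    fNevBufSize: int
    fNevBuf: int
    fLast: int
    header_end: int  # offset of the first byte after the key + basket header


def read_basket_key(buf: bytes, pos: int = 0) -> BasketKey:
    nbytes, version, objlen, datime, keylen, cycle = KEY_HEAD.unpack_from(buf, pos)
    p = pos + KEY_HEAD.size
    seek_fmt = ">qq" if version > 1000 else ">ii"  # version > 1000 means 64-bit seeks
    seek_key, seek_pdir = struct.unpack_from(seek_fmt, buf, p)
    p += struct.calcsize(seek_fmt)
    cls, p = read_tstring(buf, p)
    name, p = read_tstring(buf, p)
    title, p = read_tstring(buf, p)
    _basket_version, bufsize, nevbufsize, nevbuf, last = BASKET_HEAD.unpack_from(buf, p)
    p += BASKET_HEAD.size + 1  # skip the 1-byte flag that ends the basket header
    return BasketKey(nbytes, version, objlen, datime, keylen, cycle, seek_key, seek_pdir,
                     cls, name, title, bufsize, nevbufsize, nevbuf, last, p)


def build_basket_header(name: str, title: str, objlen: int, nevbuf: int, last: int) -> bytes:
    """Hypothetical helper: a small-file (32-bit seek) header with zero fNbytes and seeks."""
    seeks = struct.pack(">ii", 0, 0)
    strings = b"".join(write_tstring(s) for s in ("TBasket", name, title))
    basket = BASKET_HEAD.pack(3, 256000, 4, nevbuf, last) + b"\x00"  # version 3 is illustrative
    keylen = KEY_HEAD.size + len(seeks) + len(strings) + len(basket)
    return KEY_HEAD.pack(0, 2, objlen, 0, keylen, 0) + seeks + strings + basket


def looks_embedded(key: BasketKey) -> bool:
    """Heuristic suggested by the dump above, not a documented ROOT rule."""
    return key.fNbytes == 0 and key.fSeekKey == 0 and key.fSeekPdir == 0
```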

@tamasgal
Member

tamasgal commented Apr 7, 2022

OK, so it seems that we need to keep track of the TBaskets when they appear out of nowhere (I am thinking about caching just the location and the key). I remember @jpivarski mentioned that these baskets can show up in different places, but I can't find any useful information about it in my notes anymore. Maybe Jim can point us to some existing "docs" ;)

@jpivarski
Member

It's not the sort of thing that would be documented as such; it's an artifact of how TTrees get written when an error interrupts the write.

The completely different place is embedded within the TBranch—i.e. in the TTree object that contains all TBranches and TLeaves. A TBranch has a single TBasket attribute, which is the uncompressed TBasket that it was filling at the time when the writing process shut down. Under normal conditions, this TBasket is filled until it reaches its maximum capacity, then it's compressed and stored as an independent object, where we normally get TBasket data from.
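Here is a toy Python model of the mechanism just described. It is purely illustrative and not ROOT's API. A branch appends entries to an in-memory basket and, once the basket reaches capacity, compresses it and stores it as an independent object. Otherwise the basket stays uncompressed. Whatever is left in `current` when the writer stops corresponds to an embedded TBasket. `ToyBranch` is a hypothetical name, and counting capacity in entries rather than bytes is a simplification of this sketch.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBranch:
    """Toy model (not ROOT's API) of a TBranch filling fixed-capacity baskets."""
    capacity: int                                        # entries per basket, stand-in for fBufferSize
    written: List[bytes] = field(default_factory=list)   # compressed, free-standing baskets
    current: List[int] = field(default_factory=list)     # uncompressed basket still being filled

    def fill(self, value: int) -> None:
        self.current.append(value)
        if len(self.current) == self.capacity:
            self.flush()

    def flush(self) -> None:
        """Compress the full basket, store it as its own object, start a new one."""
        payload = b"".join(v.to_bytes(4, "big") for v in self.current)
        self.written.append(zlib.compress(payload))
        self.current = []


# If writing stops after 10 entries with capacity 4, two baskets were stored on
# their own and entries 8 and 9 exist only in the branch's in-memory basket.
branch = ToyBranch(capacity=4)
for i in range(10):
    branch.fill(i)
```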

I can point you to some Uproot code that deals with this specifically. (The ROOT code doesn't call it out as a separate thing because that embedded TBasket is part of the TBranch streamer.)

Ordinarily (when minimal_ttree_metadata=True), Uproot skips the deserialization of embedded TBaskets because it adds a lot of time to processing files with thousands of TBranches:

https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/models/TBranch.py#L494-L503

(I'm reminded by the code above that it's not just one TBasket: it's a TObjArray of embedded TBaskets.)

As a last stage of creating the TBranch (postprocess), we determine whether the embedded TBaskets would ever be needed by checking whether the total number of entries in the normal TBaskets adds up to the TTree's number of entries. If the embedded TBaskets are needed, they aren't read yet (we don't know yet if the user is interested in this TBranch), but _embedded_baskets is set to None, rather than an empty list, as a signal that they're needed (last else clause below).

https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/behaviors/TBranch.py#L2694-L2733
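Here is a Python sketch of the postprocess decision described above. It is loosely modeled on the linked uproot code but not copied from it. The argument names mirror ROOT's fEntries, fBasketSeek, fBasketEntry, and fWriteBasket. Treating a zero fBasketSeek (or reaching fWriteBasket) as the end of the normal baskets is an assumption of this sketch. The returned state uses the same three values: `[]`, a list of eagerly read baskets, or `None`.

```python
from typing import Optional, Sequence, Tuple


def embedded_baskets_state(
    f_entries: int,
    basket_seek: Sequence[int],
    basket_entry: Sequence[int],
    write_basket: int,
    already_read: Optional[list] = None,
) -> Tuple[int, Optional[list]]:
    """Return (number of normal baskets, embedded-basket state).

    The state is [] when the normal baskets already hold every entry, a list
    when embedded baskets were deserialized eagerly, and None when they are
    needed but deliberately not read yet.
    """
    num_normal = 0
    for i, seek in enumerate(basket_seek):
        if seek == 0 or i == write_basket:  # assumed: zero seek = not stored on its own
            break
        num_normal += 1

    # basket_entry[k] is the first entry of basket k, so basket_entry[num_normal]
    # is how many entries the normal (standalone) baskets hold in total.
    if basket_entry[num_normal] == f_entries:
        return num_normal, []
    if already_read is not None:
        return num_normal, list(already_read)
    return num_normal, None
```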

The embedded TBaskets are numbered after the last normal TBasket, since they hold the last data ROOT was working on when it died. If one of them falls within the user's entry range when they read a TBranch, it is read through the embedded_baskets property:

https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/behaviors/TBranch.py#L2631-L2652
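Here is a Python sketch of the lazy access pattern. A `None` state triggers a one-time read, the result is cached, and the embedded baskets get numbers right after the normal ones. An entry range only needs them if it extends past the entries held by the normal baskets. `BranchSketch`, its `reader` callback, and the double-checked lock are hypothetical choices of this sketch, not uproot's class.

```python
import threading
from typing import Callable, List, Optional


class BranchSketch:
    """Hypothetical stand-in for a TBranch with lazily read embedded TBaskets."""

    def __init__(self, num_normal: int, entries_in_normal: int,
                 embedded: Optional[list], reader: Callable[[], list]):
        self._num_normal = num_normal
        self._entries_in_normal = entries_in_normal
        self._embedded = embedded  # [] = none, None = needed but unread, [...] = read
        self._reader = reader      # hypothetical: re-reads the TBranch with its embedded baskets
        self._lock = threading.Lock()

    def needs_embedded(self, entry_stop: int) -> bool:
        """Embedded baskets hold the entries after the last normal basket."""
        return entry_stop > self._entries_in_normal

    @property
    def embedded_baskets(self) -> list:
        if self._embedded is None:
            with self._lock:
                if self._embedded is None:  # another thread may have read them meanwhile
                    self._embedded = list(self._reader())
        return self._embedded

    def embedded_basket_numbers(self) -> List[int]:
        """Embedded baskets are numbered right after the normal ones."""
        return [self._num_normal + k for k in range(len(self.embedded_baskets))]
```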

In addition to always being uncompressed, embedded TBaskets have a slightly different structure from free-standing TBaskets. This deserialization code shows the difference:

https://github.com/scikit-hep/uproot4/blob/a0c117900bc3c365a8acda5949217d8135266ad3/src/uproot/models/TBasket.py#L248-L317


Notes

I think in practice, I've never seen more than one embedded TBasket. How could there be? It's the last one the TTree was writing when it died. Nevertheless, it's a TObjArray of them, so I treat it everywhere like a list. (A list of one element vs a list of zero elements vs None is a useful way to distinguish different states of having already read it, not having any to read, and needing it but not yet having read it. So the list does come in handy.)

Uproot 3 code called this "recovery" and "recovered" TBaskets. My impression when I first encountered this was that it was obviously a corrupted file. But I've since learned that this is an intended feature, how it's supposed to work, so that failures during writing produce files that are nevertheless readable. For that reason, Uproot 4 calls them "embedded."

Uproot's writing does not have this feature: if it fails before writing a standalone (normal) TBasket, then that TBasket is simply unavailable. So Uproot's writing process always makes empty embedded TBaskets.


4 participants