write_ipc overwrites value in previously loaded dataframe #16119
Comments
Using loaded_data = pl.read_ipc("test.arrow", memory_map=False) works |
With this modification, the process can be crashed, by printing the dataframe after write_ipc:

import random
import polars as pl

def create_testdata(size: int) -> pl.DataFrame:
    return pl.DataFrame(
        [
            {
                "value": random.random(),
            }
            for _ in range(size)
        ]
    )

create_testdata(500).write_ipc("test.arrow")
loaded_data = pl.read_ipc("test.arrow")
print("before", loaded_data)
create_testdata(4).write_ipc("test.arrow")
print("after", loaded_data)

output:
And if you reduce it from 500 to 20, you get all sorts of funny values in the output:
|
You shouldn't write where you read. We should see if we can keep the file handle around to give you a proper error. |
Even if we hold a file handle, you can open a file in write mode, so if you want to write to the same file you should turn off memory mapping. I think we should add some docs about this. |
It is a consequence of memory mapping. Don't memory map if you want to write to the same file. |
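To illustrate the mechanism, here is a minimal sketch using only Python's standard `mmap` module, not polars itself. It assumes that a memory-mapped read in polars behaves analogously to a shared read-only mapping: the mapped view follows the bytes in the file, while a plain `read()` returns a private copy. The helper name `demo` is hypothetical. The file is rewritten in place with the same length on purpose, since shrinking a mapped file and then touching the mapping can crash the process.

```python
import mmap
import os
import tempfile


def demo() -> tuple[bytes, bytes]:
    """Return (copied_view, mapped_view) after the file is rewritten in place."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"old-data")

        with open(path, "rb") as f:
            copied = f.read()  # private copy held in process memory
            # Read-only mapping that is backed by the file itself.
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # Overwrite the bytes in place; the file length stays the same.
        with open(path, "r+b") as f:
            f.write(b"new-data")
            f.flush()

        result = (copied, mapped[:])
        mapped.close()
        return result
    finally:
        os.remove(path)


if __name__ == "__main__":
    copied_view, mapped_view = demo()
    print("copy:  ", copied_view)
    print("mapped:", mapped_view)
```

The copy keeps the old contents, while the mapped view reflects the overwrite, which matches the behavior reported for the loaded dataframe.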
Yeah, I figured that much.
Sounds good
Even better
If one is handling cached data in an event-based application, there could easily be a race condition in which the cache is updated after it has been read by another service. So either a cache service that handles the file implements some file locking using semaphores, or memory mapping should be turned off. But I don't feel like "not writing where you read" can be a general rule. Anyhow, thanks for looking into this! |
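One possible alternative to locking, which nobody in this thread proposed, is to never rewrite the cache file in place: write the new contents to a temporary file and rename it over the old one. The sketch below uses only the standard library, and the names `atomic_write` and `demo` are hypothetical. It assumes POSIX semantics, where an existing mapping keeps referring to the old file after the rename. On Windows, replacing a file that is still mapped is expected to fail instead.

```python
import mmap
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename swaps the directory entry; the old file is not modified.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def demo() -> tuple[bytes, bytes]:
    """Return (bytes seen through the old mapping, bytes now on disk)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cache.bin")
        atomic_write(path, b"old-data-500-rows")

        with open(path, "rb") as f:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # Shorter replacement, similar to going from 500 rows to 4.
        atomic_write(path, b"new")

        seen_by_reader = mapped[:]
        mapped.close()
        with open(path, "rb") as f:
            on_disk = f.read()
        return seen_by_reader, on_disk
```

With polars, the same idea would presumably mean calling `write_ipc` on a temporary path and then `os.replace` onto "test.arrow", but that combination is an assumption and was not tested here.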
Checks
(I have searched for issues mentioning write_ipc)
Reproducible example
Log output
Issue description
A dataframe previously loaded from a file changes once the original file is overwritten
Expected behavior
For the loaded dataframe to be immutable
Installed versions