pandas dataframe with list as values #407
Comments
Apologies for the slow response; I've been busy with classes lately. This is most likely a bug, and I'll look into it.
Update: This is definitely a bug, and it'll be hard to fix. By default, pandas stores strings under the generic object dtype rather than a dedicated string dtype, so it's hard for jsonpickle's pandas extension to differentiate between non-integer types, such as a list and a string. I'm working on fixing the encoding for that, but it'll be difficult, as I need to integrate type() checking in addition to dtype checking.
Update 2: I just realized this'll be harder to fix than I previously thought, since the lists can contain more than one dtype. For example, one could have a dataframe like so:
Apparently this is basically the same issue as #358.
I wanted to use jsonpickle for a bioinformatics project of mine, but this behavior is a killer; I regularly have dataframes with lists of floats. Could an implementation like this, which stores lists as lists of (value, type) pairs, be useful? It roughly doubles the size of the representation of lists as values in pandas Series, but wouldn't touch anything else, I think.

```python
# a naive implementation
import json
from typing import Any, List


def list_encode(list_input: List[Any]) -> str:
    # Store each element as a "value, type" string
    nested_list = [f"{each}, {type(each).__name__}" for each in list_input]
    return json.dumps(nested_list)


def decode_list(str_input: str) -> List[Any]:
    obj = json.loads(str_input)
    final_list: List[Any] = []
    for each in obj:
        value, str_type = each.split(",")
        str_type = str_type.strip()
        if str_type == "int":
            final_list.append(int(value))
        elif str_type == "float":
            final_list.append(float(value))
        # Add datetime, etc; or recursively call for nested lists
        else:
            final_list.append(value)
    return final_list
```

Then, an example:

```python
# Example Encoding
sample = [1, 1.0, "Sue"]
list_encode(sample)
# Output
'["1, int", "1.0, float", "Sue, str"]'

# Example Decoding
str_sample = '["1, int", "1.0, float", "Sue, str"]'
decode_list(str_sample)
# Output
[1, 1.0, 'Sue']
```

So, you'd add a new check:

```python
if isinstance(value, list):
    value = list_encode(value)
```
Oh, if that works I'd be happy to merge it! I'll try and test it over the next few days, thanks so much for giving some example code!
Thanks, @Theelx! Looking back at this naïve implementation I wrote, we might need to use a special character in the f-string: as I wrote it, if any of the string values in the list contains a comma, it will error out. Given that
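To make the comma problem concrete, here is a minimal sketch. It assumes the "value, type" string format shown in the example output earlier in the thread; the sample value and variable names are made up for illustration. It shows why splitting on every comma fails, and that splitting only on the last comma would be one possible workaround, assuming type names never contain a comma.

```python
value = "Doe, Jane"  # hypothetical list element that contains a comma
encoded = f"{value}, {type(value).__name__}"  # "Doe, Jane, str"

# Splitting on every comma yields three parts, so unpacking into
# two names raises ValueError ("too many values to unpack").
parts = encoded.split(",")
try:
    raw_value, str_type = parts
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

# Splitting only on the last comma keeps the original value intact.
raw_value, str_type = encoded.rsplit(",", 1)
str_type = str_type.strip()
```

This only addresses the delimiter clash; it does not help with nested lists or with types that have no obvious string form.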
Hm, breaking on a special character isn't a good idea for library code. I'll try to change the behavior so it works for everything.
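One way to avoid depending on any delimiter might be to let JSON carry the structure, storing each element as a `[value, type_name]` pair instead of a single formatted string. The sketch below is an assumption about how that could look, not jsonpickle's actual fix; the helper names `encode_list_as_pairs` and `decode_list_from_pairs` are hypothetical, and it only handles values that `json` can serialize natively.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical casters for a few scalar types; other type names
# (e.g. "bool", "NoneType", "list") pass through unchanged.
_CASTERS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "float": float,
    "str": str,
}


def encode_list_as_pairs(values: List[Any]) -> str:
    """Encode each element as a JSON [value, type_name] pair."""
    return json.dumps([[value, type(value).__name__] for value in values])


def decode_list_from_pairs(payload: str) -> List[Any]:
    """Rebuild the list, casting each value by its recorded type name."""
    decoded: List[Any] = []
    for value, type_name in json.loads(payload):
        caster = _CASTERS.get(type_name)
        decoded.append(caster(value) if caster is not None else value)
    return decoded


sample = [1, 1.0, "Doe, Jane"]
payload = encode_list_as_pairs(sample)
restored = decode_list_from_pairs(payload)
```

Because the value and the type name are separate JSON elements, a comma inside a string needs no escaping. The recorded type name also leaves a hook for adding casters for other types later.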
Can you please assign me for the issue #407?
No need to explicitly assign you to this issue - just start working on it and open a PR (maybe as a draft at first).
Consider this code:
If you run this, you get:
i.e. a `list[str]` has been changed to a `str`.
Is there any way to avoid this?