simplify row building and avoid costly applymap #62

thunderbug1 · 2022-01-28T10:48:39Z

Not sure if there was a special reason not to use the pandas to_json method, but for me it improved performance significantly, and it also makes the code simpler.
... It also handles the nested DataFrame case nicely, without the need to convert it to a JSON string separately.
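
A minimal sketch of the idea, not the actual diff from this PR: it assumes the rows are sent to the frontend as a list of records. The DataFrame contents and column names are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical example data; the column names are not taken from the PR.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "value": [1.5, 2.5],
        "change": [[1, 2, 3], [4, 5, 6]],
    }
)

# A single call serialises the whole frame as a JSON array of row objects,
# instead of converting values cell by cell with applymap.
rows_json = df.to_json(orient="records")

# Parsed back here only to inspect the structure of the result.
rows = json.loads(rows_json)
print(rows[0])
```

With orient="records" each row becomes one JSON object keyed by column name, which is the shape a grid component would typically consume.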

cmayoracurzio · 2022-02-23T11:48:18Z

Haven't been able to test this yet, but super interested in this potential performance improvement!

thunderbug1 · 2022-02-23T12:22:46Z

Since I sometimes run into issues with the data packet of the DataFrame being too large, I also thought about possible alternatives to JSON.
Maybe the Arrow IPC format could be used instead of JSON. It would look something like this: https://gist.github.com/camerondavison/cbe43c326f37ab0fe34680baa960634e

However, I haven't made any experiments in that direction yet.

thunderbug1 · 2022-03-23T15:49:22Z

The pandas to_json method also comes with a compression option. I think we could probably use that to reduce the size of the JSON transmitted over the wire. Since the output is record-oriented and repeats the column names in every row, compression should have quite a big effect.
We just have to figure out how to unpack it again on the TypeScript side, but I don't think that would be a big issue.
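
A rough sketch of what compressing the payload could look like. As far as I can tell, the compression argument of to_json only takes effect when writing to a file path, not when the JSON is returned as a string, so this sketch uses the standard gzip module instead. The payload is made up, and the base64 step is an assumption that the transport only accepts text.

```python
import base64
import gzip
import json

# Hypothetical record-oriented payload, shaped like the output of
# df.to_json(orient="records"); the keys repeat in every row.
rows = [
    {"name": f"row{i}", "value": i, "change": [i, i + 1, i + 2]}
    for i in range(1000)
]
payload = json.dumps(rows).encode("utf-8")

# Compress the JSON bytes, then base64-encode so the result can travel as text.
compressed = gzip.compress(payload)
wire_text = base64.b64encode(compressed).decode("ascii")

# The receiving side reverses the two steps (done in Python here as a check).
restored = json.loads(gzip.decompress(base64.b64decode(wire_text)))

print(len(payload), len(compressed), len(wire_text))
```

The frontend would need a matching base64 and gzip decoding step; which library to use for that is left open here.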

stantont · 2022-04-05T21:32:03Z

I just made a similar change myself in my local code in order to make the ag-grid Sparklines work. I was going to suggest the change but I see it has already been suggested.

I have a list in a column and the list is getting turned into a string by the existing code. So the Pandas column

"change": [[1, 2, 3], [4, 5, 6]]

would get turned into a column of JSON strings instead of lists of numbers, which the Sparklines can't use:

"change": "[1, 2, 3]" 
//...
"change": "[4, 5, 6]"

This change made the JSON have lists of numbers and allowed the Sparklines to work in the cell.

"change": [1, 2, 3]
//...
"change": [4, 5, 6]

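The difference between the two encodings can be reproduced with the standard json module alone. This is only an illustration of the effect described above, not the component's actual code, and the row content is hypothetical.

```python
import json

# Hypothetical row with a list-valued cell, as in the "change" column above.
cell = [1, 2, 3]

# If the cell is turned into a string first, the final JSON carries a string.
stringified_row = json.dumps({"change": json.dumps(cell)})

# If the cell is left as a list, the final JSON carries an array of numbers.
nested_row = json.dumps({"change": cell})

print(stringified_row)
print(nested_row)
```

After parsing, the first row holds a str and the second a list, which is why only the second form is usable as numeric data by the frontend.
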
lukedyer-peak mentioned this pull request May 27, 2022

Feat: increase data serialization speed #85

Open
