BLOG: The "Polars vs pandas" difference nobody is talking about #843

Draft

MarcoGorelli wants to merge 2 commits into Quansight:develop from MarcoGorelli:groupby

+167 −0

Contributor

MarcoGorelli commented Apr 26, 2024

Text styling

The blog is written with plain language (where relevant).
If there are headers, they use the proper header tags in order to do so (with only one level-one header).
All links describe where they link to (for example, check the Quansight labs website).
Any kind of styling that the author uses (for example, bold for emphasis) is consistent throughout the blog.

Non-text contents

Blog post featured image is in PNG or JPEG format, not SVG.
All content is represented as text (for example, images need alt text and videos need captions or descriptive transcripts).
If there are emojis, there are not more than three in a row.
Don't use flashing gifs or videos.
If it were to be read as plain text, the blog still makes sense and no information is missing.

vercel bot commented Apr 26, 2024 (edited)

The latest updates on your projects:

Name	Status	Preview	Comments	Updated (UTC)
labs	❌ Failed (Inspect)			May 10, 2024 1:40pm


          wip: polars innovation group-by

4732c8e

MarcoGorelli force-pushed the groupby branch from e13ce26 to 4732c8e Compare

April 26, 2024 09:53

vercel bot had a problem deploying to Preview

April 26, 2024 09:58

Failure

MarcoGorelli marked this pull request as ready for review

May 10, 2024 13:35

MarcoGorelli requested review from pavithraes, trallard, rgommers and gabalafou as code owners

May 10, 2024 13:35

MarcoGorelli force-pushed the groupby branch from 4599204 to 50ec768 Compare

May 10, 2024 13:35


          improve structure

ff7d28d

MarcoGorelli force-pushed the groupby branch from 50ec768 to ff7d28d Compare

May 10, 2024 13:36

MarcoGorelli changed the title ~~WIP: The Polars innovation nobody is talking about~~ The "Polars vs pandas" difference nobody is talking about

vercel bot had a problem deploying to Preview

May 10, 2024 13:40

Failure

MarcoGorelli changed the title ~~The "Polars vs pandas" difference nobody is talking about~~ BLOG: The "Polars vs pandas" difference nobody is talking about

MarcoGorelli marked this pull request as draft

May 28, 2024 16:47

gabalafou reviewed

apps/labs/posts/dataframe-group-by.md

		@@ -0,0 +1,167 @@
		---
		title: 'The "Polars vs pandas" difference nobody is talking about'

Contributor

gabalafou May 28, 2024

Suggested change

      
-              title: 'The "Polars vs pandas" difference nobody is talking about'
+              title: 'The Polars vs pandas difference nobody is talking about'

apps/labs/posts/dataframe-group-by.md


		# The "Polars vs pandas" difference nobody is talking about

		I attended PyData Berlin 2024 this week, and it was a blast! I met so many colleagues, collaborators, and friends.

Contributor

gabalafou May 28, 2024

"this week"?

apps/labs/posts/dataframe-group-by.md

+              category: [PyData ecosystem]
+              ---
+              # The "Polars vs pandas" difference nobody is talking about

Contributor

gabalafou May 28, 2024

Suggested change

      
-              # The "Polars vs pandas" difference nobody is talking about
+              # The Polars vs pandas difference nobody is talking about

apps/labs/posts/dataframe-group-by.md

Comment on lines +15 to +19

+              - lazy execution;
+              - Rust;
+              - consistent handling of null values;
+              - multithreading;
+              - query optimisation.

Contributor

gabalafou May 28, 2024

Suggested change

      
-              - lazy execution;
-              - Rust;
-              - consistent handling of null values;
-              - multithreading;
-              - query optimisation.
+              - lazy execution
+              - Rust
+              - consistent handling of null values
+              - multithreading
+              - query optimisation

apps/labs/posts/dataframe-group-by.md

+              If we want a single scalar value per group, we can use a reduction ('mean', 'sum', 'std', ...):
+              ```python
+              df.group_by('a').agg(pl.sum('b'))

Contributor

gabalafou May 28, 2024

I'm already a little lost. Is this Polars or pandas?

apps/labs/posts/dataframe-group-by.md

+              This isn't too bad, but it involves doing two group-bys, and so is at least twice as slow as it could
+              be.
+              Finally, can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the

Contributor

gabalafou May 28, 2024

Suggested change

      
-              Finally, can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the
+              Finally, we can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the

apps/labs/posts/dataframe-group-by.md


		is a common refrain among Polars users.

		There may be a more general lesson here: if you have the courage to do things differently, you may be rewarded.

Contributor

gabalafou May 28, 2024

This line doesn't really connect for me with anything else in the blog post, seems to come out of nowhere.

apps/labs/posts/dataframe-group-by.md

+              There may be a more general lesson here: if you have the courage to do things differently, you may be rewarded.
+              If you'd like to learn about how to use Polars effectively, or how to solve problems in your organisation
+              using Polars, Quansight is here to help - you can get in touch [here](https://quansight.com/about-us/#bookacallform).

Contributor

gabalafou May 28, 2024

I wonder if there's some way we could or should set this off from the rest of the blog post, something like the following (not saying we should do it exactly like this):

"And now, a small message from our fellow coworkers at Quansight Consulting... If you'd like to learn about how to use Polars effectively, or how to solve problems in your organisation using Polars, please get in touch."

Reviewers

gabalafou: left review comments

pavithraes: awaiting requested review (code owner)

trallard: awaiting requested review (code owner)

rgommers: awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.