BLOG: The "Polars vs pandas" difference nobody is talking about #843

Draft
wants to merge 2 commits into
base: develop

MarcoGorelli
Contributor

Text styling

  • The blog is written with plain language (where relevant).
  • If there are headers, they use the proper header tags in order to do so (with only one level-one header).
  • All links describe where they link to (for example, check the Quansight labs website).
  • Any kind of styling that the author uses (for example, bold for emphasis) is consistent throughout the blog.

Non-text contents

  • Blog post featured image is in PNG or JPEG format, not SVG.
  • All content is represented as text (for example, images need alt text and videos need captions or descriptive transcripts).
  • If there are emojis, there are not more than three in a row.
  • Don't use flashing gifs or videos.
  • If it were to be read as plain text, the blog still makes sense and no information is missing.

vercel bot commented Apr 26, 2024

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| labs | ❌ Failed (Inspect) | | | May 10, 2024 1:40pm |

MarcoGorelli changed the title from `WIP: The Polars innovation nobody is talking about` to `The "Polars vs pandas" difference nobody is talking about` on May 10, 2024
MarcoGorelli changed the title from `The "Polars vs pandas" difference nobody is talking about` to `BLOG: The "Polars vs pandas" difference nobody is talking about` on May 26, 2024
MarcoGorelli marked this pull request as draft on May 28, 2024, 16:47
@@ -0,0 +1,167 @@
---
title: 'The "Polars vs pandas" difference nobody is talking about'
Contributor

Suggested change
title: 'The "Polars vs pandas" difference nobody is talking about'
title: 'The Polars vs pandas difference nobody is talking about'


# The "Polars vs pandas" difference nobody is talking about

I attended PyData Berlin 2024 this week, and it was a blast! I met so many colleagues, collaborators, and friends.
Contributor

"this week"?

category: [PyData ecosystem]
---

# The "Polars vs pandas" difference nobody is talking about
Contributor

Suggested change
# The "Polars vs pandas" difference nobody is talking about
# The Polars vs pandas difference nobody is talking about

Comment on lines +15 to +19
- lazy execution;
- Rust;
- consistent handling of null values;
- multithreading;
- query optimisation.
Contributor

Suggested change
- lazy execution;
- Rust;
- consistent handling of null values;
- multithreading;
- query optimisation.
- lazy execution
- Rust
- consistent handling of null values
- multithreading
- query optimisation


If we want a single scalar value per group, we can use a reduction ('mean', 'sum', 'std', ...):
```python
df.group_by('a').agg(pl.sum('b'))
```
Contributor

I'm already a little lost. Is this Polars or pandas?

This isn't too bad, but it involves doing two group-bys, and so is at least twice as slow as it could
be.

Finally, can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the
Contributor

Suggested change
Finally, can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the
Finally, we can rely on `GroupBy` caching its groups, in-place mutation of the original dataframe, and the


is a common refrain among Polars users.

There may be a more general lesson here: if you have the courage to do things differently, you may be rewarded.
Contributor

This line doesn't really connect with anything else in the blog post for me; it seems to come out of nowhere.

There may be a more general lesson here: if you have the courage to do things differently, you may be rewarded.

If you'd like to learn about how to use Polars effectively, or how to solve problems in your organisation
using Polars, Quansight is here to help - you can get in touch [here](https://quansight.com/about-us/#bookacallform).
Contributor

I wonder if there's some way we could or should set this off from the rest of the blog post, something like the following (not saying we should do it exactly like this):

And now, a small message from our fellow coworkers at Quansight Consulting... If you'd like to learn about how to use Polars effectively, or how to solve problems in your organisation
using Polars, please get in touch.
