What is the most efficient way to iterate over Pandas's DataFrame row by row? #10334
In Python, iterating over the rows is going to be (a lot) slower than doing vectorized operations. The types are being converted in your second method because that's how NumPy arrays (which is what `df.values` returns) work: every element of an array shares a single dtype. If you describe your problem with a minimal working example, we might be able to help you vectorize it. You may also have luck on StackOverflow with the pandas tag.
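To make the comment above concrete, here is a minimal sketch comparing a row loop with a vectorized column operation, and showing the dtype unification of the NumPy array. The frame and the column names `price` and `qty` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame; "price" and "qty" are hypothetical column names.
df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "qty": [4, 5, 6]})

# Row-by-row loop: one Python-level iteration (and one Series object) per row.
loop_total = [row["price"] * row["qty"] for _, row in df.iterrows()]

# Vectorized: a single column-wise multiplication executed inside NumPy.
df["total"] = df["price"] * df["qty"]

# Both approaches produce the same numbers.
assert np.allclose(loop_total, df["total"])

# A NumPy array holds a single dtype, so the int64 "qty" column is upcast.
arr = df[["price", "qty"]].to_numpy()
print(arr.dtype)  # float64
```

On realistic data sizes the vectorized line is usually much faster, because the loop pays Python interpreter overhead for every row.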
Basically, I want to do the following:
I don't own the … Another similar example is in machine learning, where you may have a model whose predict API works at the row level only.
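When a function really can only take one row at a time, one common option is `DataFrame.itertuples()`, which yields lightweight named tuples instead of a full `Series` per row. This is a sketch only: `predict_one` is a hypothetical stand-in for a row-level model, and the columns `age` and `income` are invented.

```python
import pandas as pd

def predict_one(age, income):
    """Hypothetical stand-in for a model that can only score one row at a time."""
    return 1 if income / age > 1000 else 0

df = pd.DataFrame({"age": [25, 40, 60], "income": [30000, 35000, 90000]})

# itertuples(index=False) yields namedtuples whose fields are the column names,
# so each value is reached by attribute access without building a Series.
df["pred"] = [predict_one(row.age, row.income) for row in df.itertuples(index=False)]
print(df["pred"].tolist())  # [1, 0, 1]
```

If the model also offers a batch method, calling it once on the whole frame is generally preferable to any per-row loop.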
Still a bit too vague to be helpful. But if …
I don't see how it can be clearer. Yes,
I think
Thanks @shoyer! That's what I need.
Iterating through pandas DataFrame objects is generally slow, and row-by-row iteration defeats the purpose of using a DataFrame. It is an anti-pattern and something you should only do when you have exhausted every other option. It is better to look for a list comprehension, a vectorized solution, or the DataFrame.apply() method.
Pandas DataFrame loop using list comprehension
Pandas DataFrame loop using DataFrame.apply()
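The three alternatives named above can be sketched side by side on a toy frame (the `first` and `last` columns are made up). Note that `apply(axis=1)` still calls the function once per row in Python, so it is typically not faster than a list comprehension; the vectorized form avoids the per-row call entirely.

```python
import pandas as pd

df = pd.DataFrame({"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]})

# List comprehension over zipped columns: no per-row Series is built.
full_lc = [f"{fn} {ln}" for fn, ln in zip(df["first"], df["last"])]

# DataFrame.apply with axis=1 passes each row to the function as a Series.
full_apply = df.apply(lambda row: f"{row['first']} {row['last']}", axis=1)

# Vectorized string concatenation on whole columns.
full_vec = df["first"] + " " + df["last"]

print(full_lc)  # ['Ada Lovelace', 'Alan Turing']
```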
@linehammer no need to keep posting links on closed issues to what I presume is your website
I have tried `df.iterrows()`, but its performance is horrible. That is not surprising, given that `iterrows()` returns a `Series` with the full schema and metadata, not just the values (which are all I need).

The second method that I have tried is `for row in df.values`, which is significantly faster. However, I have recently realized that `df.values` is not the internal data storage of the DataFrame, because `df.values` converts all dtypes to a common dtype. For example, one of my columns has dtype `int64`, but the dtype of `df.values` is all `float64`. So I suspect that `df.values` actually creates another copy of the internal data.

Another requirement is that the row iteration must return a list of values that preserves the original dtype of the data.