When adding a Series to a DataFrame with a different index, the Series gets turned into all NaNs #450
I wonder if it's related to this issue I also found this morning:

```
>>> df = pandas.DataFrame(index=[1,2,3,4])
>>> df["test"] = pandas.Series(["B", "fdf", "344", np.nan])
>>> df["test2"] = ["B", "fdf", "344", np.nan]
>>> df
  test test2
1  fdf     B
2  344   fdf
3  NaN   344
4  NaN   nan
```

Looks like some kind of off-by-one error to me.
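A minimal sketch, assuming a recent pandas (variable names are illustrative), that reproduces the example above. The output is consistent with label alignment rather than an off-by-one shift: the Series carries its own implicit 0..3 index and is matched by label against the frame's 1..4 index, while the plain list has no labels and is assigned by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[1, 2, 3, 4])

# No index is supplied, so the Series gets an implicit 0..3 index.
s = pd.Series(["B", "fdf", "344", np.nan])

# Series assignment aligns on labels: label 0 has no row in df and is dropped,
# and row 4 has no matching label in s, so it becomes NaN.
df["test"] = s

# A list carries no labels, so its values are placed by position.
df["test2"] = ["B", "fdf", "344", np.nan]

print(df)
```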
Further digging leads to the call to

```
>>> data
0      B
1    fdf
2    344
3    NaN
>>> df.index = ["A", "B", "C", "D"]
>>> data.reindex(df.index).values
array([nan, nan, nan, nan], dtype=object)
```
Even more digging leads to

```
>>> data.index.reindex(df.index)
(Index([A, B, C, D], dtype=object), array([-1, -1, -1, -1], dtype=int32))
```

These -1 entries are then translated to NaNs.
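The same -1 mechanism can be seen in a recent pandas through `Index.get_indexer`, which returns, for each target label, its position in the source index or -1 when the label is absent. This is a minimal sketch; the exact internal call path used by `reindex` in current pandas may differ from the `Index.reindex` call shown in the thread, and the variable names are illustrative.

```python
import pandas as pd

data_index = pd.RangeIndex(4)            # implicit 0..3 index of the Series
target = pd.Index(["A", "B", "C", "D"])  # new DataFrame index

# None of the string labels exist in 0..3, so every position is -1.
indexer = data_index.get_indexer(target)
print(indexer)

# Reindexing a Series onto labels it does not contain fills those slots with NaN.
s = pd.Series(["B", "fdf", "344", None])
print(s.reindex(target).isna().all())
```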
Updated the bug title with a more accurate description.
The Series is given an implicit 0, ..., N-1 index when you don't supply one, so this is exactly the behavior I would expect. If
and it conforms the series exactly to the index of
would work fine in your example.
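A minimal sketch, assuming a recent pandas, of two common ways to make a new column line up with the DataFrame's rows when the Series has no meaningful index of its own. These are illustrations only, not necessarily the exact code suggested in the reply above; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=["A", "B", "C", "D"])
values = ["B", "fdf", "344", np.nan]

# Option 1: build the Series on the DataFrame's own index, so alignment
# matches every label one-to-one.
df["aligned"] = pd.Series(values, index=df.index)

# Option 2: drop the labels and assign positionally; the length must match.
df["positional"] = pd.Series(values).to_numpy()

print(df)
```

Option 1 keeps label semantics explicit; option 2 is closer to assigning a plain list.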
In that case, perhaps it should be documented somewhere if it isn't already. In the meantime, I'll adjust my own code as you suggested. Thanks.
http://pandas.sourceforge.net/dsintro.html#column-selection-addition-deletion
What is the idea behind conforming a Series to the DataFrame's index when it is inserted with a different index? When a DataFrame is created from Series, the resulting index covers all of the individual Series' indexes. So why is the same idea not used for df['new_column'] = series?

```
In [256]: df = pandas.DataFrame({'A': pandas.Series(['foo', 'bar'], index=['a', 'b']),
   .....:                        'B': pandas.Series([10, 20], index=['b', 'c'])})

In [257]: df
Out[257]:
     A       B
a  foo     NaN
b  bar  10.000
c  NaN  20.000

In [258]: df['C'] = pandas.Series(range(3), index=['a', 'c', 'd'])

In [259]: df
Out[259]:
     A       B      C
a  foo     NaN  0.000
b  bar  10.000    NaN
c  NaN  20.000  1.000
```

In the example above, I would expect a row 'd' in the DataFrame.
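If a row 'd' is wanted, one option in a recent pandas is to extend the frame's index to the union of both indexes before assigning. This is a minimal sketch of that workaround, not a description of how assignment is meant to behave; the names `dropped` and `extended` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["foo", "bar"], index=["a", "b"]),
    "B": pd.Series([10, 20], index=["b", "c"]),
})
s = pd.Series(range(3), index=["a", "c", "d"])

# Plain assignment conforms s to df.index, so label 'd' is silently dropped.
dropped = df.copy()
dropped["C"] = s

# Reindex to the union first; the new row 'd' starts as NaN in A and B,
# and the assignment then has a row for every label in s.
extended = df.reindex(df.index.union(s.index))
extended["C"] = s

print(extended)
```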
Well, I think the basic idea is that a DataFrame is a "fixed-length, dict-like container of Series". When you construct a DataFrame from a dict of Series without an explicit index, there is no obvious index other than the union of them all. I can see the argument for implicitly extending the index, but there are tradeoffs either way.
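The contrast described above can be sketched in a recent pandas: the dict-of-Series constructor has no explicit index and takes the union, while `pd.concat(..., axis=1)` performs an outer join on the index by default, which makes "extend the index" an explicit choice rather than a side effect of `df[col] = series`. This is a minimal illustration; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["foo", "bar"], index=["a", "b"]),
    "B": pd.Series([10, 20], index=["b", "c"]),
})
s = pd.Series(range(3), index=["a", "c", "d"], name="C")

# Constructor: no explicit index, so the union of the Series indexes is used.
print(list(df.index))

# concat along columns: outer join on the index (join="outer" is the default),
# so 'd' is added and missing cells are filled with NaN.
combined = pd.concat([df, s], axis=1)
print(list(combined.index))
```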
Case in point:
This also happens with float objects and the like. I am not sure what the trigger is.