Allow assigning lists and tuples to columns #3438

Open
kevinanewman opened this issue Mar 6, 2023 · 4 comments
Labels
new feature Feature requests for new functionality

Comments

@kevinanewman

kevinanewman commented Mar 6, 2023

Add the ability to assign or update columns from lists or tuples of values, provided the list or tuple contains the correct number of elements.

Pandas, numpy and datatable (in Frame constructors) support column creation directly from lists or tuples, but in datatable, once the Frame is created, it's not possible to create new columns or update existing ones directly from lists.

fail using list of values

import datatable as dt
import numpy as np

DT1 = dt.Frame({'A': [1, 2, 3]})  # create single-column Frame from list of values SUCCESS
DT1['A'] = [7, 8, 9]  # update existing column from list of values FAIL
DT1['B'] = [4, 5, 6]  # create new column from list of values FAIL

Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "", line 1, in
datatable.exceptions.ValueError: The LHS of the replacement has 1 columns, while the RHS has 3 replacement expressions

fail using tuple of values

DT1['B'] = (4, 5, 6) # FAIL

Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "", line 1, in
datatable.exceptions.ValueError: The LHS of the replacement has 1 columns, while the RHS has 3 replacement expressions

fail using update() and list of values

DT1[:, dt.update(B=[7, 8, 9])]  # FAIL

Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
coro = func()
File "", line 1, in
datatable.exceptions.ValueError: The LHS of the replacement has 1 columns, while the RHS has 3 replacement expressions

success using np.array() and range()

DT1['B'] = np.array([4, 5, 6])  # SUCCESS
DT1['C'] = range(3)  # SUCCESS
print(DT1)

DT1
   |     A      B      C
   | int32  int64  int32
-- + -----  -----  -----
 0 |     1      4      0
 1 |     2      5      1
 2 |     3      6      2
[3 rows x 3 columns]
  • What was the expected behavior?
    In case it is not obvious, please tell us what result should your code
    produce.

as above

  • Your environment?
    What is your datatable version, python version, and operating system?

datatable version 1.0.0, python 3.8 (x86), macOS 13.2.1 (22D68)
and also
datatable version 1.1.0a0+build.1677978121.kevinnewman, python 3.9 (M1 ARM), macOS 13.2.1 (22D68)

@oleksiyskononenko
Contributor

I don't see that we ever claimed it is possible to assign a list of Python primitives to a frame column: https://datatable.readthedocs.io/en/latest/api/frame/__setitem__.html

But if you do

DT1['B'] = dt.Frame([4, 5, 6])

this will actually work.
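Until lists and tuples are accepted directly, the workaround above can be wrapped in a small helper. The following is only a sketch: `check_column_values` and `assign_column` are hypothetical names, not part of datatable, and the length check simply mirrors the "correct number of elements" condition from the original request.

```python
def check_column_values(values, nrows):
    """Return `values` as a list if it is a list/tuple with exactly `nrows` items."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("expected a list or tuple, got %s" % type(values).__name__)
    if len(values) != nrows:
        raise ValueError("expected %d values, got %d" % (nrows, len(values)))
    return list(values)


def assign_column(frame, name, values):
    """Hypothetical helper: create or update column `name` from a list or tuple."""
    # Imported here so that check_column_values can be used without datatable.
    import datatable as dt

    # Wrapping the values in a single-column Frame is the workaround
    # suggested in this thread; it is not a built-in list assignment.
    frame[name] = dt.Frame(check_column_values(values, frame.nrows))
```

With a frame such as `DT1` from the original post, `assign_column(DT1, 'B', (4, 5, 6))` would then stand in for `DT1['B'] = (4, 5, 6)`.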

@kevinanewman
Author

Well, that's true, but it would be welcome and very handy, especially given the stated goal of "Interoperability with pandas / numpy / pyarrow / pure python: the users should have the ability to convert to another data-processing framework with ease." Lists and tuples are of course foundational in Python. I haven't used pyarrow, but pandas and numpy can both load columns, arrays, etc. directly from lists or tuples.

Should I close this issue and perhaps create a feature request instead?

Thank you for your response,

Kevin

@oleksiyskononenko
Contributor

@kevinanewman Yes, I don’t really know why this is not supported yet; maybe there was a reason. It could also be that it was just missed, because we do support assigning scalars or numpy arrays/ranges (which looks undocumented).

No need to close this issue, just rename it and update the text so that it is clear this is a feature request and not a bug.

@kevinanewman
Author

Ok, I updated the title and revised the original post to make it consistent with a feature request.

Thank you!

@oleksiyskononenko changed the title from "[bug] Can't assign a list of values to a column - is this expected behavior?" to "Allow assigning lists and tuples to columns" on Mar 8, 2023
@oleksiyskononenko added the "new feature" (Feature requests for new functionality) label on Mar 8, 2023