Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetchdictarray rebased #1270

Closed
wants to merge 14 commits into from

Conversation

ndmlny-qs
Copy link
Contributor

Rebased against master. This is the PR to allow numpy arrays.

@ndmlny-qs
Copy link
Contributor Author

The errors are coming from when we try to build pyodbc with numpy support. Since setuptools does not have access to a locally installed version of numpy during the build, it errors stating that no module named numpy can be found.

I've looked into how not to make numpy a required dependency in the pyproject.toml file for this feature, and I've come up empty. I'm look at how Pillow implements functional extensions using a custom build backend, reference is here https://github.com/python-pillow/Pillow/pull/7171/files#diff-14b169061c0b607511e83931a16fa2059738b12503c447588f81dd4775c59633. This is also discussed in the documentation for setuptools here https://setuptools.pypa.io/en/latest/build_meta.html. I need to do more investigation, however, I think when we build pyodbc and pip is given a flag like --config-settings=WITH_NUMPY=1, then the custom build backend will install numpy so that you can use this feature.

@mkleehammer any thoughts or concerns about going this route?

@mkleehammer
Copy link
Owner

My first question is would a numpy version of pyodbc even be able to load if numpy is not installed? That is, does it link with libraries that need to be present. That would be a problem.

I wondered if it would be easier to write a separate library like pyodbc_numpy that pyodbc could pass internals to.

@ndmlny-qs
Copy link
Contributor Author

My first question is would a numpy version of pyodbc even be able to load if numpy is not installed?

I need to modify a few things, but the advice I was given was to ensure pyodbc functions without NumPy, even if it was compiled with NumPy support. Then make NumPy a build requirement in the pyrpoject.toml file, but an optional runtime dependency. Finally, I need to ensure there are two configs for the tests suite; one that works without NumPy, and that works with it.

This would keep the project from fracturing into two packages, and is something I'd like to try.

@ndmlny-qs
Copy link
Contributor Author

@mkleehammer I have this branch successfully building to pyodbc 5.0.0b4. There are tests that should be made for when a user optionally installs with numpy, i.e. python -m pip install --force-reinstall .[numpy] for a dev install, or pip install pyodbc[numpy] for a user install, but I think this feature is okay for review now.

@ndmlny-qs
Copy link
Contributor Author

This commit will successfully build pyodbc with NumPy as a build requirement, and an optional runtime dependency. To test the build, I am using what was posted in a discussion here; #1253. This will start a local Microsoft SQL 2017 server that we can query against.

Installing pyodbc without NumPy gets the expected behavior of a ModuleNotFoundError propagating from the C++ code. Installing pyodbc from this branch without numpy

pip install --force-reinstall .

The following Python code will create the expected error behavior, and will not crash an (ipython or python) REPL nor will it segfault.

# test.py
import pyodbc

connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=127.0.0.1,1401;"
    "UID=sa;PWD=StrongPassword2017;DATABASE=test"
)
connection = pyodbc.connect(connection_string)
cursor = connection.cursor()
cursor.execute("SELECT name, database_id, create_date FROM sys.databases;")
cursor.fetchdictarray()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 1
----> 1 cursor.fetchdictarray()

ModuleNotFoundError: No module named 'numpy'

With NumPy support as an optional dependency

pip install --force-reinstall .[numpy]

and using the same test.py results in the following returned results (in an ipython session)

{'name': array(['master', 'tempdb', 'model', 'msdb', 'test'], dtype='<U129'),
 'database_id': array([1, 2, 3, 4, 5], dtype=int32),
 'create_date': array(['2003-04-08T09:13:36.390000', '2023-10-18T15:44:45.917000',
        '2003-04-08T09:13:36.390000', '2023-01-25T10:14:45.980000',
        '2023-10-18T15:45:14.713000'], dtype='datetime64[us]')}

If a user wants to install with numpy support, then using pip they would need to run the command pip install pyodbc[numpy].

@ndmlny-qs
Copy link
Contributor Author

Looks like there is still work to be done for getting Windows builds, so I will continue to investigate.

Comment on lines +1612 to +1613
TRACE_NOLOC("\n\nCursor fetchdictarray\n\tnrows:%d\n\treturn_nulls:%s\n\tnull_suffix:%s\n\thandle:%p\n\tunicode_results:%s\n",
(int)nrows, preserve_nulls?"yes":"no", null_suffix, (void*)cursor->hstmt);

Check warning

Code scanning / CodeQL

Too few arguments to formatting function Medium

Format for pyodbc_trace_func (in a macro expansion) expects 5 arguments but given 4
@ndmlny-qs
Copy link
Contributor Author

@mkleehammer I had to modify the AppVeyor build to no longer include --no-isolation since the pyproject.toml file has a build-time dependency on numpy, while it is a run-time optional dependency. Also, the CodeQL CI was not building pyodbc, so I modified the file to do the build exactly as is in the github action.

This looks to be ready for review. I'll keep monitoring the PR for any required changes for anyone that reviews.

mkleehammer and others added 14 commits November 27, 2023 15:19
This commit allows `pyodbc` to build. The following items have been
updated.

- `pyproject.toml` has been modified to include `numpy` as a hard
  _build_ dependency, but as an optional install dependency for a user.
- `setup.py` has been modified to include `numpy` as it is a hard build
  requirement as defined in the `pyproject.toml` file.
- `src/cursor.cpp` has been updated to move the enum to the
  corresponding `src/cursor.h` file.
- `src/cursor.h` also includes the `Cursor_Validate` object.
- `src/npcontainer.{cpp,h}` have been updated to ensure the build
  happens.
This file is a duplicate of `tests/mysql_test.py` and must have slipped
through the rebasing.
This commit modifies `appveyor/install.cmd` to use the `pyproject.toml`
file. During testing of this branch, appveyor's logs show the following.

```
call .\appveyor\install.cmd
*** pip install pytest and other dev requirements ***
  DEPRECATION: typed-ast is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559
```

This occurs because the appveyor file installs dependencies with pip
using a requirements-dev.txt file.

```
"%PYTHON_HOME%\python" -m pip install -r requirements-dev.txt --quiet --no-warn-script-location
```

The `pyproject.toml` file has been updated with the version numbers of
tools found in `requirements-dev.txt`, and two additional arrays have
been added to the optional-dependencies table; `qa` and `test`. We will
use these arrays to install optional runtime dependencies used for QA
and testing. Since tests for the `fetchdictarray` feature have not been
completed, the optional runtime dependency of NumPy has not been added
to the appveyor install.cmd script.
This commit successfully builds `pyodbc` with and without NumPy support.
If NumPy is installed as an optional dependency, then you can retrieve
results from a database as NumPy arrays. If NumPy is not installed as an
optional dependency, then an error is given to the user indicating that
NumPy cannot be found.

- `setup.py` removes the `WITH_NUMPY` macro. `pyodbc` is set to build
  with NumPy support, as indicated in the `pyproject.toml` file. All
  references to this macro have also been removed in the C++ code.
- `src/cursor.cpp` removes the guards for allowing the `fetchdictarray`,
  on the cursor object, which is the way a user gets NumPy array objects
  back. It also aligns the objects in the `Cursor_methods` object.
- `src/npcontainer.cpp` removes NumPy build guards, moves the
  initialization of NumPy (with the import_array() macro).
- `src/npcontainer.h` removes unused guards.
- `src/pyodbcmodule.cpp` removes unused NumPy guards.
@ndmlny-qs
Copy link
Contributor Author

@mkleehammer after careful consideration we have decided to build out this functionality as a new project. We are closing out this PR and moving our numpy-related efforts to a new project called npyodbc, which layers additional numpy related functionality on top of pyodbc. If there is a desire in the future to re-merge, please feel free to reach out. Thank you for all your help over the years, with your permission, we will include attribution and a link to this project in our new repo. Please let us know if you have any thoughts or concerns as we move in this direction.

@ndmlny-qs ndmlny-qs closed this Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants