Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage measurement of Cython code in a meson build #6186

Open
oscarbenjamin opened this issue May 6, 2024 · 18 comments
Open

Coverage measurement of Cython code in a meson build #6186

oscarbenjamin opened this issue May 6, 2024 · 18 comments

Comments

@oscarbenjamin
Copy link

oscarbenjamin commented May 6, 2024

Describe your issue

There is a related spin issue: scientific-python/spin#182.

I have created a repo to demonstrate how a spin/meson build of a Cython project looks and what is needed to get coverage of Cython code working. The README explains the problems:
https://github.com/oscarbenjamin/cython_coverage_demo

There are two related things that prevent Cython's coverage plugin from working in a meson-based project:

The paths recorded in the generated C files in a meson build look like

/* #### Code section: filename_table ### */

static const char *__pyx_f[] = {
  "thing.pyx",
  "<stringsource>",
};

When doing an inplace build with setuptools they instead look like:

/* #### Code section: filename_table ### */

static const char *__pyx_f[] = {
  "src/stuff/thing.pyx",
  "<stringsource>",
};

I'm not sure if this is just something that should be changed in spin/meson when doing a coverage-enabled build or something. Possibly this is just the way that cython is invoked. In any case Cython's coverage plugin breaks if the path to the source file is not included here.

The other problem is that Cython's coverage plugin assumes that all files are in-tree like:

$ tree src
src
└── stuff
    ├── __init__.py
    ├── meson.build
    ├── __pycache__
    │   ├── __init__.cpython-311.pyc
    │   └── test_thing.cpython-311-pytest-8.2.0.pyc
    ├── test_thing.py
    ├── thing.c
    ├── thing.cpython-311-x86_64-linux-gnu.so
    └── thing.pyx

2 directories, 8 files

This is not how meson ever lays things out. Instead in a spin/meson build:

  1. The source tree is left pristine (no .c or .so files).
  2. Intermediate built files (e.g. .c) are in the build directory.
  3. Final output files (.so and .py) are assembled in the build-install directory.

It is the build-install directory that is actually used for running tests and generates the coverage data.

When running tests with a coverage-enabled Cython build Cython's coverage plugin fails to find the files that are needed and then coverage reports warnings like:

.../site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/oscar/current/active/cython_coverage_demo/thing.pyx': No source for code: '/home/oscar/current/active/cython_coverage_demo/thing.pyx'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")

I can get coverage working if I patch Cython like this:

--- Coverage.py.backup	2024-05-06 15:17:09.336636857 +0100
+++ Coverage.py	2024-05-06 15:19:55.101376672 +0100
@@ -66,7 +66,19 @@ C_FILE_EXTENSIONS = ['.c', '.cpp', '.cc'
 MODULE_FILE_EXTENSIONS = set(['.py', '.pyx', '.pxd'] + C_FILE_EXTENSIONS)
 
 
+def _find_in_dir(name, dirpath):
+    for root, dirs, files in os.walk(dirpath):
+        if name in files:
+            return os.path.join(root, name)
+
+
 def _find_c_source(base_path):
+    if os.path.exists('build'):
+        pyxc_name = os.path.basename(base_path) + '.pyx.c'
+        cfile = _find_in_dir(pyxc_name, 'build')
+        if cfile is not None:
+            return cfile
+
     file_exists = os.path.exists
     for ext in C_FILE_EXTENSIONS:
         file_name = base_path + ext
@@ -79,6 +91,12 @@ def _find_dep_file_path(main_file, file_
     abs_path = os.path.abspath(file_path)
     if not os.path.exists(abs_path) and (file_path.endswith('.pxi') or
                                          relative_path_search):
+        src_path = os.path.join('src', file_path)
+        if os.path.exists(src_path):
+            abs_path = os.path.abspath(src_path)
+            cpath = canonical_filename(abs_path)
+            return cpath
+
         # files are looked up relative to the main source file
         rel_file_path = os.path.join(os.path.dirname(main_file), file_path)
         if os.path.exists(rel_file_path):

I also need to patch the generated C files:

diff --git a/build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c b/build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c
index 4b628fe..38af623 100644
--- a/build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c
+++ b/build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c
@@ -1467,7 +1467,7 @@ static const char *__pyx_filename;
 /* #### Code section: filename_table ### */
 
 static const char *__pyx_f[] = {
-  "thing.pyx",
+  "src/stuff/thing.pyx",
   "<stringsource>",
 };
 /* #### Code section: utility_code_proto_before_types ### */

That Cython patch is not really suitable for a PR and so I am wondering what is the proper way to fix it. From my perspective it would be fine if there was a way to explicitly tell the coverage plugin what files are where rather than depending on any guesswork.

@oscarbenjamin
Copy link
Author

The paths recorded in the generated C files in a meson build look like

The reason for this difference is that Cython is being run from the build directory e.g.:

$ cython -3 src/stuff/thing.pyx -o out.c
$ grep thing.pyx out.c
  "src/stuff/thing.pyx",
...
$ cd tmp/
$ cython -3 ../src/stuff/thing.pyx -o out.c
$ grep thing.pyx out.c
  "thing.pyx",
...

I don't see an immediate cython option for controlling that...

@oscarbenjamin
Copy link
Author

I don't think that cython's coverage plugin is going to be able to figure out where the files are in a meson build. Instead though what would be useful is if we could just pass in something like a json file that says where the files are.

There are several steps to making the coverage report work:

  1. If line tracing is enabled then trace events are emitted (__Pyx_TraceCall) at runtime and collected by coverage. These events include a filename which is the __pyx_f[0] string referred to above.
  2. Then when coverage goes to collect the results Cython's coverage plugin's file_reporter method is called for every filename and should select those that it wants to handle.
  3. For each file that Cython wants to handle (i.e. foo.pyx files) the plugin tries to find the matching foo.c file.
  4. Cython then parses the foo.c file which itself contains references back to the original foo1.pyx, foo2.pyx etc sources.

The way that this works with setuptools in my demo is:

  1. For an in-place build run in project root __pyx_f[0] is set to the relative path from project root (src/stuff/thing.pyx). This is also CWD when running coverage. I think that coverage turns this into an absolute path /path/to/proj/src/stuff/thing.pyx that comes back to file_reporter().
  2. The plugin then looks for an adjacent .c file /path/to/proj/src/stuff/thing.c and finds it.
  3. The plugin parses thing.c which has comments referring to stuff/thing.pyx which is the relative path to the original source file from the src directory.
  4. When the plugin wants to report the source file for a trace event it uses its map from __pyx_f[0] to /path/to/proj/src/stuff/thing.c and the filenames that it parsed from thing.c like stuff/thing.pyx. The code that does that is here:

    cython/Cython/Coverage.py

    Lines 77 to 108 in 45db483

    def _find_dep_file_path(main_file, file_path, relative_path_search=False):
    abs_path = os.path.abspath(file_path)
    if not os.path.exists(abs_path) and (file_path.endswith('.pxi') or
    relative_path_search):
    # files are looked up relative to the main source file
    rel_file_path = os.path.join(os.path.dirname(main_file), file_path)
    if os.path.exists(rel_file_path):
    abs_path = os.path.abspath(rel_file_path)
    abs_no_ext = os.path.splitext(abs_path)[0]
    file_no_ext, extension = os.path.splitext(file_path)
    # We check if the paths match by matching the directories in reverse order.
    # pkg/module.pyx /long/absolute_path/bla/bla/site-packages/pkg/module.c should match.
    # this will match the pairs: module-module and pkg-pkg. After which there is nothing left to zip.
    abs_no_ext = os.path.normpath(abs_no_ext)
    file_no_ext = os.path.normpath(file_no_ext)
    matching_paths = zip(reversed(abs_no_ext.split(os.sep)), reversed(file_no_ext.split(os.sep)))
    for one, other in matching_paths:
    if one != other:
    break
    else: # No mismatches detected
    matching_abs_path = os.path.splitext(main_file)[0] + extension
    if os.path.exists(matching_abs_path):
    return canonical_filename(matching_abs_path)
    # search sys.path for external locations if a valid file hasn't been found
    if not os.path.exists(abs_path):
    for sys_path in sys.path:
    test_path = os.path.realpath(os.path.join(sys_path, file_path))
    if os.path.exists(test_path):
    return canonical_filename(test_path)
    return canonical_filename(abs_path)

    This tries a few things:
  5. First it tries to combine thing.c's directory with the path like /path/to/proj/src/stuff/stuff/thing.pyx. That doesn't work because we have double stuff in the path.
  6. Then it tries removing the file extensions and compares the paths one component at a time in reverse to try to match an absolute path to thing.c's directory with a relative path to thing.pyx's directory. In my case that succeeds.
  7. If that fails then it searches sys.path to see if any directory there has stuff/thing.pyx.

In the spin meson setup the directory structure (for CWD) at the time the coverage plugin runs is like:

$ tree src build/src build-install/
src
└── stuff
    ├── __init__.py
    ├── meson.build
    ├── test_thing.py
    └── thing.pyx
build/src
└── stuff
    ├── thing.cpython-311-x86_64-linux-gnu.so
    └── thing.cpython-311-x86_64-linux-gnu.so.p
        ├── meson-generated_src_stuff_thing.pyx.c.o
        └── src
            └── stuff
                ├── thing.pyx.c
                └── thing.pyx.c.dep
build-install/
└── usr
    └── lib
        └── python3.11
            └── site-packages
                └── stuff
                    ├── __init__.py
                    ├── __pycache__
                    │   ├── __init__.cpython-311.pyc
                    │   ├── test_thing.cpython-311.pyc
                    │   └── test_thing.cpython-311-pytest-8.2.0.pyc
                    ├── test_thing.py
                    └── thing.cpython-311-x86_64-linux-gnu.so

The current working directory here is project root and build-install is the only directory that is on sys.path.

I don't think it is reasonable to extend the heuristics currently used in the plugin to the extent that they would be able to locate thing.pyx.c and then from there find thing.pyx. Realistically I need to be able to provide that information to the plugin.

Stripping this all back all that is really wanted for the plugin to work is:

  1. To output firstly a name in __pyx_f[0] that we will be able to recognise later.
  2. A map from the filenames in the trace events to the foo.c file.
  3. To be able to locate the files referred to in each foo.c which are the original Cython sources based on their names in foo.c.

It is not hard though for me to generate all of the information that the plugin needs in the form of e.g. a json file like:

{
    # map from __pyx_f[0] to c file path
    "c_files": {
        "src/stuff/thing.pyx":
            "build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c"
    }
    # maps from paths in foo.c etc to source .pyx etc files
    "source_files": {
        "build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c": {
            "stuff/thing.pyx": "src/stuff/thing.pyx"
        }
    }
}

With spin/meson we can easily generate this information because we know where we put the .c files and we also now what the corresponding source files are from thing.pyx.c.dep (via cython --depfile):

$ cat build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c.dep 
src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c: \
  ../src/stuff/thing.pyx

It would not be hard then to either add a custom target in the project's meson build that builds this json file or for it to be something that spin does automatically for any project that wants it.

What I would need from Cython to make this work is just two things:

  1. I need to control the path that is stored in __pyx_f[0] ideally through a command line argument to cython.
  2. I need a way to tell the plugin to load and use my json file rather than doing heuristic path searches.

I can implement this and send a PR but I'm just wondering if this sounds reasonable and if anyone has any suggestions for what would be the best interface for these last two points.

@rgommers
Copy link
Contributor

rgommers commented May 8, 2024

The reason for this difference is that Cython is being run from the build directory e.g.:
[...]
I don't see an immediate cython option for controlling that...

This part seems like a problem in the cython frontend, which doesn't show up with Cython's build_ext class (and maybe not with cythonize)?

For a cython invocation like cython -3 path/to/thing.pyx -o output/path/to/thing.c, the paths in thing.c should always be correct from within this file, right? So in this example, it should be ../../../path/to/thing.pyx. The working directory from which cython is invoked should never come into play I'd think, it's just the .c -> .pyx connection that should always be maintained independent of where both files are located. And it shouldn't require a flag passed to the cython frontend - unless I'm misunderstanding, there is only ever a single correct answer.

  1. For each file that Cython wants to handle (i.e. foo.pyx files) the plugin tries to find the matching foo.c file.

So this is the reverse problem of the above - but harder indeed, because the .pyx -> .c mapping needs to be in a separate file. Having the build system write that out in the way you describe in a json file sounds like a good solution to me.

@oscarbenjamin
Copy link
Author

This part seems like a problem in the cython frontend, which doesn't show up with Cython's build_ext class (and maybe not with cythonize)?

For a cython invocation like cython -3 path/to/thing.pyx -o output/path/to/thing.c, the paths in thing.c should always be correct from within this file, right? So in this example, it should be ../../../path/to/thing.pyx. The working directory from which cython is invoked should never come into play I'd think, it's just the .c -> .pyx connection that should always be maintained independent of where both files are located.

What happens is that meson calls cython with absolute paths:

cython -M --fast-fail -3 /home/oscar/current/active/cython_coverage_demo/src/stuff/thing.pyx -o src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c

Then cython internally strips off the current working directory but only if the file is inside CWD:

# Prefer relative paths to current directory (which is most likely the project root) over absolute paths.
workdir = os.path.abspath('.') + os.sep
self.file_path = filename[len(workdir):] if filename.startswith(workdir) else filename

Given that cython was invoked with absolute paths there is no well defined way for it to know how to turn those into relative paths unless it is just a relative path from the directory containing foo.c to the location of foo.pyx.

On the other hand though cython knows how to generate paths relative to sys.path i.e. relative to ./src in the demo repo. It already uses those paths for the comments like:

/* "stuff/thing.pyx":1
 * cdef class Thing:             # <<<<<<<<<<<<<<
 *     def method(self):
 *         return 2
 */

These are the paths that are parsed from the c file and used to locate its source .pyx files. I am not sure how cython discerns this path.

I think it would make sense for __pyx_f[0] to be a path relative to sys.path as well i.e. it should be stuff/thing.pyx rather than relative to project root (src/stuff/thing.pyx) or relative to the location of the c file (../src/stuff/thing.pyx). Then the json can use the same name stuff/thing.pyx in both c_files and source_files:

{
    # map from __pyx_f[0] to c file path
    "c_files": {
        "stuff/thing.pyx":
            "build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c"
    }
    # maps from paths in foo.c etc to source .pyx etc files
    "source_files": {
        "build/src/stuff/thing.cpython-311-x86_64-linux-gnu.so.p/src/stuff/thing.pyx.c": {
            "stuff/thing.pyx": "src/stuff/thing.pyx"
        }
    }
}

Note that if we weren't using the src-layout then __pyx_f[0] would be stuff/thing.pyx because in that case project root, current working directory and sys.path[0] are all the same.

@oscarbenjamin
Copy link
Author

  • I need to control the path that is stored in __pyx_f[0] ideally through a command line argument to cython.
  • I need a way to tell the plugin to load and use my json file rather than doing heuristic path searches.

Another option would be for spin to have its own Cython coverage plugin that handles all the path finding. Then the only thing that needs changing in Cython is setting __pyx_f[0] correctly when the source file is not in CWD.

@oscarbenjamin
Copy link
Author

In a more complicated situation with cimport etc the __pyx_f table looks like:

static const char *__pyx_f[] = {
  "src/flint/types/fmpz.pyx",
  "src/flint/utils/conversion.pxd",
  "src/flint/types/fmpz.pxd",
  "src/flint/utils/typecheck.pxd",
  "src/flint/flint_base/flint_base.pxd",
  "type.pxd",
};

The tracing events are sent out with any of these filenames and Cython tries to guess which .c file is involved. The pxd files involved are included in many .c files though so there is no straight-forward map from e.g. fmpz.pxd to fmpz.c.

With the out of tree meson build that becomes:

static const char *__pyx_f[] = {
  "fmpz.pyx",
  "conversion.pxd",
  "fmpz.pxd",
  "flint_base.pxd",
  "type.pxd",
};

This is where the __pyx_f paths get stripped off if they are still absolute paths:

file_path = source_desc.get_filenametable_entry()
if isabs(file_path):
file_path = basename(file_path) # never include absolute paths

This is possibly where absolute paths become relative for what is shown in the annotated comments:

def get_description(self):
try:
return os.path.relpath(self.path_description)
except ValueError:
# path not under current directory => use complete file path
return self.path_description

That's relative to CWD but possibly cythonize changes the working directory:
base_dir, ext_modules = args
script_args = ['build_ext', '-i']
cwd = os.getcwd()
temp_dir = None
try:
if base_dir:
os.chdir(base_dir)
temp_dir = tempfile.mkdtemp(dir=base_dir)
script_args.extend(['--build-temp', temp_dir])
setup(
script_name='setup.py',
script_args=script_args,
ext_modules=ext_modules,
)

In fact that is one option for making cython get the paths correct:

$ cython --help
...
  -w WORKING_PATH, --working WORKING_PATH
                        Sets the working directory for Cython (the directory modules are searched
                        from)

@oscarbenjamin
Copy link
Author

Here is where the relative to sys.path paths are generated:

# Set up source object
cwd = os.getcwd()
abs_path = os.path.abspath(source)
full_module_name = full_module_name or context.extract_module_name(source, options)
full_module_name = EncodedString(full_module_name)
Utils.raise_error_if_module_name_forbidden(full_module_name)
if options.relative_path_in_code_position_comments:
rel_path = full_module_name.replace('.', os.sep) + source_ext
if not abs_path.endswith(rel_path):
rel_path = source # safety measure to prevent printing incorrect paths
else:
rel_path = abs_path
source_desc = FileSourceDescriptor(abs_path, rel_path)
source = CompilationSource(source_desc, full_module_name, cwd)

@rgommers
Copy link
Contributor

rgommers commented May 8, 2024

Nice detective work!

there is no well defined way for it to know how to turn those into relative paths unless it is just a relative path from the directory containing foo.c to the location of foo.pyx.

That relative path from foo.c to foo.pyx is what I meant indeed. It's the simplest and most reliable thing to do I'd think, and should work independent of how or from where cython is called. Having the paths be relative to some other location is an extra level of indirection and seems to be a bit fragile.

The other option may be to tell cython what the project/source root dir is, so it doesn't have to guess (the "strip if inside CWD" is guessing) - which it sounds like can be done with --working already. However, I suspect that this may not interact too well with other functionality, like inferring the module name. Maybe using both --working and --module-name together will work? (not pretty though)

@oscarbenjamin
Copy link
Author

Using --working also affects the interpretation of the output path e.g. meson gives a relative path for the output file like:

cython /path/to/proj/src/stuff/thing.pyx -o path/in/build/dir/thing.pyx.c

If --working is used then the name of the output path needs to be relative to the given directory like ../build-dir-name/....

However, I suspect that this may not interact too well with other functionality, like inferring the module name.

Cython has to be able to figure out what is the base directory from which Python/Cython imports are relative so that it can resolve from stuff.thing cimport Thing. It always knows the fully qualified module name for any main .pyx file and it is that relative path stuff.thing -> stuff/thing.pyx that goes in the C code comments. I think that this is generally the best path to use because e.g. some people measure coverage measurement against an installed package (presumably with nox etc): neither the build nor source directory should be part of this path.

We also need to be able to collect multiple references to thing.pyx that arise from different extension modules i.e. difference c files potentially in different packages. The coverage plugin merges these all together so that we get a single coverage report for thing.pyx that includes the code that cimports Thing into each separate module. It makes sense that we want a canonical name for this .pyx file rather than relative paths from different .c files.

It is the __pyx_f table that is relative to CWD at the time that cython runs that is problematic. The path is recorded as the relative path delta from cwd to the .pyx file when cython runs. It is then presumed that either:

  1. The relative path from cwd to thing.pyx file will be the same as delta when coverage runs.
  2. thing.c and thing.pyx are in the same directory with matching names.
  3. The same relative path delta represents an offset from a directory on sys.path.

None of these assumptions works with meson. In fact even without meson delta = src/stuff/thing.pyx will never be the right offset from sys.path so assumption 3 is always broken in a src-layout.

I think that the path to record should be stuff/thing.pyx which is what is needed for assumption 3 anyway. This way it is the same path reproducibly regardless of the location of the source and build directories. Note that this is also the path that is already recorded when using setuptools if you do not have a src-layout.

Then it is just necessary at coverage collection time to understand that the path is not necessarily relative to . but rather to wherever is the import base i.e. it could be src or somewhere in sys.path. Basically the coverage plugin needs to be told somehow that we are using a src layout if we expect it to locate the files in src.

@oscarbenjamin
Copy link
Author

It seems that coverage itself assumes that the path is relative to CWD before calling the plugin. The path needs to be src/stuff/thing.pyx because coverage will turn that into /path/to/proj/src/stuff/thing.pyx and if that file does not exist then it won't even call Cython's coverage plugin.

@oscarbenjamin
Copy link
Author

It seems that coverage itself assumes that the path is relative to CWD before calling the plugin.

A workaround for this is to add a [paths] entry in .coveragerc:

[run]
plugins = Cython.Coverage

[paths]
source =
    src/
    .

@scoder
Copy link
Contributor

scoder commented May 8, 2024 via email

@oscarbenjamin
Copy link
Author

Is there a reason why Cython cannot run from the main project directory here?

I don't know enough about the internals of meson/ninja to give a good answer here. Generally the idea in the design is that the build directory can be anywhere and all commands are run from the build directory. When you run meson it generates a dependency graph of commands to be run and then ninja runs them all from the same directory. It is of course possible to make a wrapper that chdir's before running cython but that is not needed for other compilers.

Turning the question around: should the cython CLI not be usable in a way that does not depend on the current working directory?

I could understand cython needing to know the directory where imports should begin (i.e. ./src in a src-layout) but that is not the current working directory and I don't see why cython should need to be in a particular directory for any other reason.

This already works with cython apart from __pyx_f being set incorrectly:

def __init__(self, filename, path_description=None):
filename = Utils.decode_filename(filename)
self.path_description = path_description or filename
self.filename = filename
# Prefer relative paths to current directory (which is most likely the project root) over absolute paths.
workdir = os.path.abspath('.') + os.sep
self.file_path = filename[len(workdir):] if filename.startswith(workdir) else filename

file_path = source_desc.get_filenametable_entry()
if isabs(file_path):
file_path = basename(file_path) # never include absolute paths

Here basename(file_path) is basically never the right thing to do. I can understand the desire to avoid using an absolute path but basename throws away all directory information making two files in different locations with the same name indistinguishable.

This change would make it fall back to using the import-relative path (stuff/thing.pyx) when writing the module file table:

diff --git a/Cython/Compiler/ModuleNode.py b/Cython/Compiler/ModuleNode.py
index 67511fc..f30aab6 100644
--- a/Cython/Compiler/ModuleNode.py
+++ b/Cython/Compiler/ModuleNode.py
@@ -968,7 +968,7 @@ class ModuleNode(Nodes.Node, Nodes.BlockNode):
             for source_desc in code.globalstate.filename_list:
                 file_path = source_desc.get_filenametable_entry()
                 if isabs(file_path):
-                    file_path = basename(file_path)  # never include absolute paths
+                    file_path = source_desc.get_description() # never include absolute paths
                 escaped_filename = file_path.replace("\\", "\\\\").replace('"', r'\"')
                 escaped_filename = as_encoded_filename(escaped_filename)
                 code.putln('%s,' % escaped_filename.as_c_string_literal())

With that one change I could make everything else work outside of the Cython codebase by extending Cython's coverage plugin with one that knows where meson puts the C files.

@rgommers
Copy link
Contributor

rgommers commented May 9, 2024

Generally the idea in the design is that the build directory can be anywhere and all commands are run from the build directory.

This is basically it, and it's not specific to only Meson. CMake is pretty similar, building a project with CMake may look like:

mkdir build && cd build
cmake .. <configure flags here>
cmake --build . -j

I don't know enough about Bazel & co to say for sure, but I suspect they may behave similarly.

If Cython makes an implicit assumption about being run from the project root, that is in general not correct and should be avoided. It stems from a "distutils is all that matters" era where it did happen to be true.

I can understand the desire to avoid using an absolute path

+1 relative paths are better especially for paths that end up in the final extensions modules (absolute paths are a hindrance to achieving reproducible builds).

With that one change I could make everything else work outside of the Cython codebase by extending Cython's coverage plugin with one that knows where meson puts the C files.

Nice!

@scoder
Copy link
Contributor

scoder commented May 9, 2024 via email

@oscarbenjamin
Copy link
Author

I understand that the current way is not optimal, but I doubt that there's the one correct way to do it.

Thinking about what is "optimal" is hard but in my experience tooling that does not make guesses and does not use implicit assumptions is easier to maintain from the inside and easier to understand and use from the outside. From that perspective I would say that the first thing to do is make it so that all needed information can be provided explicitly and then secondly that defaults are clearly defined, documented and adhered to. From there though I would resist the temptation to guess if the defaults are incorrect in a given situation: the user is better served by an informative error message and should provide the correct information explicitly.

A lot of complexity in the coverage plugin comes from trying to guess information that could have easily been supplied by the user if any mechanism were provided for them to do so. It is difficult to change anything now without breaking someone's workflow though. Once guessing is already implemented as part of the documented way of doing things then people will depend on it and will expect the guessing to be extended indefinitely to cover their different scenarios any time it fails.

For the cython CLI the "optimal" thing in my opinion would be something like:

cython path/to/src/foo.pyx -o otherpath/to/foo.c --base-dir=path/to/src

Then all paths recorded in output are relative to --base-dir. The effect of CWD should just be that relative paths provided in the CLI are relative to CWD (as usual) and that the default value of --base-dir is CWD. I'm not sure if this could be implemented in a non-breaking way though. Currently cython implicitly guesses that src is the --base-dir for some purposes but uses CWD for others. If you don't have a src-layout and you always run with CWD as project root then there is no distinction. Once you have sources in a subdirectory and/or run from different CWD the equivalence breaks and currently there is no explicit way to control what cython does if its implicit assumptions don't hold.

I have suggested changing the basename path as a least resistance approach because it is unambiguously incorrect so if someone is depending on it then a change is at least clearly defensible if the pitchforks come out. It still means though that the paths depend on the current working directory (whether the source file is under CWD or not) which I would say is not optimal.

@scoder
Copy link
Contributor

scoder commented May 9, 2024 via email

@oscarbenjamin
Copy link
Author

I suppose guess is the wrong word for how cython finds the base path for the package structure. Precisely what it does is to walk up starting from the directory containing the main .pyx file looking for __init__.py (or .pyc etc) files until it finds the first directory that does not look like a package. That is well defined and is what allows cython to compile files correctly in a src-layout when run with CWD as project root.

A directory does not need __init__.py to be a package after PEP 420 (implicit namespace packages) so this is still somewhat a heuristic that would fail in certain conditions. There is a way to override this with --module-name stuff.thing though which is less convenient than --base-dir=src but does work.

It really only looks at the CWD and the main package directory.

Yes and this means if CWD is not a suitable location then relative paths have to be relative to the package directory. That is what the diff I suggested above does: if the main .pyx is file is not anywhere under CWD then use paths relative to the package directory. That means that if we do

cd build && cython ../src/stuff/thing.pyx -o thing.pyx.c

then we get a file build/thing.pyx.c with the __pyx_f path set as

static const char *__pyx_f[] = {
  "stuff/thing.pyx",
};

These relative paths would be found by coverage when running at project root and by default coverage would treat that as meaning /path/to/proj/stuff/thing.pyx which is not correct. We can also tell coverage about the src dir with:

[run]
plugins = Cython.Coverage

[paths]
source =
    src/
    .

and then coverage will find /path/to/proj/src/stuff/thing.pyx and send that path to Cython's coverage plugin.

At that point Cython's coverage plugin would not know how to find build/thing.pyx.c. What we can do though is:

[run]
plugins = spin.cython_coverage_plugin

Then spin's coverage plugin can find and parse all .pyx.c files in the build directory at load time. The plugin will then already know about all of the .pyx etc files when coverage sends the paths through to its file_reporter() method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants