Release camelot-fork 0.20.1 #353

Merged
merged 17 commits into from
Sep 10, 2023

foarsitter
Contributor

@foarsitter commented Feb 9, 2023

Leaving this here so that it can be merged if there is any interest.

@foarsitter changed the title from "Release" to "Release camelot-fork 0.20.1" on Feb 9, 2023
@foarsitter
Contributor Author

This pull request does a lot of things. Since things are coming back to life over here, I will explain them in a bit more detail.

  1. The project is merged into cookiecutter-hypermodern-python, which comes with a lot of features and documented workflows, e.g. for how to make a release.

  2. It removes pdftopng as a hard dependency, since it is not compatible with Python 3.9 and above. Tests that depend on poppler/pdftopng are skipped until pdftopng is updated. The fork is compatible with Python 3.8 and above. Whether we need to add more Python versions is open for discussion.

  3. It aligns the type of PDFHandler.filepath with PdfReader.stream, so Path and IO objects are accepted. For example, filepath can now be an in-memory file upload.

  4. I ran pre-commit on the whole branch to apply black and isort to the codebase.

  5. The first pull requests with bug fixes have been merged: https://github.com/foarsitter/camelot/releases/tag/v0.20.1
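To illustrate item 3, here is a minimal sketch of a function that accepts either a path or an open binary stream. The alias `FilePathOrBuffer` and the helper `load_pdf_bytes` are hypothetical (not part of camelot's API); the alias only roughly mirrors the kinds of input pypdf's `PdfReader` accepts for `stream`.

```python
import io
from pathlib import Path
from typing import IO, Union

# Hypothetical alias: inputs a PDF handler could accept,
# roughly mirroring pypdf's PdfReader(stream=...).
FilePathOrBuffer = Union[str, Path, IO[bytes]]


def load_pdf_bytes(filepath: FilePathOrBuffer) -> bytes:
    """Return the raw bytes of a PDF given a path or an open binary stream."""
    if isinstance(filepath, (str, Path)):
        # Paths on disk are read directly.
        return Path(filepath).read_bytes()
    # File-like objects (e.g. an in-memory upload) are rewound and read.
    filepath.seek(0)
    return filepath.read()


# An in-memory "upload" is handled the same way as a file on disk.
upload = io.BytesIO(b"%PDF-1.7 minimal example")
data = load_pdf_bytes(upload)
```

Rewinding with `seek(0)` before reading is a design choice of this sketch: an upload object may already have been partially read by a web framework.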

From my point of view, it makes sense to release version 1.0 as a stable release in which we support ghostscript, make minor bug fixes, and keep our dependencies up to date.

In version 2.0 we can add support for poppler/pdftopng.
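With pdftopng optional (item 2), code and tests have to check for it at runtime. A minimal sketch, assuming camelot's backend names `"poppler"` and `"ghostscript"`; `pdftopng_available` and `choose_backend` are hypothetical helpers, and the check only confirms that the `pdftopng` package can be found, not that it works.

```python
import importlib.util


def pdftopng_available() -> bool:
    """True if the optional pdftopng package can be found on the import path."""
    return importlib.util.find_spec("pdftopng") is not None


def choose_backend(preferred: str = "poppler") -> str:
    """Hypothetical helper: fall back to ghostscript when pdftopng is missing."""
    if preferred == "poppler" and not pdftopng_available():
        return "ghostscript"
    return preferred


backend = choose_backend()
```

In a `conftest.py`, the same check could drive `pytest.mark.skipif(not pdftopng_available(), reason="pdftopng is not installed")`, so poppler-dependent tests are skipped instead of failing.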

What do you think? Which changes can be cherry-picked, and which need to be dropped?

foarsitter and others added 11 commits February 26, 2023 13:42
…nd IO objects are accepted

(cherry picked from commit 436998b)
Here, Table.df is initialized as an empty DataFrame instead of None.
This lets language servers infer which members it has.
For example, in the following line of code,
```
are_table_rows: List[bool] = [True] * table.df.shape[0]
```
Pyright complains: `Cannot access member "shape" for type "None"`.
With this commit, that misunderstanding no longer occurs.
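A minimal sketch of the pattern this commit describes, assuming pandas. The `Table` class below is a hypothetical simplification, not camelot's actual class.

```python
from typing import List

import pandas as pd


class Table:
    """Hypothetical, stripped-down stand-in for camelot's Table class."""

    def __init__(self) -> None:
        # An empty DataFrame instead of None: type checkers such as Pyright
        # can resolve members like .shape without an Optional check.
        self.df: pd.DataFrame = pd.DataFrame()


table = Table()
# With no parsed rows yet, shape[0] is 0, so this is an empty list.
are_table_rows: List[bool] = [True] * table.df.shape[0]
```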
Cause: Empty horizontal and vertical entries in t_bbox in the method _text_bbox(t_bbox)

Resolution: Added a length check on t_bbox before computing (xmin, ymin, xmax, ymax), and initialized (xmin, ymin, xmax, ymax) to 0 if the length is 0
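The guard from this commit can be sketched as follows. `TextBox` stands in for the text objects camelot works with and, like `text_bbox`, is a hypothetical simplification. This version checks whether any text boxes exist at all rather than reproducing the exact length condition from the commit.

```python
from typing import Dict, List, NamedTuple, Tuple


class TextBox(NamedTuple):
    """Hypothetical stand-in for a text object with x0/y0/x1/y1 coordinates."""

    x0: float
    y0: float
    x1: float
    y1: float


def text_bbox(t_bbox: Dict[str, List[TextBox]]) -> Tuple[float, float, float, float]:
    """Bounding box around all text boxes, or (0, 0, 0, 0) if there are none."""
    boxes = [t for direction in t_bbox for t in t_bbox[direction]]
    if not boxes:
        # min()/max() on an empty sequence raise
        # "ValueError: min() arg is an empty sequence", so return zeros instead.
        return (0, 0, 0, 0)
    xmin = min(t.x0 for t in boxes)
    ymin = min(t.y0 for t in boxes)
    xmax = max(t.x1 for t in boxes)
    ymax = max(t.y1 for t in boxes)
    return (xmin, ymin, xmax, ymax)


empty = text_bbox({"horizontal": [], "vertical": []})
```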
Changed the test name to align better with the other tests
@ExSidius

ExSidius commented Mar 7, 2023

@foarsitter @vinayak-mehta do you have a timeline around when this will be merged in?

I'm running into issues installing camelot on Python 3.10 because of pdftopng.

Let me know if I can help at all.

@foarsitter
Contributor Author

Sorry, there is no timeline. If you need a solution ASAP, install the fork (pip install camelot-fork).

@MartinThoma
Contributor

@foarsitter I would recommend splitting a couple of changes out of this PR. That would have several advantages:

  1. It's more likely we get progress sooner
  2. It's easier to review changes
  3. Fewer merge conflicts

Black + isort in particular would be VERY desirable to have in their own PR. That PR should not add or change anything else, and it should document which commands were executed.

@MartinThoma
Contributor

@foarsitter #358 applies only black

Review comment on README.md (outdated, resolved)
@vinayak-mehta
Member

This is a big one, let me go through it this weekend

@foarsitter mentioned this pull request Jul 15, 2023

@bosd left a comment


In general, this looks good to me. A few minor comments.

My only concern is that this might be incompatible with the multithreading.
But maybe that's a concern for after we have merged this.

Review comment on .github/workflows/constraints.txt (resolved)
Review comment on pyproject.toml (outdated, resolved)
@bosd

bosd commented Jul 15, 2023

Fixes #292, #313, #329
Closes #269

@bosd

bosd commented Jul 15, 2023

@MartinThoma Are you a maintainer here? Can you give the final approval to get this one in?
(This fixes GitHub Actions, which are currently broken and blocking other PRs.)

Review comment on LICENSE (outdated, resolved)
Contributor

@MartinThoma left a comment


I want the license / copyright change to be explained / discussed.

@foarsitter
Contributor Author

I didn't know the cookiecutter template added a copyright notice with my name. I will remove it tomorrow. Thanks for checking.

@foarsitter
Contributor Author

Found some references in other places that needed to be updated. Sorry for the inconvenience.

Contributor

@MartinThoma left a comment


Thank you for the many improvements 🙏

I am not super familiar with camelot (and especially its CI). As a lot of files were touched (+4583 -975 across 145 files!), I recommend running some smoke tests locally to ensure basic functionality.

In general, this would have been easier to review if the automatic changes (ruff? black? pyupgrade? isort?) had been done in separate PRs.

@foarsitter
Contributor Author

The changes you are describing are in separate commits; you can select individual commits at the top left of the "Files changed" tab.

What kind of smoke tests do you have in mind? We are using the fork in production, and all the tests of this PR pass. Camelot has pretty decent test coverage (88%), and many different PDF types are tested: https://github.com/camelot-dev/camelot/tree/master/tests/files

One of the reasons I'm using the hypermodern cookiecutter template is that it comes with a lot of documentation. For example, you can read about all the GitHub Actions this PR includes over here, and about how to make a release over here.

I agree with you that the road I took is quite spartan, but I think it is the fastest way forward.

@MartinThoma
Contributor

> The changes you are describing are in separate commits

Interesting. Most of the time I don't look at single commits, as they contain a lot of noise. As a maintainer of pypdf, I also don't like this model, because it "packages" different, unrelated changes together. That removes the freedom to choose which of them I want to take in and which I don't. The result is typically that those PRs take longer to merge overall and cause more work.

In the case of camelot I can understand it though. It's been a while since the last release. Please take my comment here more as a general wish. It's not meant as criticism in this specific case.

> We are using the fork in production

Oh, nice! That should do it :-)

@bosd

bosd commented Jul 16, 2023

So, to conclude: all lights are green ❤️ It's a go 😃 Can we get a merge?

@MartinThoma
Contributor

Interesting, I thought you could merge, @bosd.
@foarsitter Can you merge? Do you know who can?

Just to clarify: I can merge, but I cannot release to PyPI. I'm a bit hesitant, as I would rather help out once in a while than become a core maintainer of another big project 😅

@foarsitter
Contributor Author

@MartinThoma yes, I'm able to merge. Since it is a large PR and a lot happened over the last weekend, I think it is appropriate to have a cooldown of a few days before merging so everyone can react.

Releasing camelot to PyPI is something we still need to solve. In the meantime, I can release camelot-fork by forking this repo.

@MartinThoma
Contributor

Sounds good!

Regarding the releases, I've opened #389

@bosd

bosd commented Jul 17, 2023

> Interesting, I thought you could merge, @bosd.
> Do you know who can?

No, I don't have permission to merge here.
I'm not listed as a maintainer (although I offered to be one).

To my knowledge, @MartinThoma and @foarsitter are the only currently active maintainers.

@foarsitter
Contributor Author

Squash and merge is the only option for merging... That needs to be changed before merging.

@MartinThoma
Contributor

For this specific PR, I agree. In general, I prefer squash-and-merge. PRs should only contain one change (once we are back on track).

@bosd

bosd commented Aug 4, 2023

> Squash and merge is the only option for merging... That needs to be changed before merging.

So does this mean that dev is blocked/stalled again?? Or do any of you have the superpowers to change this?

@MartinThoma
Contributor

I don't have access to the settings.

@foarsitter
Contributor Author

As long as there is nobody around with full permissions on GitHub and PyPI, this project could be considered dead.

@MartinThoma
Contributor

@vinayak-mehta Can you please give me full access so that I can adjust the project settings (temporarily enabling a merge-commit for this specific PR)?

@MartinThoma
Contributor

@foarsitter It seems as if I will not get the necessary permissions any time soon. Would it be ok for you if I squash-merge this PR and then make a release to PyPI?

@foarsitter
Contributor Author

Sure, no objection if it's the only way forward. Go ahead :)

@MartinThoma merged commit e12fba4 into camelot-dev:master on Sep 10, 2023
11 checks passed
@MartinThoma
Contributor

"Poetry could not find a pyproject.toml file in /home/runner/work/camelot/camelot or its parents"

I don't know why this part is failing.

However, the PyPI part should be removed as it's very likely not configured.

bosd pushed a commit to bosd/pypdf_table_extraction that referenced this pull request Mar 26, 2024
This PR contains a ton of improvements. They are merged via a squash-commit, but the individual changes are visible in the PR:

* Add hypermodern python to the original repo
* Add conftest.py and skip tests that depend on poppler
* Align the type of PDFHandler.filepath with PdfReader.stream so Path and IO objects are accepted
* Pre-commit
* Initialized Table.df as empty DataFrame
* Fixed: ValueError: min() arg is an empty sequence
* Fix: unwanted data leaks into the last cell
* Added test case for the bbox_no_intersection method
* Update test fixtures and remove unused imports
* Fixed ZeroDivisionError in text_in_bbox
* camelot-fork 0.20.1
* chore: revert changes to README.md
* Remove references from pyproject to the fork
* Fix install ghostscript
* Poetry update
* Removed references to me as person or to the camelot-fork repo

---------

Co-authored-by: Jinjun Liang <kumkee@users.noreply.github.com>
Co-authored-by: Manish Patwari <manish.patwari@ymail.com>
Co-authored-by: zhy <zhyn4098@corp.netease.com>
Co-authored-by: Rahul.Bhave <rahul.bhave@qxf2.com>
Co-authored-by: Constantine Parkhimovich <const@fundcogito.com>
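The "Fixed ZeroDivisionError in text_in_bbox" entry is the kind of bug that typically comes from dividing by the area of a degenerate (zero-width or zero-height) box when computing how much two boxes overlap. A minimal sketch of such a guard, assuming boxes given as `(x0, y0, x1, y1)` tuples; `bbox_area`, `intersection_area`, and `overlap_ratio` are hypothetical helpers, not camelot's actual functions.

```python
from typing import Tuple

# (x0, y0, x1, y1) in PDF coordinates
BBox = Tuple[float, float, float, float]


def bbox_area(b: BBox) -> float:
    """Area of a box; negative extents are clamped to zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: BBox, b: BBox) -> float:
    """Area of the overlap between two boxes (0.0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def overlap_ratio(a: BBox, b: BBox) -> float:
    """Fraction of a's area covered by b; 0.0 if a has zero area."""
    area = bbox_area(a)
    if area == 0:
        # A zero-width or zero-height box would otherwise raise ZeroDivisionError.
        return 0.0
    return intersection_area(a, b) / area
```

Returning 0.0 for a zero-area box treats it as "not overlapping", which is one reasonable convention; the actual fix may handle this case differently.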