CCT measure-table-structure-accuracy-command doesn't drop index #2962

Open
mallorih opened this issue May 2, 2024 · 1 comment
Labels
bug Something isn't working

mallorih commented May 2, 2024

Describe the bug
The CCT command measure-table-structure-accuracy-command does not drop the extra index when it finds no table to process (i.e. the documents have the wrong format), so assigning the aggregate column headers fails with a length mismatch.

To Reproduce

PYTHONPATH=. python unstructured/ingest/evaluate.py measure-table-structure-accuracy-command --output_dir ground_truth_text_as_html --source_dir predicted_text_as_html --output_dir output_metrics

Expected behavior
[Screenshot attached: 2024-05-02 at 3:50:36 PM]

Screenshots
Error

  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 276, in <module>
    main()
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 236, in measure_table_structure_accuracy_command
    return measure_table_structure_accuracy(
  File "/Users/mallori/unstructured/unstructured/metrics/evaluate.py", line 375, in measure_table_structure_accuracy
    agg_df.columns = agg_headers
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 5915, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 823, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 230, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements
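
For readers unfamiliar with this pandas error, here is a minimal sketch of how an undropped index can produce the same message. It is not the code in unstructured/metrics/evaluate.py: the header names, the placeholder row, and the variable names `leaked` and `fixed` are all assumptions made for illustration. It only shows that a frame carrying one extra "index" column rejects a five-element header list, and that dropping the index avoids the mismatch.

```python
import pandas as pd

# Hypothetical header names, chosen only so that there are five of them.
agg_headers = ["metric", "average", "sample_sd", "population_sd", "count"]

# A five-column aggregate frame with a single row of placeholder values.
agg_df = pd.DataFrame([["score", 0.0, 0.0, 0.0, 0]], columns=list("abcde"))

# reset_index() without drop=True moves the old index into a new "index"
# column, so the frame now has six columns instead of five.
leaked = agg_df.reset_index()

try:
    leaked.columns = agg_headers  # six columns, five labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Dropping the index instead keeps five columns, so the assignment succeeds.
fixed = agg_df.reset_index(drop=True)
fixed.columns = agg_headers
```

Whether the actual fix belongs in a `reset_index(drop=True)` call or in an explicit column drop depends on how `agg_df` is built in the no-table case, which this sketch does not attempt to reproduce.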

Environment Info

Python version:  3.9.13
unstructured version:  0.13.3
unstructured-inference version:  0.7.23
pytesseract version:  0.3.10
Torch version:  2.1.0
Detectron2 version:  0.6
PaddleOCR is not installed
Libmagic version:  ==> libmagic: stable 5.45

@mallorih mallorih added the bug Something isn't working label May 2, 2024

mallorih commented May 2, 2024
