Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WhyLabsWriter refactor #1484

Merged
merged 39 commits into from May 13, 2024
Merged

WhyLabsWriter refactor #1484

merged 39 commits into from May 13, 2024

Conversation

richard-rogers
Copy link
Contributor

Description

Writable represents output to be serialized by whylogs. It will write itself to 1 or more temporary files. A Writer takes the temporary file(s) and sends them to their intended destination. The interface allows any Writer to handle any Writable, but in practice some Writers are only interested in certain types of Writables.

WhyLabsWriter acts as the main entry point for sending data to WhyLabs. It makes use of a new WhyLabsClient interface for interacting with WhyLabs REST APIs. WhyLabsWriter has some deprecated methods that duplicate WhyLabsClient functionality the sake of backwards compatibility. WhyLabsWriter delegates to the WhyLabsTransactionWriter, WhyLabsBatchWriter, and WhyLabsReferenceWriter classes according to the WhyLabsWriter::write() use case. The 3 classes correspond to the 3 REST API endpoints for uploading profiles (transaction, batch/async, and reference).

WhyLabsWriterBase implements a few utility methods shared by the various WhyLabs*Writer classes. In particular, _prepare_view_for_upload() handles processing required before uploading a profile (uncompounding, custom performance metric tagging). _send_writable_to_whylabs() serializes a profile in either V0 or V1 format and uploads it to WhyLabs. _upload_view() is a convenience method that just calls _prepare_view_for_upload() then _send_writable_to_whylabs(). WhyLabsWriter::write() accepts a variety of data structures representing a profile: ViewResultSet, ProfileResultSet, SegmentedResultSet, DatasetProfile, DatasetProfileView, and SegmentedDatasetProfileView. WhyLabsWriterBase::_get_view_of_writable() converts all of those except SegmentedResultSet to either DatasetProfileView or SegmentedDatasetProfileView, which represent a single profile/segment to be uploaded to WhyLabs. The WhyLabs*Writer classes generally iterate over SegmentedResultSet uploading each segment.

  • Transactions do not support zipped profiles or reference profiles
  • Segmented batch or reference profiles can be zipped by adding zip=True argument to write()

Changes

TODO: describe API changes

Related

zipped batch profiles

@richard-rogers richard-rogers changed the title WhyLogsWriter refactor WhyLabsWriter refactor Apr 3, 2024
Copy link
Contributor

@FelipeAdachi FelipeAdachi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two examples that breaks with the changes in this PR:

  • performance_estimation.ipynb

error:
AttributeError: 'WhyLabsEstimationResultWriter' object has no attribute '_org_id'

It looks like _org_id is now under _whylabs_client._org_id

  • Writing_Reference_Profiles_to_WhyLabs.ipynb

error:
❌ Failed to upload reference profile: cannot unpack non-iterable bool object Traceback (most recent call last): File "/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs/python/whylogs/api/whylabs/session/notebook_logger.py", line 104, in notebook_session_log result = session.upload_reference_profiles(profiles) File "/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs/python/whylogs/api/whylabs/session/session.py", line 283, in upload_reference_profiles results.append(*[id for status, id in result if status]) File "/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs/python/whylogs/api/whylabs/session/session.py", line 283, in <listcomp> results.append(*[id for status, id in result if status]) TypeError: cannot unpack non-iterable bool object

On a superficial inspection, the cause seems to be the output of ResultSet's write(), which used to be a list of tuples. Now the result seems to be a single tuple.

@richard-rogers
Copy link
Contributor Author

I found two examples that breaks with the changes in this PR:

  • performance_estimation.ipynb
    error: AttributeError: 'WhyLabsEstimationResultWriter' object has no attribute '_org_id'

This one's fixed. I also found a bug where it possible to try to write an EstimationResult with a None accuracy. Is there a use car for None accuracies?

  • Writing_Reference_Profiles_to_WhyLabs.ipynb

error: `❌ Failed to upload reference profile: cannot unpack non-iterable bool object ...

Also fixed.

Copy link
Contributor

@jamie256 jamie256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.

python/examples/basic/Getting_Started.ipynb Show resolved Hide resolved
python/whylogs/core/feature_weights.py Outdated Show resolved Hide resolved
python/whylogs/api/writer/__init__.py Show resolved Hide resolved
@richard-rogers
Copy link
Contributor Author

richard-rogers commented Apr 25, 2024

Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.

The main breaking change is in the Writable interface, which hopefully no-one is using:

    def write(self, path: Optional[str] = None, **kwargs: Any) -> Tuple[bool, str]:

becomes

    def write(
        self, path: Optional[str] = None, filename: Optional[str] = None, **kwargs: Any
    ) -> Tuple[bool, Union[str, List[str]]]:  # TODO: Union[str, List[Tupble[bool, str]]] ?

Writables were inconsistent about whether the original path was a directory to write to or the full filename, so now they're explicitly separate. A Writable can now write multiple files (e.g., SegmentedResultSet), so it can return a list of files written.

The writer() convenience method is also moved into the Writable class, and returns a generic wrapper that can handle any combination of Writable and Writer rather than having a separate wrapper class for each Writable.

Writer::write() can also optionally return a list of results when it writes multiple things:

    def write(
        self,
        file: Writable,
        dest: Optional[str] = None,
        **kwargs: Any,
-  ) -> Tuple[bool, str]:
+  ) -> Tuple[bool, Union[str, List[Tuple[bool, str]]]]:

@jamie256
Copy link
Contributor

Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.

Ok its not breaking examples or public API, but adding functionality and some behavior cleanup, I think we can release in 1.4.0rc0

@jamie256 jamie256 self-requested a review May 7, 2024 20:39
Copy link
Contributor

@jamie256 jamie256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for 1.4.0

@richard-rogers richard-rogers enabled auto-merge (squash) May 13, 2024 17:17
@richard-rogers richard-rogers merged commit 27c8a7e into mainline May 13, 2024
19 checks passed
@richard-rogers richard-rogers deleted the dev/richard/wlw branch May 13, 2024 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants