Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev(narugo): add support for headers and binary body in recorder #683

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

narugo1992
Copy link

@narugo1992 narugo1992 commented Oct 20, 2023

Related issue: #682

In addition, after our testing, this version can be compatible with the yaml format of the previous version.

@narugo1992 narugo1992 marked this pull request as draft October 20, 2023 11:46
@narugo1992 narugo1992 marked this pull request as ready for review October 20, 2023 12:26
@narugo1992
Copy link
Author

narugo1992 commented Oct 21, 2023

one more thing

Aftering saving the binary responses to yaml file (some images from danbooru, 0.5MB~50MB per image) using this pr, I found it be really slow when calling _all_from_file, upto 5secs to 5minutes each file.

Maybe a better format should be used to optimize it?

I'm cosidering to save the binary body to extra binary files instead of simply put it in yaml with base64 code.

@beliaev-maksim
Copy link
Collaborator

Performance will be always an issue if we try to save binaries in yaml

We need to enhance the method to allow to supply file path. Then we can store binaries close to yamls. That will be more human readable and performant

Can you please update the PR with the change

@narugo1992
Copy link
Author

Performance will be always an issue if we try to save binaries in yaml

We need to enhance the method to allow to supply file path. Then we can store binaries close to yamls. That will be more human readable and performant

Can you please update the PR with the change

yep, I am also re-designing these code according to this idea.

draft it for now

@narugo1992 narugo1992 marked this pull request as draft October 21, 2023 08:24
@narugo1992
Copy link
Author

narugo1992 commented Oct 21, 2023

@beliaev-maksim now the binary data are saved to files.

For example, if you specify the config file responses.yaml, the binary files will be placed at responses/1.bin, responses/2.bin, etc. And the relative path of binary files will be put into the responses.yaml (the relative paths are relative to responses.yaml's parent directory, so the yaml and binary files can be easily moved to somewhere without any change).

I'm now using the code in this pr for the unittest in my own project, and it works well on ubuntu20.04, Python3.8. Here are some yaml and binary files I created to my unittest (the zip files in this repo): https://huggingface.co/datasets/deepghs/waifuc_responses/tree/main .

I think cross-platform testing is needed now, especially on the Windows platform, and the / in the relative path may cause errors.

@narugo1992 narugo1992 marked this pull request as ready for review October 21, 2023 10:58
responses/_recorder.py Outdated Show resolved Hide resolved
example_bins/202.bin Show resolved Hide resolved
example_bins/400.bin Show resolved Hide resolved
example_bins/500.bin Show resolved Hide resolved
@narugo1992
Copy link
Author

narugo1992 commented Oct 25, 2023

Discovered a new issue: 'responses' cannot handle HEAD requests with the Content-Length header properly, causing an IncompleteRead exception to be thrown during length checking in urllib3.

Update: fixed in commit 523857e

@narugo1992
Copy link
Author

@beliaev-maksim @markstory

Now ready for a review, as this version has already been deployed in our own project and is running as expected.

Additionally, all unit tests have passed in Github Actions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants