[Bug] ingest2parquet issuing ERROR messages, but WARNING is preferred. #73

Closed
1 of 2 tasks
daw3rd opened this issue May 6, 2024 · 1 comment
Labels: bug (Something isn't working), fixed (Marks an issue as fixed in the dev branch)

daw3rd commented May 6, 2024

Search before asking

  • I searched the issues and found no similar issues.

Component

Tools/ingest2parquet

What happened + What you expected to happen

Testing ingest2parquet produces many ERROR messages, but the test does not fail.

Reproduction script

Running ingest2parquet_local.py emits many ERROR messages without failing the run. Perhaps these can be changed to WARNING messages?

cd tools/ingest2parquet
make venv
make test-src

gets

...
Executing: python src/ingest2parquet_local.py
13:02:37 INFO - data factory data_ is using local data accessinput_folder - /home/dawood/git/fm-data-engineering/tools/ingest2parquet/test-data/input output_folder - /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output
13:02:37 INFO - data factory data_ max_files -1, n_sample -1
13:02:37 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.zip']
Number of files is 2 
filepath /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/utils/lang_extensions.json
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x80 in position 11: invalid start byte
 skipping environments-master/cfortunes/diebenkorn_notes.dat Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc3 in position 7: invalid continuation byte
 skipping environments-master/cfortunes/obliquestrategies.dat Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfc in position 10: invalid start byte
 skipping application-java/lib/application-java.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe5 in position 14: invalid continuation byte
 skipping application-java/lib/fabric-gateway-java-2.1.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf9 in position 10: invalid start byte
 skipping application-java/lib/fabric-sdk-java-2.1.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-protobuf-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe1 in position 10: invalid continuation byte
 skipping application-java/lib/protobuf-java-util-3.10.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 11: invalid start byte
 skipping application-java/lib/api-common-1.9.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xba in position 25: invalid start byte
 skipping environments-master/commands/grel Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode bytes in position 40-41: invalid continuation byte
 skipping environments-master/commands/ldid Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb7 in position 10: invalid start byte
 skipping application-java/lib/milagro-crypto-java-0.4.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-stub-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-netty-1.23.0.jar Error: No contents decoded
output_file_name /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/https___github.com_00000o1_environments_archive_refs_heads_master.parquet
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-core-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-protobuf-lite-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-api-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfe in position 50: invalid start byte
 skipping application-java/lib/guava-29.0-jre.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfe in position 50: invalid start byte
 skipping application-java/lib/failureaccess-1.0.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf3 in position 50: invalid continuation byte
 skipping application-java/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe7 in position 12: invalid continuation byte
 skipping application-java/lib/perfmark-api-0.17.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf0 in position 10: invalid continuation byte
 skipping application-java/lib/jsr305-3.0.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xac in position 10: invalid start byte
 skipping application-java/lib/checker-qual-2.11.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x82 in position 12: invalid start byte
 skipping application-java/lib/error_prone_annotations-2.3.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x99 in position 53: invalid start byte
 skipping application-java/lib/j2objc-annotations-1.3.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte
 skipping application-java/lib/cloudant-client-2.19.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9d in position 89: invalid start byte
 skipping application-java/lib/netty-tcnative-boringssl-static-2.0.30.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa8 in position 10: invalid start byte
 skipping application-java/lib/netty-codec-http2-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa7 in position 10: invalid start byte
 skipping application-java/lib/protobuf-java-3.10.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9c in position 11: invalid start byte
 skipping application-java/lib/bcpkix-jdk15on-1.62.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc5 in position 10: invalid continuation byte
 skipping application-java/lib/httpclient-4.5.12.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/commons-logging-1.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 14: invalid start byte
 skipping application-java/lib/commons-cli-1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xca in position 14: invalid continuation byte
 skipping application-java/lib/commons-compress-1.20.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf5 in position 10: invalid start byte
 skipping application-java/lib/cloudant-http-2.19.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xcf in position 15: invalid continuation byte
 skipping application-java/lib/commons-io-2.6.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc5 in position 10: invalid continuation byte
 skipping application-java/lib/apache-log4j-extras-1.2.17.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa6 in position 12: invalid start byte
 skipping application-java/lib/log4j-1.2.17.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfd in position 10: invalid start byte
 skipping application-java/lib/futures-extra-4.2.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb3 in position 10: invalid start byte
 skipping application-java/lib/javax.json-1.1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe7 in position 10: invalid continuation byte
 skipping application-java/lib/snakeyaml-1.26.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x97 in position 10: invalid start byte
 skipping application-java/lib/jaxb-api-2.3.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc9 in position 10: invalid continuation byte
 skipping application-java/lib/javax.annotation-api-1.3.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/gson-2.8.5.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 10: invalid start byte
 skipping application-java/lib/commons-codec-1.11.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfb in position 10: invalid start byte
 skipping application-java/lib/netty-handler-proxy-4.1.38.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x88 in position 11: invalid start byte
 skipping application-java/lib/proto-google-common-protos-1.12.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x88 in position 10: invalid start byte
 skipping application-java/lib/netty-codec-http-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
 skipping application-java/lib/netty-handler-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb2 in position 10: invalid start byte
 skipping application-java/lib/netty-codec-socks-4.1.38.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
 skipping application-java/lib/netty-codec-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
 skipping application-java/lib/netty-transport-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
 skipping application-java/lib/netty-buffer-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
 skipping application-java/lib/netty-resolver-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xcc in position 10: invalid continuation byte
 skipping application-java/lib/netty-common-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9b in position 11: invalid start byte
 skipping application-java/lib/bcprov-jdk15on-1.62.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x94 in position 16: invalid start byte
 skipping application-java/lib/httpcore-4.4.13.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xac in position 10: invalid start byte
 skipping application-java/lib/auto-value-annotations-1.7.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa5 in position 89: invalid start byte
 skipping application-java/lib/commons-math3-3.6.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa8 in position 10: invalid start byte
 skipping application-java/lib/javax.activation-api-1.2.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9b in position 11: invalid start byte
 skipping application-java/lib/annotations-4.1.1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x86 in position 10: invalid start byte
 skipping application-java/lib/opencensus-contrib-grpc-metrics-0.21.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
 skipping application-java/lib/opencensus-api-0.21.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
 skipping application-java/lib/grpc-context-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x8e in position 10: invalid start byte
 skipping application-java/lib/animal-sniffer-annotations-1.17.jar Error: No contents decoded
output_file_name /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/application-java.parquet
processing stats generated {'total_files_given': 2, 'total_files_processed': 2, 'total_files_failed_to_processed': 0, 'total_no_of_rows': 54, 'total_bytes_in_memory': 79661, 'failure_details': []}
Metadata file stored - response: {'name': '/home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/metadata.json', 'size': 445}
[dawood@data-engineering1 ingest2parquet]$ 
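
For illustration, the requested behavior could look roughly like the sketch below: a file that cannot be decoded as UTF-8 is skipped and reported at WARNING level, since the run continues and is not counted as failed. This is not the actual ingest2parquet code; the function name `decode_or_skip`, the logger name, and the message format are all hypothetical.

```python
import logging

# Hypothetical logger name; the real tool may configure logging differently.
logger = logging.getLogger("ingest2parquet_sketch")


def decode_or_skip(name: str, raw: bytes, encoding: str = "utf-8"):
    """Return the decoded text, or None if the bytes are not valid text."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Skipping an undecodable (likely binary) file is recoverable,
        # so it is reported as a WARNING rather than an ERROR.
        logger.warning("skipping %s: %s", name, exc)
        return None


if __name__ == "__main__":
    logging.basicConfig(
        format="%(asctime)s %(levelname)s - %(message)s",
        datefmt="%H:%M:%S",
        level=logging.INFO,
    )
    decode_or_skip("README.md", b"# hello")                   # decodes, no log line
    decode_or_skip("lib/example.jar", b"PK\x03\x04\x80data")  # logs one WARNING
```

With this approach, ERROR would remain available for conditions that actually cause a file to be counted as failed in the processing stats.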

Anything else

No response

OS

MacOS (limited support)

Python

3.10.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@daw3rd daw3rd added the bug Something isn't working label May 6, 2024
@sapthasurendran sapthasurendran added the fixed Marks an issues as fixed in the dev branch label May 15, 2024

daw3rd commented May 15, 2024

verified

@daw3rd daw3rd closed this as completed May 15, 2024