[AvroTensorDataset] Add more py test to cover various scenarios #1795

Open · wants to merge 6 commits into master
Conversation

lijuanzhang78 (Contributor):
This PR adds more tests covering shuffle, autotune, and parallelism across various data types.
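For anyone trying the new suite locally, an invocation along these lines should work (a sketch: the --benchmark-only flag assumes pytest-benchmark is installed, and the -k expression simply selects the new test names; the path is taken from the review comments below):

import pytest

# Run only the benchmark tests touched by this PR; "-k" filters by
# test name (test_autotuning, test_codec), not by benchmark group.
pytest.main([
    "tests/test_atds_avro/benchmark",
    "--benchmark-only",
    "-k", "autotuning or codec",
])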

lijuanzhang78 (Contributor, Author):

@kvignesh1420 Hi! Could you help review this PR, which adds more py tests? Thanks!

kvignesh1420 (Member):

Thanks @lijuanzhang78. Will take a look soon!

Comment on lines +27 to +54
BATCH_SIZES = [8, 16, 32, 64, 128, 256, 512, 1024]
PARALLELISM = [1, 2, 3, 4, 5, 6, tf.data.AUTOTUNE]
PARAMS = [
(batch_size, 1024, "deflate", parallelism)
for batch_size in BATCH_SIZES
for parallelism in PARALLELISM
]


@pytest.mark.benchmark(
group="autotuning",
)
@pytest.mark.parametrize(
["batch_size", "shuffle_buffer_size", "codec", "parallelism"], PARAMS
)
def test_autotuning(batch_size, shuffle_buffer_size, codec, parallelism, benchmark):
data_source = DataSource(
scenario=MIXED_TYPES_SCENARIO, num_records=LARGE_NUM_RECORDS
)
run_atds_benchmark_from_data_source(
data_source,
batch_size,
benchmark,
parallelism=parallelism,
codec=codec,
shuffle_buffer_size=shuffle_buffer_size,
rounds=10,
)
kvignesh1420 (Member):
The file name says that you are testing the autotune functionality. However, I see similar tests in tests/test_atds_avro/benchmark/test_atds_parallelism_benchmark.py as well. Maybe the tests can be combined?
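For reference, the grid above already expands to 8 batch sizes × 7 parallelism settings = 56 cases at rounds=10 each, so consolidating would also keep total benchmark time in check. One possible shape for a merged test, reusing only helpers already imported in this PR (the test name and parameter choices here are illustrative, not from the PR):

import pytest
import tensorflow as tf

from tests.test_atds_avro.utils.data_source import DataSource
from tests.test_atds_avro.utils.data_source_registry import SMALL_NUM_RECORDS
from tests.test_atds_avro.utils.atds_benchmark_utils import (
    run_atds_benchmark_from_data_source,
)
from tests.test_atds_avro.utils.benchmark_utils import MIXED_TYPES_SCENARIO


# Stacked parametrize decorators sweep the full codec x parallelism grid
# in a single file instead of two.
@pytest.mark.benchmark(group="atds")
@pytest.mark.parametrize("codec", ["null", "deflate", "snappy"])
@pytest.mark.parametrize("parallelism", [1, 4, tf.data.AUTOTUNE])
def test_codec_and_parallelism(codec, parallelism, benchmark):
    data_source = DataSource(
        scenario=MIXED_TYPES_SCENARIO, num_records=SMALL_NUM_RECORDS
    )
    run_atds_benchmark_from_data_source(
        data_source, 128, benchmark, codec=codec, parallelism=parallelism
    )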

Comment on lines +17 to +37
import pytest

from tests.test_atds_avro.utils.data_source import DataSource
from tests.test_atds_avro.utils.data_source_registry import SMALL_NUM_RECORDS
from tests.test_atds_avro.utils.atds_benchmark_utils import (
run_atds_benchmark_from_data_source,
)
from tests.test_atds_avro.utils.benchmark_utils import MIXED_TYPES_SCENARIO


@pytest.mark.benchmark(
group="codec",
)
@pytest.mark.parametrize(
["batch_size", "codec"], [(128, "null"), (128, "deflate"), (128, "snappy")]
)
def test_codec(batch_size, codec, benchmark):
data_source = DataSource(
scenario=MIXED_TYPES_SCENARIO, num_records=SMALL_NUM_RECORDS
)
run_atds_benchmark_from_data_source(data_source, batch_size, benchmark, codec=codec)
kvignesh1420 (Member):

Same issue as before: I see similar tests in tests/test_atds_avro/benchmark/test_atds_parallelism_benchmark.py as well. Maybe the tests can be combined?

Comment on lines +30 to +48
@pytest.mark.skipif(
os.getenv("ATDS_MEM_LEAK_CHECK") != "1",
reason="This benchmark test is only used in memory leak check.",
)
@pytest.mark.benchmark(
group="all_types_of_tensors",
)
@pytest.mark.parametrize("batch_size", [16])
def test_all_types_of_tensors_for_memory_leak_check(batch_size, benchmark):
data_source = get_data_source_from_registry(ALL_TYPES_DATA_SOURCE_NAME)
shuffle_buffer_size = batch_size * 8
run_atds_benchmark_from_data_source(
data_source,
batch_size,
benchmark,
codec="deflate",
shuffle_buffer_size=shuffle_buffer_size,
rounds=1,
)
kvignesh1420 (Member):

I might be missing something, but how exactly is the leak detected (if there is one)?
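(For context, one common way such checks are wired up is sketched below with psutil: sample process RSS before and after a benchmark round and flag unexpected growth. This is an assumption about the mechanism, not code from this PR; the diff only shows the ATDS_MEM_LEAK_CHECK gate.)

import os

import psutil  # assumed helper dependency, not part of this PR


def _rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1e6


def check_for_leak(run_once, max_growth_mb=50.0):
    # Hypothetical harness: run one benchmark round and assert that
    # resident memory did not grow past a tolerance.
    before = _rss_mb()
    run_once()
    after = _rss_mb()
    growth = after - before
    assert growth < max_growth_mb, f"possible leak: RSS grew {growth:.1f} MB"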

kvignesh1420 (Member):

@lijuanzhang78 gentle ping, as the review comments might have been missed 🙂.
