[AvroTensorDataset] Add more py test to cover various scenarios #1795

Open · wants to merge 6 commits into master
Conversation

lijuanzhang78 (Contributor):
This PR adds more tests covering shuffle, autotune, and parallelism across various data types.
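For anyone trying the new suite locally, an invocation along these lines should work (a sketch: the --benchmark-only flag assumes pytest-benchmark is installed, and the -k expression simply selects the new test names; the path is taken from the review comments below):

import pytest

# Run only the benchmark tests touched by this PR; "-k" filters by
# test name (test_autotuning, test_codec), not by benchmark group.
pytest.main([
    "tests/test_atds_avro/benchmark",
    "--benchmark-only",
    "-k", "autotuning or codec",
])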

lijuanzhang78 (Contributor, Author):

@kvignesh1420 Hi! Could you help review this PR, which adds more py tests? Thanks!

kvignesh1420 (Member):

Thanks @lijuanzhang78. Will take a look soon!

Comment on lines +27 to +54
BATCH_SIZES = [8, 16, 32, 64, 128, 256, 512, 1024]
PARALLELISM = [1, 2, 3, 4, 5, 6, tf.data.AUTOTUNE]
PARAMS = [
(batch_size, 1024, "deflate", parallelism)
for batch_size in BATCH_SIZES
for parallelism in PARALLELISM
]


@pytest.mark.benchmark(
group="autotuning",
)
@pytest.mark.parametrize(
["batch_size", "shuffle_buffer_size", "codec", "parallelism"], PARAMS
)
def test_autotuning(batch_size, shuffle_buffer_size, codec, parallelism, benchmark):
data_source = DataSource(
scenario=MIXED_TYPES_SCENARIO, num_records=LARGE_NUM_RECORDS
)
run_atds_benchmark_from_data_source(
data_source,
batch_size,
benchmark,
parallelism=parallelism,
codec=codec,
shuffle_buffer_size=shuffle_buffer_size,
rounds=10,
)
kvignesh1420 (Member):
The file name says that you are testing the autotune functionality. However, I see similar tests in tests/test_atds_avro/benchmark/test_atds_parallelism_benchmark.py as well. Maybe the tests can be combined?
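For reference, the grid above already expands to 8 batch sizes × 7 parallelism settings = 56 cases at rounds=10 each, so consolidating would also keep total benchmark time in check. One possible shape for a merged test, reusing only helpers already imported in this PR (the test name and parameter choices here are illustrative, not from the PR):

import pytest
import tensorflow as tf

from tests.test_atds_avro.utils.data_source import DataSource
from tests.test_atds_avro.utils.data_source_registry import SMALL_NUM_RECORDS
from tests.test_atds_avro.utils.atds_benchmark_utils import (
    run_atds_benchmark_from_data_source,
)
from tests.test_atds_avro.utils.benchmark_utils import MIXED_TYPES_SCENARIO


# Stacked parametrize decorators sweep the full codec x parallelism grid
# in a single file instead of two.
@pytest.mark.benchmark(group="atds")
@pytest.mark.parametrize("codec", ["null", "deflate", "snappy"])
@pytest.mark.parametrize("parallelism", [1, 4, tf.data.AUTOTUNE])
def test_codec_and_parallelism(codec, parallelism, benchmark):
    data_source = DataSource(
        scenario=MIXED_TYPES_SCENARIO, num_records=SMALL_NUM_RECORDS
    )
    run_atds_benchmark_from_data_source(
        data_source, 128, benchmark, codec=codec, parallelism=parallelism
    )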

Comment on lines +17 to +37
import pytest

from tests.test_atds_avro.utils.data_source import DataSource
from tests.test_atds_avro.utils.data_source_registry import SMALL_NUM_RECORDS
from tests.test_atds_avro.utils.atds_benchmark_utils import (
run_atds_benchmark_from_data_source,
)
from tests.test_atds_avro.utils.benchmark_utils import MIXED_TYPES_SCENARIO


@pytest.mark.benchmark(
group="codec",
)
@pytest.mark.parametrize(
["batch_size", "codec"], [(128, "null"), (128, "deflate"), (128, "snappy")]
)
def test_codec(batch_size, codec, benchmark):
data_source = DataSource(
scenario=MIXED_TYPES_SCENARIO, num_records=SMALL_NUM_RECORDS
)
run_atds_benchmark_from_data_source(data_source, batch_size, benchmark, codec=codec)
kvignesh1420 (Member):

Same issue as before: I see similar tests in tests/test_atds_avro/benchmark/test_atds_parallelism_benchmark.py as well. Maybe the tests can be combined?

Comment on lines +30 to +48
@pytest.mark.skipif(
os.getenv("ATDS_MEM_LEAK_CHECK") != "1",
reason="This benchmark test is only used in memory leak check.",
)
@pytest.mark.benchmark(
group="all_types_of_tensors",
)
@pytest.mark.parametrize("batch_size", [16])
def test_all_types_of_tensors_for_memory_leak_check(batch_size, benchmark):
data_source = get_data_source_from_registry(ALL_TYPES_DATA_SOURCE_NAME)
shuffle_buffer_size = batch_size * 8
run_atds_benchmark_from_data_source(
data_source,
batch_size,
benchmark,
codec="deflate",
shuffle_buffer_size=shuffle_buffer_size,
rounds=1,
)
kvignesh1420 (Member):

I might be missing something, but how exactly is the leak detected (if there is one)?
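(For context, one common way such checks are wired up is sketched below with psutil: sample process RSS before and after a benchmark round and flag unexpected growth. This is an assumption about the mechanism, not code from this PR; the diff only shows the ATDS_MEM_LEAK_CHECK gate.)

import os

import psutil  # assumed helper dependency, not part of this PR


def _rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1e6


def check_for_leak(run_once, max_growth_mb=50.0):
    # Hypothetical harness: run one benchmark round and assert that
    # resident memory did not grow past a tolerance.
    before = _rss_mb()
    run_once()
    after = _rss_mb()
    growth = after - before
    assert growth < max_growth_mb, f"possible leak: RSS grew {growth:.1f} MB"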

kvignesh1420 (Member):

@lijuanzhang78 gentle ping, as the review comments might have been missed 🙂.
