Bug: ValueError: The dataset should at least contain 2 batches to be split. #1871

gloglo17 · 2023-03-29T14:17:05Z

Hi, I'm just starting out with AutoKeras. I'm trying the tutorial example with my own dataset, but AutoKeras doesn't even start. I get this error: ValueError: The dataset should at least contain 2 batches to be split.

Python 3.10.7, Autokeras 1.1.0, Keras 2.12.0, Tensorflow 2.12.0, Pandas 1.5.3, Numpy 1.23.5

Here's the whole code, including a link to download the dataset.

import pandas as pd
import tensorflow as tf
import autokeras as ak
import numpy as np
import os
import requests
import io


fileName = "0183_SPORT5_limit_10_from_2023-03-29-13-33.npy"
url = f"http://34.28.182.138/{fileName}"

response = requests.get(url)
response.raise_for_status()
data = np.load(io.BytesIO(response.content))


DF = pd.DataFrame(data)
DF.to_csv("data.csv")

clf = ak.StructuredDataClassifier(
    overwrite=True, max_trials=3
)  
clf.fit('data.csv', '1', epochs=10)

Traceback (most recent call last):
  File "/aux/autokera.py", line 24, in <module>
    clf.fit('data.csv', '1', epochs=10)
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 326, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 139, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/auto_model.py", line 288, in fit
    dataset, validation_data = data_utils.split_dataset(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/utils/data_utils.py", line 46, in split_dataset
    raise ValueError(
ValueError: The dataset should at least contain 2 batches to be split.


ShahzebL · 2023-07-19T00:22:45Z

Hi,

Not sure if you're still encountering this issue. I tried to check out your dataset but couldn't access it. Are there enough samples in the data? If you're working with a small sample size, another option is to pass a much smaller batch_size to the fit method.
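To make the batch_size suggestion concrete, here is a minimal sketch in plain Python (no AutoKeras required). It assumes, going by the error message, that the data is grouped into batches before the validation split and that the split needs at least 2 batches. It also assumes a default batch size of 32. The helper names `num_batches` and `pick_batch_size` are hypothetical and not part of AutoKeras.

```python
import math


def num_batches(n_rows: int, batch_size: int) -> int:
    """Number of batches when n_rows are grouped into chunks of batch_size."""
    return math.ceil(n_rows / batch_size)


def pick_batch_size(n_rows: int, default: int = 32, min_batches: int = 2) -> int:
    """Return `default` if it already yields enough batches, else a smaller size.

    Hypothetical helper: `default=32` and `min_batches=2` are assumptions
    taken from the error message and this thread, not from the AutoKeras API.
    """
    if n_rows < min_batches:
        raise ValueError(f"Need at least {min_batches} rows, got {n_rows}.")
    if num_batches(n_rows, default) >= min_batches:
        return default
    # Integer division guarantees at least `min_batches` batches.
    return n_rows // min_batches


if __name__ == "__main__":
    for rows in (10, 32, 33, 100):
        size = pick_batch_size(rows)
        print(rows, size, num_batches(rows, size))
```

The value returned by `pick_batch_size` could then be passed as the `batch_size` argument of `fit`.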

Hope this helps.

rahmatiangit · 2024-01-07T02:04:50Z

- Problem: I get this error if the dataset has 41 or fewer rows. There is no error when the dataset has 42 or more rows.

- Fix: Passing a smaller batch_size (the default is 32) fixes the problem:
search.fit(x=X_train, y=y_train, verbose=0, epochs=10, batch_size=12)

- Details: AutoKeras 1.1. The code and the passing/failing datasets are attached.

Code:

# imports assumed; they were not shown in the original post
from pandas import read_csv
from sklearn.model_selection import train_test_split
from autokeras import StructuredDataRegressor

url = 'auto-insurance_41a.csv'
dataframe = read_csv(url, header=None)
print(dataframe.shape)

# split into input and output elements
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

# separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
# perform the search
search.fit(x=X_train, y=y_train, verbose=0, epochs=10)

Error:

Reloading Tuner from ./structured_data_regressor/tuner0.json

ValueError Traceback (most recent call last)

in <cell line: 11>()
9 search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
10 # perform the search
---> 11 search.fit(x=X_train, y=y_train, verbose=0, epochs=10)
12 # evaluate the model
13 mae, _ = search.evaluate(X_test, y_test, verbose=0)

2 frames

/usr/local/lib/python3.10/dist-packages/autokeras/tasks/structured_data.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
137 self.check_in_fit(x)
138
--> 139 history = super().fit(
140 x=x,
141 y=y,

/usr/local/lib/python3.10/dist-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, verbose, **kwargs)
286 # Split the data with validation_split.
287 if validation_data is None and validation_split:
--> 288 dataset, validation_data = data_utils.split_dataset(
289 dataset, validation_split
290 )

/usr/local/lib/python3.10/dist-packages/autokeras/utils/data_utils.py in split_dataset(dataset, validation_split)
44 num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
45 if num_instances < 2:
---> 46 raise ValueError(
47 "The dataset should at least contain 2 batches to be split."
48 )

ValueError: The dataset should at least contain 2 batches to be split.


auto-insurance_41a.csv
auto-insurance_42a.csv
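The 41/42-row threshold is consistent with simple batch arithmetic. The sketch below rests on two assumptions: that `train_test_split` with `test_size=0.2` rounds the test portion up to a whole row (believed to be scikit-learn's behaviour for a float `test_size`), and that the training rows are grouped into batches of `batch_size` before the validation split. The helper names `train_rows` and `batches` are hypothetical.

```python
import math


def train_rows(n_rows: int, test_size: float = 0.2) -> int:
    """Rows left for training, assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test


def batches(n_rows: int, batch_size: int) -> int:
    """Number of batches, counting a final partial batch."""
    return math.ceil(n_rows / batch_size)


if __name__ == "__main__":
    for total in (41, 42):
        n_train = train_rows(total)
        print(
            f"{total} rows -> {n_train} training rows, "
            f"{batches(n_train, 32)} batch(es) at batch_size=32, "
            f"{batches(n_train, 12)} at batch_size=12"
        )
```

Under these assumptions, 41 rows leave 32 training rows, which fit in a single batch of 32 and trigger the error, while 42 rows leave 33 and spill into a second batch. With batch_size=12, both cases produce 3 batches.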

gloglo17 added the bug report label Mar 29, 2023
