Bug: ValueError: The dataset should at least contain 2 batches to be split. #1871

Open
gloglo17 opened this issue Mar 29, 2023 · 2 comments

@gloglo17

Hi, I'm just starting with AutoKeras and am trying out the tutorial example with my own dataset, but AutoKeras doesn't even start. I get this error: ValueError: The dataset should at least contain 2 batches to be split.

Python 3.10.7, AutoKeras 1.1.0, Keras 2.12.0, TensorFlow 2.12.0, pandas 1.5.3, NumPy 1.23.5

Here's the whole code, including a link to download the dataset.

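import pandas as pd
import tensorflow as tf
import autokeras as ak
import numpy as np
import os
import requests
import io

# Download the dataset (a NumPy .npy file) and load it from memory.
fileName = "0183_SPORT5_limit_10_from_2023-03-29-13-33.npy"
url = f"http://34.28.182.138/{fileName}"

response = requests.get(url)
response.raise_for_status()
data = np.load(io.BytesIO(response.content))

# Write the array to CSV; the column headers become "0", "1", ...
# (to_csv also writes the row index as an extra unnamed column).
DF = pd.DataFrame(data)
DF.to_csv("data.csv")

# Search for a classifier, using column "1" as the label.
clf = ak.StructuredDataClassifier(
    overwrite=True, max_trials=3
)
clf.fit('data.csv', '1', epochs=10)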

Traceback (most recent call last):
  File "/aux/autokera.py", line 24, in <module>
    clf.fit('data.csv', '1', epochs=10)
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 326, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 139, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/auto_model.py", line 288, in fit
    dataset, validation_data = data_utils.split_dataset(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/utils/data_utils.py", line 46, in split_dataset
    raise ValueError(
ValueError: The dataset should at least contain 2 batches to be split.

@ShahzebL

Hi,

Not sure if you're still encountering this issue. I tried to check your dataset but couldn't access it. Does the data contain enough samples? If you're working with a small sample size, another option is to pass a much smaller batch_size to the fit method, so that the data is split into at least 2 batches.

Hope this helps.
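
A rough sketch of that suggestion: the batch size has to be small enough that the rows form at least two batches. The helper below is hypothetical (`pick_batch_size` is not part of AutoKeras), and it assumes the default batch size is 32.

```python
DEFAULT_BATCH_SIZE = 32  # assumed AutoKeras default


def pick_batch_size(num_rows, default=DEFAULT_BATCH_SIZE):
    """Hypothetical helper: a batch size that yields at least 2 batches."""
    if num_rows < 2:
        raise ValueError(f"Need at least 2 rows to form 2 batches, got {num_rows}.")
    # Halving the row count guarantees two batches; never exceed the default.
    return min(default, num_rows // 2)


print(pick_batch_size(10))   # 5
print(pick_batch_size(32))   # 16
print(pick_batch_size(100))  # 32
```

The returned value could then be passed to fit as batch_size, for example `clf.fit('data.csv', '1', epochs=10, batch_size=pick_batch_size(num_rows))`, where `num_rows` is the number of training rows.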

@rahmatiangit

rahmatiangit commented Jan 7, 2024

- Problem: I get this error if the dataset has 41 or fewer rows. There is no error when the dataset has 42 or more rows.

- Fix: Lowering batch_size (the default is 32) fixes the problem:
search.fit(x=X_train, y=y_train, verbose=0, epochs=10, batch_size=12)

- Details: AutoKeras 1.1. The code and the passing and failing datasets are attached.
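
The 41/42-row threshold is consistent with simple arithmetic. This assumes scikit-learn's train_test_split rounds the test set size up (ceil(0.2 * n)) and that the remaining training rows are grouped into batches of 32. A minimal sketch, with hypothetical helper names:

```python
import math


def train_rows(total_rows, test_size=0.2):
    # Assumption: the test set is rounded up and the rest goes to training.
    return total_rows - math.ceil(test_size * total_rows)


def num_batches(rows, batch_size=32):
    # The last batch may be partial, so round up.
    return math.ceil(rows / batch_size)


for total in (41, 42):
    rows = train_rows(total)
    print(total, rows, num_batches(rows), num_batches(rows, batch_size=12))
# 41 32 1 3
# 42 33 2 3
```

With 41 rows, training gets 32 rows, which is exactly one batch of 32 and so fails the "at least 2 batches" check. With 42 rows it gets 33 rows and therefore two batches. With batch_size=12 both cases produce three batches.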

Code:

from pandas import read_csv
from sklearn.model_selection import train_test_split
from autokeras import StructuredDataRegressor

# load the dataset
url = 'auto-insurance_41a.csv'
dataframe = read_csv(url, header=None)
print(dataframe.shape)

# split into input and output elements
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

# separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# define and run the search (fails with the default batch_size)
search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
search.fit(x=X_train, y=y_train, verbose=0, epochs=10)

Error:

Reloading Tuner from ./structured_data_regressor/tuner0.json


ValueError Traceback (most recent call last)

in <cell line: 11>()
9 search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
10 # perform the search
---> 11 search.fit(x=X_train, y=y_train, verbose=0, epochs=10)
12 # evaluate the model
13 mae, _ = search.evaluate(X_test, y_test, verbose=0)

2 frames

/usr/local/lib/python3.10/dist-packages/autokeras/tasks/structured_data.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
137 self.check_in_fit(x)
138
--> 139 history = super().fit(
140 x=x,
141 y=y,

/usr/local/lib/python3.10/dist-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, verbose, **kwargs)
286 # Split the data with validation_split.
287 if validation_data is None and validation_split:
--> 288 dataset, validation_data = data_utils.split_dataset(
289 dataset, validation_split
290 )

/usr/local/lib/python3.10/dist-packages/autokeras/utils/data_utils.py in split_dataset(dataset, validation_split)
44 num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
45 if num_instances < 2:
---> 46 raise ValueError(
47 "The dataset should at least contain 2 batches to be split."
48 )

ValueError: The dataset should at least contain 2 batches to be split.

Attached files:
auto-insurance_41a.csv
auto-insurance_42a.csv
