Stratified label splits #39

Open
Maikello opened this issue Nov 19, 2019 · 2 comments

@Maikello

The current train/test split is not stratified, so some labels may be missing from the train or test set, which causes errors when training the model. Please replace the first code block below with the second:

newdf1 = np.random.rand(len(rnewdf)) < 0.8

train = rnewdf[newdf1]
test = rnewdf[~newdf1]

trainfeatures = train.iloc[:, :-1]
trainlabel = train.iloc[:, -1:]
testfeatures = test.iloc[:, :-1]
testlabel = test.iloc[:, -1:]


# Replacement: stratified split
from sklearn.model_selection import StratifiedShuffleSplit

X = rnewdf.iloc[:, :-1]  # all columns except the last are features
y = rnewdf.iloc[:, -1:]  # the last column holds the labels

def dataSplitting(X, y):
    """Returns training and test set matrices/vectors for X and y"""
    # A single stratified 80/20 split that preserves label proportions
    sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2)
    train_index, test_index = next(sss.split(X, y))
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    return X_train, X_test, y_train, y_test

trainfeatures, testfeatures, trainlabel, testlabel = dataSplitting(X, y)

Using this code ensures that every label appears in both the training and test sets, in the same proportions as in the full dataset. With a purely random selection, a label can end up missing from one of the sets, and the one-hot encoding of the categorical label then has fewer than 10 columns, which no longer matches an output layer of size 10.
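To illustrate the point, here is a minimal sketch on made-up data (the toy labels, the dummy features, and the fixed `random_state` are assumptions for the example, not part of the project). It checks that a stratified 80/20 split keeps all 10 labels in both sets even when one label is rare:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy labels: 10 classes, class 9 is rare (5 of 185 rows).
y = np.repeat(np.arange(10), [20] * 9 + [5])
X = np.arange(len(y)).reshape(-1, 1)  # dummy single-column features

# random_state is fixed here only to make the sketch reproducible.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

train_classes = np.unique(y[train_idx])
test_classes = np.unique(y[test_idx])

# Both counts should equal the number of distinct labels in y.
print(len(train_classes), len(test_classes))
```

Note that StratifiedShuffleSplit needs at least two samples per label, so a label that occurs only once in the data would still raise an error.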

@thisislohith6

Hi,

I first built the model without replacing the code you mentioned and got the error "ValueError: Shapes (None, 4) and (None, 10) are incompatible". I then replaced the code as described above and rebuilt the model, but when fitting it I still get the error "ValueError: Shapes (16, 4) and (16, 10) are incompatible".

Could you please suggest what changes I need to make?

Thanks in advance, I appreciate your help!

@SyedaFaiqaFIAZ

same error
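One possible reading of the error reported above, offered as a guess rather than a confirmed diagnosis: the one-hot encoded labels have 4 columns while the model's output layer has 10 units. A stratified split cannot change that if the data being used only contains 4 distinct labels. The sketch below uses made-up label names and a hypothetical helper, `one_hot`, to show how to check how many label columns the data actually produces:

```python
import numpy as np

def one_hot(labels, classes):
    """One-hot encode labels against an explicit, fixed list of classes."""
    index = {c: i for i, c in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded

# Hypothetical labels: only 4 distinct classes occur in this data.
labels = ["angry", "calm", "happy", "sad", "calm", "happy"]
present = sorted(set(labels))

y = one_hot(labels, present)
n_outputs = y.shape[1]  # number of label columns the model must match
print(y.shape)
```

If `n_outputs` comes out as 4 on the real data, then either the final layer of the model would need 4 units instead of 10, or the dataset would need to include all 10 labels.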
