
Error in Model Training.... #146

Open
girishkumarbk opened this issue Sep 11, 2018 · 3 comments

@girishkumarbk

Hi,

I get this error in the model training notebook.

Area Under the Curve (AUC)
AUC is the area under the receiver operating characteristic curve (ROC curve), which is 1.0 for ideal classifiers and 0.5 for those that do no better than random guessing. Let's compare the AUC score of the trained model with that of the dummy classifier.
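These bounds can be checked with a tiny synthetic example (illustrative data only, not the notebook's):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])

# An ideal classifier scores every positive above every negative -> AUC = 1.0
ideal_scores = np.array([0.1, 0.2, 0.8, 0.9])
print(roc_auc_score(y_true, ideal_scores))  # 1.0

# Giving every sample the same score is no better than random guessing -> AUC = 0.5
constant_scores = np.array([0.5, 0.5, 0.5, 0.5])
print(roc_auc_score(y_true, constant_scores))  # 0.5
```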

```python
# roc_auc_score expects binarized labels
binarizer = LabelBinarizer()
binarizer.fit(Y_train_res)
Y_test_binarized = binarizer.transform(Y_test)

def auc_score(y_true, y_pred):
    return roc_auc_score(binarizer.transform(y_true), binarizer.transform(y_pred), average='macro')

print('ROC AUC scores')
print('Trained model: {0}\nDummy classifier: {1}'.format(auc_score(Y_test, Y_predictions),
                                                         auc_score(Y_test, Y_dummy)))
```
```
ROC AUC scores
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>()
      8
      9 print('ROC AUC scores')
---> 10 print('Trained model: {0}\nDummy classifier: {1}'.format(auc_score(Y_test, Y_predictions),
     11                                                          auc_score(Y_test, Y_dummy)))

in auc_score(y_true, y_pred)
      5
      6 def auc_score(y_true, y_pred):
----> 7     return roc_auc_score(binarizer.transform(y_true), binarizer.transform(y_pred), average='macro')
      8
      9 print('ROC AUC scores')

/anaconda/envs/py35/lib/python3.5/site-packages/sklearn/metrics/ranking.py in roc_auc_score(y_true, y_score, average, sample_weight)
    275     return _average_binary_score(
    276         _binary_roc_auc_score, y_true, y_score, average,
--> 277         sample_weight=sample_weight)
    278
    279

/anaconda/envs/py35/lib/python3.5/site-packages/sklearn/metrics/base.py in _average_binary_score(binary_metric, y_true, y_score, average, sample_weight)
    116         y_score_c = y_score.take([c], axis=not_average_axis).ravel()
    117         score[c] = binary_metric(y_true_c, y_score_c,
--> 118                                  sample_weight=score_weight)
    119
    120     # Average the results

/anaconda/envs/py35/lib/python3.5/site-packages/sklearn/metrics/ranking.py in _binary_roc_auc_score(y_true, y_score, sample_weight)
    266 def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
    267     if len(np.unique(y_true)) != 2:
--> 268         raise ValueError("Only one class present in y_true. ROC AUC score "
    269                          "is not defined in that case.")
    270

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```

The ROC AUC score would be a good candidate when a single, sensitive model evaluation measure is needed.

Any idea what's going wrong here?

Regards,
/Girish BK

@wdecay
Contributor

wdecay commented Sep 11, 2018

The root cause is: `ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.`
It's likely that you don't have enough data to train the model on. Did you generate the data via DataGeneration.ipynb, or did you use DataIngestion.ipynb?
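For context, the error is easy to reproduce: `roc_auc_score` needs both classes in `y_true` to rank one against the other. A minimal sketch (hypothetical data, not the solution's) that triggers it, and shows how a stratified split keeps both classes in the test set whenever both exist in the full dataset:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# y_true with a single class: AUC is undefined, sklearn raises ValueError
try:
    roc_auc_score(np.array([1, 1, 1]), np.array([0.2, 0.7, 0.9]))
except ValueError as e:
    print(e)  # Only one class present in y_true. ...

# A stratified split preserves the class ratio in train and test,
# so the test set cannot end up with only one class.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)   # imbalanced, but two classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4,
                                          stratify=y, random_state=0)
print(np.unique(y_te))  # both classes present
```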

@girishkumarbk
Author

girishkumarbk commented Sep 12, 2018

Hi,
I used DataIngestion.ipynb for this exercise, after the data generator had pushed some amount of data into IoT Hub.

1) How do we know that enough data has been ingested to start the training process?
2) Also, could you please indicate how to configure DataGeneration.ipynb to generate training data directly on ABS? How do we configure DataGenerator to operate in these two modes: (1) push data to IoT Hub, or (2) generate data on ABS directly?

Regards,
/Girish BK

@wdecay
Contributor

wdecay commented Sep 12, 2018

DataGeneration.ipynb doesn't push data directly to ABS. It writes it to the local hard drive (or to DBFS, if Databricks is used). Nothing prevents you from pushing this data into ABS with a few lines of code, of course, but that would be outside the scope of the solution.

Re: DataIngestion.ipynb, it's provided mainly as a reference to enable a production scenario. If you use the solution as-is, the answer to your question (how do we know that enough data has been ingested to start training?) is: strictly speaking, it's never going to be enough unless you run the solution for at least several weeks and generate data for a reasonably large number of devices. In a looser sense, however, you need both positive and negative data points, which means at least one device that has failed and at least one healthy device. The more, the better, of course.
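One way to make that precondition explicit in a notebook (a hypothetical guard, not part of the solution's code) is to check for both classes before attempting the AUC computation:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def safe_auc_score(y_true, y_score):
    """Return ROC AUC, or None when only one class is present
    (e.g. no failed device has been ingested yet)."""
    if len(np.unique(y_true)) < 2:
        return None
    return roc_auc_score(y_true, y_score)

print(safe_auc_score(np.array([1, 1, 1]), np.array([0.1, 0.5, 0.9])))        # None
print(safe_auc_score(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9])))  # 1.0
```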
