Why are some fundamental algorithms like LR, DT, and RF comparable with DES methods on my dataset? #259

chenz1hao · 2021-10-28T12:36:52Z

I mean, the DES methods do not improve the metrics on my dataset, and in some cases they are even worse.

Menelau · 2021-10-29T17:47:06Z

Hello,

It is impossible to say why without knowing more about the data and all the methodological steps used to run the algorithms.

Did you normalize all your data before applying dynamic selection? Did you try different approaches, like DES based on clustering, to see if that would give you better performance?
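
Normalization matters here because META-DES and many other dynamic selection techniques find the region of competence with a k-nearest-neighbours search over DSEL, so a feature with a large numeric range can dominate the distances. A minimal sketch of that preprocessing step, assuming a synthetic dataset from `make_classification` in place of the real data (the data, sizes, and variable names are illustrative, not taken from this thread); the scaler is fit on the training portion only and then reused for DSEL and the test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption, for illustration only).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X[:, 0] *= 1000.0  # exaggerate one feature's scale, as raw tabular data often does

# Same three-way split idea as in the thread: train / DSEL / test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X_train, y_train, test_size=0.50, random_state=0)

# Fit the scaler on the training portion only, then apply it everywhere,
# so no statistics from DSEL or the test set leak into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_dsel_s = scaler.transform(X_dsel)
X_test_s = scaler.transform(X_test)

print("std before scaling:", X_train.std(axis=0).round(1))
print("std after scaling: ", X_train_s.std(axis=0).round(1))
```

The scaled arrays would then replace the raw ones when fitting the pool, fitting the DS method on DSEL, and predicting on the test set.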

chenz1hao · 2021-10-31T08:03:44Z

Dataset: http://bit.ly/xMLdataset (a binary classification task). I ran logistic regression (from sklearn) on this dataset and compared it with DES methods (code copied from the documentation). There was no normalization or any other preprocessing: the original dataset was just split into train and test sets. I found no obvious performance improvement from using DES methods.
Maybe you can try it on this dataset. Thank you very much.
Code and result details are as follows:

chenz1hao · 2021-11-01T13:57:16Z

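import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from deslib.des import METADES
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def AUC_plot(algorithmName, test_y, pred_y_prob):
    # print(algorithmName, "plotting the AUC curve:")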
    fpr, tpr, thresholds = roc_curve(test_y, pred_y_prob)
    auc = roc_auc_score(test_y, pred_y_prob)
    plt.plot(fpr, tpr)
    plt.title(algorithmName+" AUC=%.4f" % (auc))
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.fill_between(fpr, tpr, where=(tpr > 0), color='green', alpha=0.5)
    plt.show()


# Print the performance metrics of an algorithm
def print_performance(algorithm_name, test_y, pred_y, pred_y_prob):
    # TP (True Positive): predicted 1, actual 1
    # FN (False Negative): predicted 0, actual 1
    # FP (False Positive): predicted 1, actual 0
    # TN (True Negative): predicted 0, actual 0

    TP = []
    FN = []
    FP = []
    TN = []

    for i in range(len(pred_y)):
        if pred_y[i] == 1 and test_y[i] == 1:
            TP.append(i)
        elif pred_y[i] == 0 and test_y[i] == 1:
            FN.append(i)
        elif pred_y[i] == 1 and test_y[i] == 0:
            FP.append(i)
        elif pred_y[i] == 0 and test_y[i] == 0:
            TN.append(i)

    accuracy = (len(TP)+len(TN))/(len(TP)+len(FP)+len(TN)+len(FN))
    precision = len(TP) / (len(TP) + len(FP))
    recall = len(TP) / (len(TP) + len(FN))
    F1_score = 2 * ((precision*recall)/(precision+recall))
    print(algorithm_name, ':')
    print('Accuracy:', accuracy)
    print('Precision:', precision)
    print('Recall:', recall)
    print('F1-SCORE:', F1_score)
    AUC_plot(algorithm_name, test_y, pred_y_prob)
    print('\n')

if __name__ == '__main__':
    dataset = pd.read_csv('data/heloc_dataset_v2.csv')
    # Hold out 30% of the data as the test set
    X_train, X_test, y_train, y_test = train_test_split(
        dataset.drop(['target'], axis=1), dataset['target'],
        test_size=0.30, random_state=666)
    # Baseline: logistic regression trained on the full training split
    com_lr = LogisticRegression(max_iter=10000)
    com_lr.fit(X_train, y_train)
    print_performance('LR compare', np.array(y_test), com_lr.predict(X_test), com_lr.predict_proba(X_test)[:, 1])
    # Pool of 100 bagged decision trees
    # (`base_estimator` was renamed to `estimator` in scikit-learn 1.2)
    pool_classifiers = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                                         n_estimators=100,
                                         random_state=666)
    # Split the training data again: half to train the pool, half as DSEL
    X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train,
                                                        test_size=0.50,
                                                        random_state=666)
    pool_classifiers.fit(X_train, y_train)
    meta = METADES(pool_classifiers, random_state=666)
    names = ['META-DES']
    methods = [meta]
    # Fit the DS techniques on DSEL and evaluate them on the test set
    scores = []
    for method, name in zip(methods, names):
        method.fit(X_dsel, y_dsel)
        scores.append(method.score(X_test, y_test))
        print_performance(name, np.array(y_test), method.predict(X_test), method.predict_proba(X_test)[:, 1])

As you can see from the picture above, where LR is the logistic regression from sklearn, nearly all performance metrics for META-DES are worse than those for logistic regression. I wonder how this could happen?

@Menelau
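
The hand-counted accuracy, precision, recall, and F1 in `print_performance` can be cross-checked against scikit-learn's own metric functions. A minimal sketch, assuming 0/1 labels as in the code above; the helper name `summarize` and the toy arrays are hypothetical and only illustrate the calls:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def summarize(y_true, y_pred, y_prob):
    """Return the same metrics as print_performance, computed by scikit-learn."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 avoids an error when no positives are predicted
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auc": roc_auc_score(y_true, y_prob),
    }


# Toy labels and scores (made up for illustration): 2 TP, 1 FN, 1 FP, 2 TN.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1])

result = summarize(y_true, y_pred, y_prob)
for metric, value in result.items():
    print(f"{metric}: {value:.4f}")
```

Calling `summarize` once for each model on the same test set gives numbers that can be compared side by side without depending on the plot.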
