Describe the workflow you want to enable

Users could specify what percentage of the trees in sklearn.ensemble.RandomForestClassifier and sklearn.ensemble.RandomForestRegressor should follow each criterion.
Advantages Of Implementing Above Functionality

Better results can be achieved in certain domains, and this feature would help researchers.
Describe your proposed solution
This is how the feature would look at the user's end in Python 3:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Give multiple criteria
n_estimators = 100
rfc = RandomForestClassifier(
    n_estimators=n_estimators,
    criterion={"gini": 0.4, "entropy": 0.3, "random": 0.3},
    random_state=42,
)

# Model training
rfc.fit(X_train, y_train)

# Prediction
print(rfc.predict(X_test))
```
Explanation Of Above Code

After this feature is implemented, the criterion parameter would also accept a dict mapping each criterion name to the fraction of trees that should use it. If the values sum to less than 1, the remaining trees follow the default criterion; if they sum to more than 1, an error is raised. In the code above, besides gini and entropy there is also a random criterion: each tree falling under random can be assigned any criterion at random.
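As a sketch of the proposed semantics (the helper name `resolve_criterion_counts` and the exact validation behaviour are assumptions for illustration, not an existing scikit-learn API), the dict could be resolved into per-criterion tree counts like this:

```python
def resolve_criterion_counts(criterion, n_estimators, default="gini"):
    """Hypothetical helper: map a {criterion: fraction} dict to tree counts.

    Fractions summing to less than 1 leave the remainder on the default
    criterion; a sum greater than 1 raises an error, as proposed above.
    """
    total = sum(criterion.values())
    if total > 1 + 1e-9:
        raise ValueError(f"criterion fractions sum to {total}, which exceeds 1")
    counts = {name: int(frac * n_estimators) for name, frac in criterion.items()}
    # Any trees left over (from rounding or a sum below 1) use the default.
    leftover = n_estimators - sum(counts.values())
    counts[default] = counts.get(default, 0) + leftover
    return counts

print(resolve_criterion_counts({"gini": 0.4, "entropy": 0.3, "random": 0.3}, 100))
```

With the dict from the example above and `n_estimators=100`, this yields 40 gini, 30 entropy, and 30 random trees.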
Describe alternatives you've considered, if relevant
Alternative code using np.argmax:

```python
import numpy as np
from scipy.stats import mode
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Random Forest classifiers with Gini and entropy
n_estimators = 100
rf_gini = RandomForestClassifier(n_estimators=n_estimators // 2, criterion="gini", random_state=42)
rf_entropy = RandomForestClassifier(n_estimators=n_estimators // 2, criterion="entropy", random_state=42)

# Fit the models
rf_gini.fit(X_train, y_train)
rf_entropy.fit(X_train, y_train)

# Predict with both models
pred_gini = rf_gini.predict(X_test)
pred_entropy = rf_entropy.predict(X_test)

# Average the predicted probabilities, then take the argmax
avg_proba = (rf_gini.predict_proba(X_test) + rf_entropy.predict_proba(X_test)) / 2
avg_pred = np.argmax(avg_proba, axis=1)

# Majority voting (most common prediction per sample)
majority_pred = mode(np.stack([pred_gini, pred_entropy]), axis=0, keepdims=False).mode

# Evaluate accuracy
accuracy_avg = accuracy_score(y_test, avg_pred)
accuracy_majority = accuracy_score(y_test, majority_pred)
print(f"Accuracy using averaged probabilities: {accuracy_avg:.2f}")
print(f"Accuracy using majority voting: {accuracy_majority:.2f}")
```
Additional context

I will be able to make a PR once the issue gets approval.
But currently we can't give random as a criterion. My vision is to add random as a criterion where each tree in the random forest could get a random criterion, adding more randomness to the forest. I could make the PR; it can easily be done using Python's standard random module.
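A minimal sketch of that idea with the standard-library random module (the helper name `draw_criteria` and the candidate list are assumptions for illustration; none of this is existing scikit-learn behaviour):

```python
import random

def draw_criteria(n_estimators, choices=("gini", "entropy", "log_loss"), seed=42):
    """Hypothetical helper: draw one splitting criterion per tree, uniformly at
    random, so each tree in the forest can use a different criterion."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.choice(choices) for _ in range(n_estimators)]

print(draw_criteria(10))
```

The forest builder would then train tree i with the i-th drawn criterion.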
You can randomly choose the number of trees in the random forests that you pass to VotingClassifier.
If you want randomness in each split of a tree, then my opinion is that that's out of scope for scikit-learn, unless there are strong reasons for inclusion; see https://scikit-learn.org/stable/faq.html#id19.