Cervical cancer behavior risk

Cervical cancer is the second most common type of cancer affecting women worldwide, especially in developing countries. As with all cancers, early detection offers the best chance of successful treatment, so a screening approach based on behavioral science, which does not require expensive testing, can support earlier initial diagnosis.

Link to presentation.
Link to article.

Main Report.
Report - Data overview.
Report - Model Results.

Variables

The variables come from widely studied behavioral theory, more specifically:

The Health Belief Model (HBM)
Protection Motivation Theory (PMT)
Theory of Planned Behavior (TPB)
Social Cognitive Theory (SCT)

and consist of:

Intention
Attitude
Subjective Norm
Perception
Motivation
Social Support
Empowerment

Indicators for each variable:

Prevention Behavior of Cervical Cancer:

Y1 : Avoiding sexual intercourse that risks HPV infection
Y2 : Eating balanced, nutritious food
Y3 : Personal hygiene

Intention:

Y4 : Aggregation
Y5 : Compatibility
Y6 : Commitment

Attitude:

Y7 : Direction toward prevention behavior
Y8 : Consistency
Y9 : Spontaneity

Subjective Norms:

Y10 : Trust of norms
Y11 : Significant Person
Y12 : Fulfillment of the norms believed in

Perception:

Y13 : Perceived susceptibility
Y14 : Perceived potential severity
Y15 : Perceived advantage

Motivation:

Y16 : Strength of the willingness to carry out prevention
Y17 : Amount of time devoted to prevention behavior
Y18 : Willingness to leave other tasks for prevention behavior

Social Support:

X1 : Emotional support given by others for prevention behavior
X2 : Instrumental support given by others for prevention behavior
X3 : Informational support given by others for prevention behavior

Empowerment:

X4 : Provisions for the needs of prevention
X5 : Provisions for the ability to manage prevention behavior
X6 : Provisions for the ability to determine the means of prevention

Dataset

This dataset consists of 18 features derived from the indicators listed above, plus a dependent variable that describes whether the respondent has cervical cancer (1 = has cervical cancer, 0 = no cervical cancer):

behavior_eating
behavior_personalHygiene
intention_aggregation
intention_commitment
attitude_consistency
attitude_spontaneity
norm_significantPerson
norm_fulfillment
perception_vulnerability
perception_severity
motivation_strength
motivation_willingness
socialSupport_emotionality
socialSupport_appreciation
socialSupport_instrumental
empowerment_knowledge
empowerment_abilities
empowerment_desires
ca_cervix

Number of respondents: 72
Missing Data: 0
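
The respondent count, missing-value count, and class balance can be checked with a few lines of pandas. This is a minimal sketch, not the repository's own script: the helper name `summarize` is hypothetical, and the tiny frame below uses made-up values in place of the real file.

```python
import pandas as pd


def summarize(df, target="ca_cervix"):
    """Return basic sanity checks: row count, missing cells, class counts."""
    return {
        "n_respondents": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "class_counts": df[target].value_counts().to_dict(),
    }


# Stand-in data with made-up values (assumption); with the real file this
# would be something like: df = pd.read_csv("sobar-72.csv")
demo = pd.DataFrame(
    {
        "behavior_eating": [13, 11, 15, 12],
        "motivation_strength": [9, 14, 8, 15],
        "ca_cervix": [1, 0, 1, 0],
    }
)
summary = summarize(demo)
print(summary)
```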

The original study used two classification algorithms:

Naive Bayes (NB)
Logistic Regression (LR)

10-fold cross-validation was applied to each of them, with the following results:

NB:
- Accuracy: 91.67%
- AUC: 0.96
LR:
- Accuracy: 87.50%
- AUC: 0.97

Aim of the work

Improve on the results obtained by the researchers.
Present additional metrics that are more appropriate for medical problems, e.g. precision and recall.
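
To illustrate why precision and recall add information beyond accuracy, here is a small dependency-free sketch. The function name `precision_recall` and the label vectors are hypothetical and are not taken from the repository.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 = has cervical cancer, 0 = no cervical cancer.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=0.67
```

In this made-up example accuracy is 75% (6 of 8 correct), yet one of the three cancer cases is missed, which is exactly what recall exposes.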

Summary of results

Because the Kaiser-Meyer-Olkin (KMO) test indicated only middling sampling adequacy, we plotted a PCA graph of the first two components, which together explain approximately 47% of the total variance of the dataset. As the ellipse shows, the region containing the cancer cases can be marked out fairly accurately.
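
The share of variance explained by the first two components can be computed along the following lines. This is a sketch on synthetic data with the same shape as the dataset (72 rows, 18 features); standardizing before PCA is an assumption here, and the resulting percentage will not match the 47% reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 18 features driven by 3 latent factors.
latent = rng.normal(size=(72, 3))
X = latent @ rng.normal(size=(3, 18)) + rng.normal(size=(72, 18))

# Standardize so that no single feature dominates the components.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)  # 2-D coordinates for the scatter plot
explained = pca.explained_variance_ratio_.sum()
print(f"first two components explain {explained:.1%} of the variance")
```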

We also improved on the researchers' results using Python and scikit-learn. First, a test set comprising 25% of the dataset was held out, chosen so that it included an appropriate number of cases from the minority class (those with cancer). Each model was then trained on the remaining 75% of the original dataset using 10-fold cross-validation.
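
One common way to obtain such a split is a stratified hold-out followed by stratified folds. The sketch below assumes that approach; the random data, the class counts, and the `random_state` values are made up for illustration and may differ from what models.py does.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in (assumption): 72 rows, fewer than 1/3 positive.
X = rng.normal(size=(72, 18))
y = np.array([1] * 21 + [0] * 51)

# Hold out 25% while keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 10 stratified folds over the remaining 75% for model selection.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in cv.split(X_train, y_train)]

print(len(X_train), len(X_test), int(y_test.sum()), fold_sizes)
```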

Logistic Regression (LR)

data were standardized
tuned parameters: class weight, C
model accuracy: 0.94667
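
The setup listed above (standardization plus tuning of class weight and C) could look roughly like this with a scikit-learn pipeline and grid search. The parameter grid, the accuracy scoring, and the synthetic training data are assumptions, not the values used in the repository.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 75% training split (assumption).
X_train, y_train = make_classification(
    n_samples=54, n_features=18, n_informative=6,
    weights=[0.7, 0.3], random_state=0,
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using statistics from its own training part only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10],
    "logisticregression__class_weight": [None, "balanced"],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
```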

ROC curve

Because the classes are imbalanced (fewer than 1/3 of the cases have cancer), the G-Mean, the geometric mean of sensitivity and specificity, was used to determine the best cut-off point. With that cut-off, the model was evaluated on the test data (the 25% that the model had never seen before).
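
Choosing a cut-off by G-Mean amounts to scanning candidate thresholds and keeping the one that maximizes sqrt(TPR * TNR). The following dependency-free sketch shows the idea; the function name `best_gmean_threshold`, the labels, and the scores are hypothetical.

```python
from math import sqrt


def best_gmean_threshold(y_true, scores):
    """Return (threshold, g_mean) maximizing sqrt(TPR * TNR).

    Assumes both classes are present in y_true. A case is predicted
    positive when its score is >= the threshold.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    best_thr, best_g = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        g = sqrt((tp / pos) * (tn / neg))
        if g > best_g:
            best_thr, best_g = thr, g
    return best_thr, best_g


# Made-up predicted probabilities for 6 negatives and 3 positives.
y_true = [0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.70, 0.90]

threshold, g_mean = best_gmean_threshold(y_true, scores)
print(f"threshold={threshold:.2f} g_mean={g_mean:.3f}")
# threshold=0.40 g_mean=0.913
```

With the default cut-off of 0.5, the positive case scored 0.40 in this made-up example would be missed; lowering the cut-off to 0.40 recovers it at the cost of no additional false positives.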

Results after moving the cut-off point.

Naive Bayes (NB)

data were not standardized
tuned parameter: alpha
model accuracy: 0.87

The same procedure as for logistic regression was used.
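
Tuning the smoothing parameter alpha could be sketched as below. Which Naive Bayes variant the repository uses is not stated here, so `MultinomialNB` (one of the scikit-learn variants that has an `alpha` parameter) is an assumption, as are the alpha grid and the synthetic non-negative integer scores.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(1)

# Synthetic stand-in for the training split (assumption): non-negative
# integer questionnaire scores, left unscaled as MultinomialNB requires.
y_train = np.array([1] * 16 + [0] * 38)
X_train = rng.integers(1, 16, size=(54, 18))
# Give the positive class lower scores on three features to add signal.
X_train[y_train == 1, :3] = rng.integers(1, 6, size=(16, 3))

alphas = [0.01, 0.1, 0.5, 1.0, 5.0]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
search = GridSearchCV(
    MultinomialNB(), {"alpha": alphas}, cv=cv, scoring="accuracy"
)
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
```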

ROC curve

Results after moving the cut-off point.

Conclusion

As can be observed, the results we obtained for Naive Bayes are excellent. We achieved 100% on every metric, which means that the model correctly classified every case in the held-out test set (25% of the 72 respondents) based on behavioral data alone.
