NBCD: Naive Bayes for Online Classification with Concept Drift

This package provides an online classification method based on Naive Bayes that can handle concept drift. It also comes with an extended Naive Bayes implementation whose models can be printed, plotted, updated, and used for prediction (see below). The same holds for models built with the NBCD method.


Installation:

devtools::install_github("aschersleben/NBCD", build_vignettes = TRUE)
library("NBCD")

Simple example:

We use the well-known iris dataset and add a "concept drift":

set.seed(1234)
iris2 <- iris[sample(150), ]
iris2$Sepal.Width <- iris2$Sepal.Width + seq(1, 30, length.out = 150) # add a "concept drift"
model <- makeNBCDmodel(list(x = iris2[1:120, 1:4], class = iris2[1:120, 5], time = 1:120), model = NULL,
                       discretize = "fixed", discParams = list(Sepal.Length = 4:8),
                       init.obs = 20, max.waiting.time = 20, waiting.time = "auto")
print(model)
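
To make the size of the injected drift concrete, the following base-R sketch rebuilds iris2 and compares early and late values of Sepal.Width. It does not use NBCD at all; the names drift, early, late and trend are illustrative only.

```r
set.seed(1234)
iris2 <- iris[sample(150), ]

# The drift term grows linearly from 1 to 30 over the 150 shuffled rows
drift <- seq(1, 30, length.out = 150)
iris2$Sepal.Width <- iris2$Sepal.Width + drift

# Mean of the drifted feature at the start and at the end of the stream
early <- mean(iris2$Sepal.Width[1:30])
late  <- mean(iris2$Sepal.Width[121:150])
print(c(early = early, late = late))

# A linear fit over time recovers a slope close to 29 / 149 per observation
drift_data <- data.frame(Sepal.Width = iris2$Sepal.Width, time = 1:150)
trend <- coef(lm(Sepal.Width ~ time, data = drift_data))
print(trend)
```

Because the shift is far larger than the natural spread of Sepal.Width, a static model trained on the first observations would see very different values for this feature later in the stream.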

For plotting, the NBCD package uses ggplot2.

plot(model, ylim = c(25, 35))
plot(model, ylim = c(25, 35), use.lm = TRUE, time = 150)

You can add data directly to the plots; predictions are included automatically:

plot(model, ylim = c(20, 40), use.lm = FALSE,
     data = iris2[140:150, ], class.name = "Species")
plot(model, ylim = c(20, 40), use.lm = TRUE, time = 150,
     data = iris2[140:150, ], class.name = "Species")

About the NBCD method:

See the package vignette:

vignette("NBCD")

Extended Naive Bayes:

This package also includes nb2(), an extended version of naiveBayes() from e1071. Its models can be updated with new observations, and it supports automated discretization.
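
For readers who want to see what a Naive Bayes model for numeric features stores, here is a minimal base-R sketch of Gaussian Naive Bayes. It is not the code of nb2() or of e1071; the helper names fit_gnb and predict_gnb are hypothetical and exist only for this illustration.

```r
# Fit: class priors plus per-class mean and standard deviation of each feature
fit_gnb <- function(x, y) {
  y <- factor(y)
  list(
    levels = levels(y),
    prior  = as.numeric(table(y)) / length(y),
    mean   = sapply(x, function(col) tapply(col, y, mean)),  # classes x features
    sd     = sapply(x, function(col) tapply(col, y, sd))     # classes x features
  )
}

# Predict: pick the class with the highest log prior + summed log densities
predict_gnb <- function(model, newdata) {
  newdata <- as.matrix(newdata)
  log_post <- sapply(seq_along(model$levels), function(k) {
    # t(newdata) is features x observations, so the per-feature parameters
    # of class k recycle correctly down each column
    log_lik <- dnorm(t(newdata), mean = model$mean[k, ], sd = model$sd[k, ],
                     log = TRUE)
    log(model$prior[k]) + colSums(log_lik)
  })
  log_post <- matrix(log_post, nrow = nrow(newdata))  # keep matrix shape for one row
  model$levels[max.col(log_post, ties.method = "first")]
}

gnb  <- fit_gnb(iris[, 1:4], iris$Species)
pred <- predict_gnb(gnb, iris[, 1:4])
print(table(predicted = pred, actual = iris$Species))
```

The stored quantities (priors, means, standard deviations) are all simple summaries, which is what makes printing, plotting and incremental updating of such a model practical.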

At first glance, nb2() looks no different from the e1071 function:

mod <- nb2(iris[, 1:4], iris[, 5])
print(mod)

However, you can plot the model as well as print it:

plot(mod)
plot(mod, data = iris, class.name = "Species")

Easy discretization (= specifying limits for the categories):

discParam <- list(Sepal.L = 4:8, Sepal.W = 1:5)
mod2 <- nb2(iris[, 1:4], iris[, 5], discretize = "fixed", discParams = discParam)
print(mod2)
plot(mod2, data = iris, class.name = "Species")
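
As a rough picture of what fixed limits mean, the base-R sketch below bins Sepal.Length at the limits 4:8 with cut() and tabulates the bins per class. This is an assumption about the general idea only; how nb2() treats interval boundaries or values outside the limits is not shown here, and the variable names are illustrative.

```r
breaks <- 4:8

# Bin the numeric feature into the intervals (4,5], (5,6], (6,7], (7,8]
sepal_length_bins <- cut(iris$Sepal.Length, breaks = breaks)

# Per-class counts of each bin, then conditional probabilities P(bin | class)
counts    <- table(iris$Species, sepal_length_bins)
cond_prob <- prop.table(counts, margin = 1)

print(counts)
print(round(cond_prob, 2))
```

After discretization, the feature is handled like any categorical variable: the model only needs the count table, not the original numeric values.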

Easy updates (= adding new observations to the model without re-computing):

mod.upd <- update(mod, newdata = iris[1:50, 1:4], y = iris$Species[1:50])
print(mod.upd)

Easy updates for discretized variables (= no prior manual discretization necessary):

mod2.upd <- update(mod2, newdata = iris[51:100, 1:4], y = iris$Species[51:100])
print(mod2.upd)
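
Why updating without re-computing is possible can be sketched in base R: the count, mean and sum of squared deviations of a feature can be merged from two batches without revisiting the old data. This is a generic pairwise-merge sketch, not necessarily how update() is implemented in this package; summarise_feature and merge_summaries are hypothetical helpers.

```r
# Summary of one numeric feature within one class
summarise_feature <- function(x) {
  list(n = length(x), mean = mean(x), m2 = sum((x - mean(x))^2))
}

# Merge two summaries; only the summaries are needed, not the raw data
merge_summaries <- function(a, b) {
  n     <- a$n + b$n
  delta <- b$mean - a$mean
  list(
    n    = n,
    mean = a$mean + delta * b$n / n,
    m2   = a$m2 + b$m2 + delta^2 * a$n * b$n / n
  )
}

x <- iris$Sepal.Length[iris$Species == "setosa"]

old_batch <- summarise_feature(x[1:30])   # what the model already holds
new_batch <- summarise_feature(x[31:50])  # newly arriving observations
merged    <- merge_summaries(old_batch, new_batch)
merged_sd <- sqrt(merged$m2 / (merged$n - 1))

# The merged statistics agree with those computed on all data at once
print(all.equal(c(merged$mean, merged_sd), c(mean(x), sd(x))))
```

For discretized variables the same idea is even simpler, since the bin counts of a new batch can be added to the existing count table.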

Concept Drift:

Read about concept drift in Webb et al. (2016, DOI:10.1007/s10618-015-0448-4).
