title	permalink	categories
Classification example	examples/classification	tutorials

Imagine you want to classify a collection of images into two groups. For simplicity, we can call one positive and the other negative. If these are, for example, scans of documents, the positives might be pages with text and the negatives blank pages (this was the original motivation); or the split might be something else entirely.

Here is a possible solution, using mahotas and milk.

1. Start by creating two directories, positives/ and negatives/, where you will manually pick out a few examples of each class. I will assume that the rest of the data is in an unlabeled/ directory.
2. Compute features for all of the images in positives/ and negatives/ and learn a classifier.
3. Use that classifier on the unlabeled images.

In the code below I used jug to give you the possibility of running it on multiple processors, but the code also works if you remove every line which mentions TaskGenerator:

from glob import glob
import mahotas
import mahotas.features
import milk
from jug import TaskGenerator


@TaskGenerator
def features(imname):
    img = mahotas.imread(imname)
    return mahotas.features.haralick(img).mean(0)

@TaskGenerator
def learn_model(features, labels):
    learner = milk.defaultclassifier()
    return learner.train(features, labels)

@TaskGenerator
def classify(model, features):
    return model.apply(features)

positives = glob('positives/*.jpg')
negatives = glob('negatives/*.jpg')
unlabeled = glob('unlabeled/*.jpg')


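# Use a separate name for the training features: rebinding `features` here
# would hide the features() function that is still needed for the unlabeled images.
train_features = [features(im) for im in negatives + positives]
# Labels follow the same order as the file list: negatives first, then positives.
labels = [0] * len(negatives) + [1] * len(positives)

model = learn_model(train_features, labels)

labeled = [classify(model, features(u)) for u in unlabeled]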
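
The script stops at `labeled`, which holds one prediction per entry in `unlabeled`. Under jug these are Task objects, so their results have to be loaded after `jug execute` has finished (as far as I know, `jug.value` is the usual way to do that). As a minimal sketch of a next step, assuming the predictions are already plain 0/1 values in the same order as the file list, a hypothetical helper `group_by_label` (not part of mahotas, milk, or jug) could pair them up with the file names:

```python
from collections import defaultdict


def group_by_label(filenames, predictions):
    """Map each predicted label to the list of files that received it."""
    if len(filenames) != len(predictions):
        raise ValueError("filenames and predictions must have the same length")
    groups = defaultdict(list)
    for name, label in zip(filenames, predictions):
        # int() turns booleans or numpy integers into plain 0/1 keys
        groups[int(label)].append(name)
    return dict(groups)


# Hypothetical file names and predictions, purely for illustration.
example_files = ["unlabeled/a.jpg", "unlabeled/b.jpg", "unlabeled/c.jpg"]
example_predictions = [1, 0, 1]
groups = group_by_label(example_files, example_predictions)
print(groups.get(1, []))  # files predicted positive
```

Keeping this step outside the jug tasks means it can be rerun cheaply, for instance to eyeball a few predicted positives before trusting the classifier on the whole collection.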

The `features` function above uses Haralick texture features, which are probably good enough for a task like this, but you can play with other features in mahotas.features if you’d like (or try mahotas.surf, but that gets more complicated).
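
A note on the `.mean(0)` in `features`: for a 2D greyscale image, `mahotas.features.haralick` returns one row of statistics per direction (4 rows of 13 values by default, as far as I know), and averaging over the first axis collapses them into a single vector per image. The sketch below mimics that averaging in plain Python on made-up numbers; `mean_over_directions` and `fake_haralick` are illustrative names, not part of mahotas.

```python
def mean_over_directions(per_direction):
    """Average a list of equal-length rows column by column (like .mean(0))."""
    n_rows = len(per_direction)
    return [sum(column) / n_rows for column in zip(*per_direction)]


# Made-up numbers standing in for Haralick output: 4 directions x 13 statistics.
fake_haralick = [[float(d + s) for s in range(13)] for d in range(4)]
feature_vector = mean_over_directions(fake_haralick)
print(len(feature_vector))  # one value per statistic
```

Averaging over directions makes the descriptor less sensitive to the orientation of the texture, which seems a reasonable default for scanned pages.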

(I originally wrote this as a response to a question on Stack Overflow [http://stackoverflow.com/q/5426482/248279].)
