Overview
Classification assigns individual items to discrete groups pre-specified by the user, based on features of the items. The classification module within the PAL Framework provides a unified API for classifiers and implementations of three specific classification mechanisms:
- Transformed Weight-normalized Complement Naive Bayes (TWCNB)
- Maximum Entropy (MaxEnt)
- Decision Tree
A classifier maps a feature space X to a discrete set of labels Y; that is, it assigns a pre-defined class label to each sample. It takes as input an object or situation described by a set of attributes and returns a “decision”, which is the predicted label. For example, a spam classifier labels an email as “Spam” or “Non Spam” based on attributes of the email such as its sender and body.
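To make the mapping from X to Y concrete, here is a minimal sketch of the spam example. It is illustrative only and is not part of the PAL Framework: the class `SpamRuleExample`, the `Label` enum, the attribute names, and the hand-written rule (which stands in for a trained model) are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SpamRuleExample {

    /** Hypothetical label set Y for the spam example. */
    enum Label { SPAM, NON_SPAM }

    /**
     * Maps one sample from the feature space X (attribute name -> value)
     * to a label in Y. The fixed rule below is a stand-in for a trained model.
     */
    static Label classify(Map<String, String> email) {
        String sender = email.get("sender");
        String body = email.get("body");

        // Assumed rule: a sender outside example.com is treated as unknown.
        boolean unknownSender = sender == null || !sender.endsWith("@example.com");
        // Assumed rule: a body containing "free money" is treated as suspicious.
        boolean suspiciousBody = body != null
                && body.toLowerCase(Locale.ROOT).contains("free money");

        return (unknownSender && suspiciousBody) ? Label.SPAM : Label.NON_SPAM;
    }

    public static void main(String[] args) {
        Map<String, String> email = new HashMap<String, String>();
        email.put("sender", "promo@unknown.test");
        email.put("body", "Claim your FREE MONEY now");

        System.out.println(classify(email)); // prints SPAM
    }
}
```

A real classifier would learn its decision rule from labeled training data rather than hard-coding it, but the input and output have the same shape: attributes in, one predicted label out.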
Classifiers are used in many different areas including computer vision (medical image analysis, optical character recognition), speech recognition, natural language processing, drug discovery, document classification, internet search, etc. See the Limitations section below regarding the currently supported algorithms.
Different classification algorithms have different characteristics that can affect their suitability for a given problem. The classification framework provides a flexible means of accessing different classification algorithms through a common API.
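The value of a common API is that calling code does not change when the algorithm does. The sketch below illustrates that idea under stated assumptions: the `Classifier` interface, its `train` and `classify` methods, and both toy implementations are hypothetical and are not the PAL classification API or any of its three algorithms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CommonApiSketch {

    /** Hypothetical common interface; NOT the PAL API. */
    interface Classifier {
        void train(List<String> documents, List<String> labels);
        String classify(String document);
    }

    /** Toy implementation: always predicts the most frequent training label. */
    static class MajorityClassifier implements Classifier {
        private String majority;

        public void train(List<String> documents, List<String> labels) {
            Map<String, Integer> counts = new HashMap<String, Integer>();
            int best = 0;
            for (String label : labels) {
                Integer old = counts.get(label);
                int n = (old == null) ? 1 : old + 1;
                counts.put(label, n);
                if (n > best) { best = n; majority = label; }
            }
        }

        public String classify(String document) { return majority; }
    }

    /** Toy implementation: predicts the label whose training words overlap most with the input. */
    static class WordOverlapClassifier implements Classifier {
        private final Map<String, Set<String>> vocabulary = new HashMap<String, Set<String>>();

        public void train(List<String> documents, List<String> labels) {
            for (int i = 0; i < documents.size(); i++) {
                Set<String> words = vocabulary.get(labels.get(i));
                if (words == null) {
                    words = new HashSet<String>();
                    vocabulary.put(labels.get(i), words);
                }
                words.addAll(tokenize(documents.get(i)));
            }
        }

        public String classify(String document) {
            String bestLabel = null;
            int bestOverlap = -1;
            for (Map.Entry<String, Set<String>> entry : vocabulary.entrySet()) {
                int overlap = 0;
                for (String word : tokenize(document)) {
                    if (entry.getValue().contains(word)) overlap++;
                }
                if (overlap > bestOverlap) { bestOverlap = overlap; bestLabel = entry.getKey(); }
            }
            return bestLabel;
        }
    }

    /** Lower-cases the text and splits it on whitespace. */
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "win free money now", "meeting agenda for monday", "lunch on monday");
        List<String> labels = Arrays.asList("Spam", "Non Spam", "Non Spam");

        // The calling code is identical for both algorithms; only the constructor differs.
        Classifier[] candidates = { new MajorityClassifier(), new WordOverlapClassifier() };
        for (Classifier classifier : candidates) {
            classifier.train(docs, labels);
            System.out.println(classifier.getClass().getSimpleName()
                    + " -> " + classifier.classify("free money offer"));
        }
    }
}
```

On this small data set the two implementations disagree about the same input, which is the practical reason to be able to swap algorithms without rewriting the code that calls them.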
Prerequisites
- Java 1.5 or above
Limitations
- The TWCNB and MaxEnt classifiers only support textual data.
Available Classifiers
Name | Description | Advantages | Disadvantages |
---|---|---|---|
TWCNB | | | |
MaxEnt | | | |
Decision Tree | | | |
Overview: DISTAR 14982 – Approved for Public Release, Distribution Unlimited
API and Example: DISTAR 15075 – Approved for Public Release, Distribution Unlimited