Overview
MinorThird provides methods for storing, annotating, and categorizing text as well as learning to extract entities. MinorThird was developed by researchers at Carnegie Mellon University, primarily with DARPA funding. Additional information on MinorThird is available at http://minorthird.sourceforge.net/old/doc/.
MinorThird contains a number of methods for learning to extract annotations (logical assertions about documents) and for learning to classify documents based on these annotations. It is bundled with Mixup, a special-purpose annotation language that can generate features for learning algorithms. Within MinorThird, learning is performed by a classification package that is tightly integrated with Mixup.
MinorThird includes several state-of-the-art sequential learning methods, such as conditional random fields and discriminative methods for training hidden Markov models.
One practical difficulty in applying learning techniques to Natural Language Processing (NLP) problems is that the input to a learner is the result of a complex chain of transformations, which begins with text and ends with very low-level representations. Verifying the correctness of this chain of derivations can be difficult. To address this problem, MinorThird also includes several tools for visualizing transformed data and relating it to the text from which it was derived. In this way, MinorThird provides an integrated system for annotating and visualizing text with state-of-the-art learning methods.
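To make the idea of tracing low-level representations back to text concrete, the following is a minimal sketch in plain Java. It does not use the MinorThird API: the class `SpanTraceSketch`, the `Span` type, and the three feature names are hypothetical and chosen only for illustration. The sketch splits a document into tokens that keep their character offsets, then derives a small feature map for each token, so every feature vector can be related to the exact text it came from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from the MinorThird API.
public class SpanTraceSketch {

    /** A token that remembers its position in the original document. */
    static final class Span {
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Recover the source text this span was derived from. */
        String textIn(String document) {
            return document.substring(start, end);
        }
    }

    /** Step 1: split on whitespace, keeping character offsets for each token. */
    static List<Span> tokenize(String document) {
        List<Span> spans = new ArrayList<Span>();
        int i = 0;
        while (i < document.length()) {
            // Skip whitespace between tokens.
            while (i < document.length() && Character.isWhitespace(document.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < document.length() && !Character.isWhitespace(document.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new Span(start, i));
            }
        }
        return spans;
    }

    /** Step 2: turn one span into a low-level feature map (assumed features). */
    static Map<String, Double> features(String document, Span span) {
        String token = span.textIn(document);
        Map<String, Double> f = new LinkedHashMap<String, Double>();
        f.put("length", (double) token.length());
        f.put("isCapitalized", Character.isUpperCase(token.charAt(0)) ? 1.0 : 0.0);
        f.put("hasDigit", token.matches(".*\\d.*") ? 1.0 : 0.0);
        return f;
    }

    public static void main(String[] args) {
        String document = "Alice visited Pittsburgh in 2005";
        // Print each feature map next to the offsets and text it was derived from.
        for (Span span : tokenize(document)) {
            System.out.println("[" + span.start + "," + span.end + ") \""
                    + span.textIn(document) + "\" -> " + features(document, span));
        }
    }
}
```

Because each `Span` stores offsets rather than a copy of the token, any later stage in the chain can be checked against the original document, which is the kind of inspection the visualization tools described above are meant to support.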
Prerequisites
- Java 1.5 or later
Overview: DISTAR 14982 – Approved for Public Release, Distribution Unlimited