The following sections show how to obtain predictions/classifications from the command line or the GUI, without writing your own Java code.
After a model has been saved, one can make predictions for a test set, whether or not that set contains valid class values. The output contains both the actual and the predicted class. (Note that if the test set contains only '?' as the class label for every instance, the "actual" class label carries no useful information, but the predicted class label does.) The -T command-line switch specifies the dataset of instances whose classes are to be predicted, the -l switch loads the previously saved model, and the -p switch lets the user write out a range of attributes alongside the predictions (examples: "1-2" for the first and second attributes, or "0" for no attributes). Sample command line:
java weka.classifiers.trees.J48 -T unclassified.arff -l j48.model -p 0
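When the same prediction run is repeated for several test sets, it can help to assemble the command line programmatically. The following is only a sketch: the helper name build_prediction_command is hypothetical, and it uses nothing beyond the -T, -l and -p switches described above.

```python
def build_prediction_command(test_arff, model_file, attribute_range="0",
                             classifier="weka.classifiers.trees.J48"):
    """Return the argument list for a Weka command-line prediction run.

    Hypothetical helper: it only assembles the arguments and does not
    check that Java or Weka are installed.
    """
    return [
        "java", classifier,
        "-T", test_arff,        # dataset whose classes are to be predicted
        "-l", model_file,       # previously saved model
        "-p", attribute_range,  # attributes to print, "0" for none
    ]


cmd = build_prediction_command("unclassified.arff", "j48.model")
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, capture_output=True, text=True), assuming Java is installed and the Weka classes are on the classpath.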
The output has five columns: the instance number, the actual class, the predicted class, an error flag and the prediction probability. A "+" appears in the error column only for those instances that were mispredicted. Note that if the actual class label is always "?" (i.e., the dataset does not include known class labels), the error column will always be empty. Example output:
inst#  actual  predicted  error  prediction
    1     1:?        1:0              0.757
    2     1:?        1:0              0.824
    3     1:?        1:0              0.807
    4     1:?        1:0              0.807
    5     1:?        1:0              0.79
    6     1:?        2:1              0.661
...
In this case, taken directly from a test dataset where all class values were marked by "?", the "actual" column, which can be ignored, simply states that each instance belongs to an unknown class. The "predicted" column shows that instances 1 through 5 are predicted to be of class 1, whose value is 0, and instance 6 is predicted to be of class 2, whose value is 1. The error field is empty; if predictions were being performed on a labeled test set, each instance where the prediction failed to match the label would contain a "+". The last column gives the estimated probability of the predicted class: for example, the probability that instance 1 actually belongs to class 0 is estimated at 0.757.
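If the predictions need further processing, the column layout described above is simple enough to parse with a short script. The following sketch assumes whitespace-separated columns exactly as in the sample output; the names Prediction and parse_predictions are illustrative and not part of Weka.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    instance: int
    actual: str        # class value, "?" if unknown
    predicted: str     # predicted class value
    error: bool        # True if the row carries a "+"
    probability: float


def parse_predictions(text: str) -> List[Prediction]:
    """Parse 'inst# actual predicted error prediction' rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer instance number; skip headers etc.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        rows.append(Prediction(
            instance=int(fields[0]),
            actual=fields[1].split(":", 1)[1],     # "1:?" -> "?"
            predicted=fields[2].split(":", 1)[1],  # "2:1" -> "1"
            error="+" in fields[3:-1],             # optional error column
            probability=float(fields[-1]),
        ))
    return rows


sample = """\
inst# actual predicted error prediction
1 1:? 1:0 0.757
2 1:? 1:0 0.824
6 1:? 2:1 0.661
"""
rows = parse_predictions(sample)
for row in rows:
    print(row.instance, row.predicted, row.probability)
```

Because the error column is empty for unlabeled data, the parser treats it as optional instead of relying on a fixed number of fields.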
Notes:
The AddClassification filter (package weka.filters.supervised.attribute) can either train a classifier on the input data and use it to transform that data, or load a serialized model to transform the input data. (The filter was introduced in 3.5.4, but due to a bug in the command-line option handling it is recommended to download a version later than 3.5.5 from the Weka homepage.) The filter can add the classification, the class distribution and the error per row as extra attributes to the dataset.
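* training a J48 classifier and replacing the class values with the ones predicted by the trained model:
java \
  weka.filters.supervised.attribute.AddClassification \
  -W "weka.classifiers.trees.J48" \
  -classification \
  -remove-old-class \
  -i train.arff \
  -o train_classified.arff \
  -c last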
* using a serialized model, e.g., a J48 model, to replace the class values with the ones predicted by the serialized model:
java \
  weka.filters.supervised.attribute.AddClassification \
  -serialized /some/where/j48.model \
  -classification \
  -remove-old-class \
  -i train.arff \
  -o train_classified.arff \
  -c last
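The two invocations differ only in where the model comes from: -W names a classifier to train, while -serialized points to a saved model. As a rough sketch, a small helper can make that choice explicit. The function name build_add_classification_command is hypothetical, and only the options shown in the commands above are used.

```python
def build_add_classification_command(input_arff, output_arff,
                                     classifier=None, serialized_model=None):
    """Assemble an AddClassification call for one of the two modes.

    Hypothetical helper: exactly one of `classifier` (train on the input)
    or `serialized_model` (load a saved model) must be given.
    """
    if (classifier is None) == (serialized_model is None):
        raise ValueError("give exactly one of classifier or serialized_model")
    cmd = ["java", "weka.filters.supervised.attribute.AddClassification"]
    if classifier is not None:
        cmd += ["-W", classifier]                  # train this classifier
    else:
        cmd += ["-serialized", serialized_model]   # load a saved model
    cmd += [
        "-classification",     # add the predicted class
        "-remove-old-class",   # drop the original class attribute
        "-i", input_arff,
        "-o", output_arff,
        "-c", "last",          # class attribute is the last one
    ]
    return cmd


train_cmd = build_add_classification_command(
    "train.arff", "train_classified.arff",
    classifier="weka.classifiers.trees.J48")
model_cmd = build_add_classification_command(
    "train.arff", "train_classified.arff",
    serialized_model="/some/where/j48.model")
print(" ".join(train_cmd))
print(" ".join(model_cmd))
```

Rejecting calls that give both or neither source is a design choice of this sketch; the source does not say how the filter itself reacts to such a combination.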
The Weka GUI also lets you output predictions based on a previously saved model.
See the Explorer section of the Saving and loading models article to set up the Explorer. Additionally, you need to check the Output predictions option in the More options dialog. Right-clicking on the respective results history item and selecting Re-evaluate model on current test set will then output the predictions as well (the statistics will be useless due to the missing class values in the test set, so just ignore them). The output is similar to the one produced by the command line.
Example output for the anneal UCI dataset:
== Predictions on test set ==

inst#, actual, predicted, error, probability distribution
    1      ?        3:3      +    0  0 *1  0  0  0
    2      ?        3:3      +    0  0 *1  0  0  0
    3      ?        3:3      +    0  0 *1  0  0  0
...
   17      ?        6:U      +    0  0  0  0  0 *1
   18      ?        6:U      +    0  0  0  0  0 *1
   19      ?        3:3      +    0  0 *1  0  0  0
   20      ?        3:3      +    0  0 *1  0  0  0
...
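Rows of this form can be split into their parts with a few lines of code. The sketch below assumes the whitespace-separated layout of the example and that the entry prefixed with "*" marks the predicted class, which is consistent with the rows shown. The function name parse_distribution_row is illustrative only.

```python
def parse_distribution_row(line):
    """Split one Explorer prediction row into its components."""
    fields = line.split()
    instance, actual, predicted = int(fields[0]), fields[1], fields[2]
    rest = fields[3:]
    # The error column is optional; it holds "+" when present.
    error = bool(rest) and rest[0] == "+"
    if error:
        rest = rest[1:]
    # 1-based position of the starred entry, or None if no entry is starred.
    starred = [i + 1 for i, value in enumerate(rest) if value.startswith("*")]
    return {
        "instance": instance,
        "actual": actual,
        "predicted": predicted,
        "error": error,
        "distribution": [float(value.lstrip("*")) for value in rest],
        "starred_index": starred[0] if starred else None,
    }


row = parse_distribution_row("17 ? 6:U + 0 0 0 0 0 *1")
print(row["predicted"], row["starred_index"], row["distribution"])
```

In the example rows the starred position matches the index in the predicted column (3 for 3:3, 6 for 6:U), so either field can serve to identify the predicted class.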
Note: The developer version (later than 3.5.6) can also output additional attributes, like the command line does with the -p option. In the More options dialog you can specify those attribute indices with Output additional attributes, e.g., first or 1-7. In contrast to the command line, this output also works for cross-validation.
With the PredictionAppender (from the Evaluation toolbar) you cannot use an already saved model, but you can train a classifier on a dataset and output an ARFF file with the predictions appended as an additional attribute. Here's an example setup:
              /---dataSet--> TrainingSetMaker ---trainingSet--\
ArffLoader --<                                                 >--> J48...
              \---dataSet--> TestSetMaker -------testSet------/

...J48 --batchClassifier--> PredictionAppender --testSet--> ArffSaver
The AddClassification filter can be used in the KnowledgeFlow as well, either for training a model, or for using a serialized model to perform the predictions. An example setup could look like this:
ArffLoader --dataSet--> ClassAssigner --dataSet--> AddClassification --dataSet--> ArffSaver
If you want to perform the classification within your own code, see the classifying instances section of this article, explaining the Weka API in general.
The developer version shortly before the release of 3.5.6 was used as basis for this article.