Package org.apache.lucene.classification
Class KNearestNeighborClassifier
- java.lang.Object
  - org.apache.lucene.classification.KNearestNeighborClassifier
All Implemented Interfaces:
Classifier<BytesRef>
Direct Known Subclasses:
KNearestNeighborDocumentClassifier
public class KNearestNeighborClassifier extends Object implements Classifier<BytesRef>
A k-Nearest Neighbor classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis.
WARNING: This API is experimental and might change in incompatible ways in the next release.
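The k-NN rule itself can be illustrated without Lucene. The sketch below assumes the class labels of the neighbors are already available and ordered from most to least similar (this class obtains its neighbors by running a MoreLikeThis query). The class MajorityVoteSketch and its tie-breaking rule (the earliest label wins) are hypothetical and not taken from Lucene's implementation, whose scoring may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of plain k-NN majority voting; not Lucene code. */
class MajorityVoteSketch {

  /**
   * Returns the most frequent label among the first k neighbor labels.
   * neighborLabels must be ordered from most to least similar.
   * Ties are broken in favor of the label seen first (an assumption).
   */
  static String vote(List<String> neighborLabels, int k) {
    // LinkedHashMap keeps first-seen order, which drives the tie-breaking below
    Map<String, Integer> counts = new LinkedHashMap<>();
    int limit = Math.min(k, neighborLabels.size());
    for (int i = 0; i < limit; i++) {
      counts.merge(neighborLabels.get(i), 1, Integer::sum);
    }
    String best = null;
    int bestCount = 0;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > bestCount) { // strict '>' keeps the earliest label on ties
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best; // null when there are no neighbors
  }
}
```

For example, with k = 3 and neighbors labeled sport, politics, sport, tech, only the first three are counted and sport wins with two votes.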
Field Summary
Fields
Modifier and Type | Field | Description
protected String | classFieldName | the name of the field used as the output text
protected IndexSearcher | indexSearcher | an IndexSearcher used to perform queries
protected int | k | the number of docs to compare in order to find the nearest neighbors of the input text
protected MoreLikeThis | mlt | a MoreLikeThis instance used to perform MLT queries
protected Query | query | a Query used to filter the documents that should be used from this classifier's underlying LeafReader
protected String[] | textFieldNames | the names of the fields used as the input text
Constructor Summary
Constructors
Constructor | Description
KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames) | Creates a KNearestNeighborClassifier.
Method Summary
Modifier and Type | Method | Description
ClassificationResult<BytesRef> | assignClass(String text) | Assign a class (with score) to the given text String.
protected List<ClassificationResult<BytesRef>> | buildListFromTopDocs(TopDocs topDocs) | Build a list of classification results from search results.
protected ClassificationResult<BytesRef> | classifyFromTopDocs(TopDocs knnResults) | TODO
List<ClassificationResult<BytesRef>> | getClasses(String text) | Get all the classes (sorted by score, descending) assigned to the given text String.
List<ClassificationResult<BytesRef>> | getClasses(String text, int max) | Get the first max classes (sorted by score, descending) assigned to the given text String.
String | toString() |
Field Detail
-
mlt
protected final MoreLikeThis mlt
a MoreLikeThis instance used to perform MLT queries
-
textFieldNames
protected final String[] textFieldNames
the names of the fields used as the input text
-
classFieldName
protected final String classFieldName
the name of the field used as the output text
-
indexSearcher
protected final IndexSearcher indexSearcher
an IndexSearcher used to perform queries
-
k
protected final int k
the number of docs to compare in order to find the nearest neighbors of the input text
-
query
protected final Query query
a Query used to filter the documents that should be used from this classifier's underlying LeafReader
Constructor Detail
-
KNearestNeighborClassifier
public KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames) throws IOException
Creates a KNearestNeighborClassifier.
Parameters:
indexReader - the reader on the index to be used for classification
analyzer - an Analyzer used to analyze unseen text
similarity - the Similarity to be used by the underlying IndexSearcher, or null (defaults to BM25Similarity)
query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
k - the number of docs to select in the MLT results to find the nearest neighbor
minDocsFreq - MoreLikeThis.minDocFreq parameter
minTermFreq - MoreLikeThis.minTermFreq parameter
classFieldName - the name of the field used as the output for the classifier
textFieldNames - the names of the fields used as the inputs for the classifier; each may carry a boost, e.g. title^10
Throws:
IOException
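The textFieldNames parameter accepts entries such as title^10. A minimal sketch of how such a spec can be split into a field name and a boost follows; the helper name FieldBoostSketch and the default boost of 1.0 for fields without ^ are assumptions for illustration, not Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits "name^boost" specs into field names and boosts. Not Lucene code. */
class FieldBoostSketch {

  /** Maps each field name to its boost; a spec without '^' gets an assumed boost of 1.0f. */
  static Map<String, Float> parse(String... textFieldNames) {
    Map<String, Float> boosts = new LinkedHashMap<>(); // preserves the caller's field order
    for (String spec : textFieldNames) {
      int caret = spec.indexOf('^');
      if (caret < 0) {
        boosts.put(spec, 1.0f);
      } else {
        String field = spec.substring(0, caret);
        float boost = Float.parseFloat(spec.substring(caret + 1));
        boosts.put(field, boost);
      }
    }
    return boosts;
  }
}
```

For example, parse("title^10", "body") yields {title=10.0, body=1.0}.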
Method Detail
-
assignClass
public ClassificationResult<BytesRef> assignClass(String text) throws IOException
Description copied from interface: Classifier
Assign a class (with score) to the given text String.
Specified by:
assignClass in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
Returns:
a ClassificationResult holding the assigned class of type T and its score
Throws:
IOException - If there is a low-level I/O error.
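Conceptually, assignClass returns the single best-scoring class. The sketch below picks the highest-scoring entry from a list of candidates; Scored is a hypothetical stand-in for ClassificationResult<BytesRef>, and keeping the first entry on ties is an assumption.

```java
import java.util.List;

/** Hypothetical illustration of selecting the top-scoring class; not Lucene code. */
class BestClassSketch {

  /** Hypothetical stand-in for ClassificationResult: a class label and its score. */
  static final class Scored {
    final String label;
    final double score;

    Scored(String label, double score) {
      this.label = label;
      this.score = score;
    }
  }

  /** Returns the highest-scoring entry (first one on ties), or null if the list is empty. */
  static Scored best(List<Scored> results) {
    Scored best = null;
    for (Scored r : results) {
      if (best == null || r.score > best.score) {
        best = r;
      }
    }
    return best;
  }
}
```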
-
classifyFromTopDocs
protected ClassificationResult<BytesRef> classifyFromTopDocs(TopDocs knnResults) throws IOException
TODO
Throws:
IOException
-
getClasses
public List<ClassificationResult<BytesRef>> getClasses(String text) throws IOException
Description copied from interface: Classifier
Get all the classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
Returns:
the whole list of ClassificationResult (the classes and their scores), or null if the classifier can't make lists
Throws:
IOException - If there is a low-level I/O error.
-
getClasses
public List<ClassificationResult<BytesRef>> getClasses(String text, int max) throws IOException
Description copied from interface: Classifier
Get the first max classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
max - the maximum number of elements in the returned list
Returns:
the list of ClassificationResult (the classes and their scores), truncated to at most max elements, or null if the classifier can't make lists
Throws:
IOException - If there is a low-level I/O error.
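The contract of getClasses(String, int), sorted by descending score and cut to max entries, can be sketched over a plain map of class scores. TopClassesSketch is hypothetical, and the map stands in for the list of ClassificationResult objects the real method returns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "sort by score, descending, keep the first max"; not Lucene code. */
class TopClassesSketch {

  /** Returns up to max class labels ordered by descending score. */
  static List<String> top(Map<String, Double> scores, int max) {
    List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
    entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
    List<String> out = new ArrayList<>();
    for (int i = 0; i < Math.min(max, entries.size()); i++) {
      out.add(entries.get(i).getKey());
    }
    return out;
  }
}
```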
-
buildListFromTopDocs
protected List<ClassificationResult<BytesRef>> buildListFromTopDocs(TopDocs topDocs) throws IOException
Build a list of classification results from search results.
Parameters:
topDocs - the search results as a TopDocs object
Returns:
a List of ClassificationResult, one for each existing class
Throws:
IOException - if the stored value of the class field cannot be retrieved
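buildListFromTopDocs turns neighbor hits into one result per class. One possible aggregation is a score-weighted vote: each hit contributes its score divided by the top score, and each class total is divided by k. This formula, the Hit type, and the class name are assumptions for illustration; Lucene's exact scoring may differ. Search scores are assumed to be positive.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical score-weighted k-NN aggregation; not Lucene's exact formula. */
class NeighborAggregationSketch {

  /** A neighbor hit: the stored class label of a matching doc and its search score. */
  static final class Hit {
    final String label;
    final float score;

    Hit(String label, float score) {
      this.label = label;
      this.score = score;
    }
  }

  /**
   * Builds one score per distinct class. Each hit adds score / maxScore
   * (so the top hit contributes 1.0), and each class total is divided by k.
   * Hits are assumed sorted by descending score, as in TopDocs.scoreDocs.
   */
  static Map<String, Double> aggregate(List<Hit> hits, int k) {
    Map<String, Double> perClass = new LinkedHashMap<>();
    if (hits.isEmpty()) {
      return perClass;
    }
    float maxScore = hits.get(0).score;
    for (Hit h : hits) {
      if (h.label == null) {
        continue; // a doc without a stored class value is skipped
      }
      perClass.merge(h.label, (double) (h.score / maxScore), Double::sum);
    }
    perClass.replaceAll((label, total) -> total / k);
    return perClass;
  }
}
```

For example, with k = 3 and hits (sport, 2.0), (tech, 1.0), (sport, 1.0), sport scores (1.0 + 0.5) / 3 = 0.5 and tech scores 0.5 / 3, roughly 0.167.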