Class Axiomatic
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.Axiomatic
- Direct Known Subclasses:
AxiomaticF1EXP, AxiomaticF1LOG, AxiomaticF2EXP, AxiomaticF2LOG, AxiomaticF3EXP, AxiomaticF3LOG
public abstract class Axiomatic extends SimilarityBase
Axiomatic approaches for IR. From Hui Fang and Chengxiang Zhai. 2005. An Exploration of Axiomatic Approaches to Information Retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05). ACM, New York, NY, USA, 480-487.
The approach yields a family of models, all based on BM25, pivoted document length normalization, and the language model with Dirichlet prior. Some components of the original models (e.g., term frequency, inverse document frequency) are modified so that they satisfy a set of axiomatic constraints.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
Constructor Summary
- Axiomatic(): Default constructor.
- Axiomatic(boolean discountOverlaps, float s, int queryLen, float k): Constructor setting all Axiomatic hyperparameters.
- Axiomatic(float s): Constructor setting only s, leaving k and queryLen at their defaults.
- Axiomatic(float s, int queryLen): Constructor setting s and queryLen, leaving k at its default.
- Axiomatic(float s, int queryLen, float k): Constructor setting all Axiomatic hyperparameters and using the default discountOverlaps value.
-
Method Summary
- protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen): Subclasses should implement this method to explain the score.
- protected Explanation explain(BasicStats stats, Explanation freq, double docLen): Explains the score.
- protected abstract double gamma(BasicStats stats, double freq, double docLen): Computes the gamma component (used only by AxiomaticF3EXP and AxiomaticF3LOG).
- protected abstract double idf(BasicStats stats, double freq, double docLen): Computes the inverse document frequency component.
- protected abstract Explanation idfExplain(BasicStats stats, double freq, double docLen): Explains the score of the inverse document frequency component for a single document.
- protected abstract double ln(BasicStats stats, double freq, double docLen): Computes the document length component.
- protected abstract Explanation lnExplain(BasicStats stats, double freq, double docLen): Explains the score of the document length component for a single document.
- double score(BasicStats stats, double freq, double docLen): Scores the document.
- protected abstract double tf(BasicStats stats, double freq, double docLen): Computes the term frequency component.
- protected abstract Explanation tfExplain(BasicStats stats, double freq, double docLen): Explains the score of the term frequency component for a single document.
- protected abstract double tfln(BasicStats stats, double freq, double docLen): Computes the mixed term frequency and document length component.
- protected abstract Explanation tflnExplain(BasicStats stats, double freq, double docLen): Explains the score of the mixed term frequency and document length component for a single document.
- abstract String toString(): Name of the axiomatic method.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
fillBasicStats, log2, newStats, scorer
-
Methods inherited from class org.apache.lucene.search.similarities.Similarity
computeNorm, getDiscountOverlaps
-
Constructor Detail
Axiomatic
public Axiomatic(float s, int queryLen, float k)
Constructor setting all Axiomatic hyperparameters and using the default discountOverlaps value.
- Parameters:
s - hyperparameter for the growth function
queryLen - the query length
k - hyperparameter for the primitive weighting function
-
Axiomatic
public Axiomatic(boolean discountOverlaps, float s, int queryLen, float k)
Constructor setting all Axiomatic hyperparameters.
- Parameters:
discountOverlaps - true if overlap tokens should not impact document length for scoring.
s - hyperparameter for the growth function
queryLen - the query length
k - hyperparameter for the primitive weighting function
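The effect of discountOverlaps can be pictured with a small, self-contained sketch. In Lucene, an overlap token is one whose position increment is 0, such as a synonym stacked on the previous token. The DocLengthSketch class and its Token type are hypothetical helpers, not part of the Lucene API; they only show how the document length seen by a similarity could differ depending on the flag.

```java
import java.util.List;

// Hypothetical helper (not Lucene API): shows how discountOverlaps
// changes the document length that a similarity sees.
final class DocLengthSketch {

  // Minimal stand-in for an analyzed token: its text and position increment.
  static final class Token {
    final String text;
    final int positionIncrement;

    Token(String text, int positionIncrement) {
      this.text = text;
      this.positionIncrement = positionIncrement;
    }
  }

  // Counts tokens, skipping overlap tokens (position increment 0)
  // when discountOverlaps is true.
  static int docLength(List<Token> tokens, boolean discountOverlaps) {
    int length = 0;
    for (Token token : tokens) {
      boolean overlap = token.positionIncrement == 0;
      if (discountOverlaps && overlap) {
        continue; // stacked token: does not make the document longer
      }
      length++;
    }
    return length;
  }
}
```

For the tokens fast (+1), quick (+0, injected as a synonym of fast) and car (+1), docLength returns 2 when discountOverlaps is true and 3 when it is false, so synonym expansion does not make a document look longer.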
-
Axiomatic
public Axiomatic(float s)
Constructor setting only s, leaving k and queryLen at their defaults.
- Parameters:
s - hyperparameter for the growth function
-
Axiomatic
public Axiomatic(float s, int queryLen)
Constructor setting s and queryLen, leaving k at its default.
- Parameters:
s - hyperparameter for the growth function
queryLen - the query length
-
Axiomatic
public Axiomatic()
Default constructor; all hyperparameters take their default values.
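The constructor overloads each fill in the hyperparameters they do not take. A minimal sketch of that delegation pattern follows; AxiomaticParamsSketch and its DEFAULT_* constants are hypothetical, and the constant values are placeholders that are not necessarily Lucene's defaults.

```java
// Hypothetical mirror of the Axiomatic constructor overloads.
// The DEFAULT_* values are placeholders, not necessarily Lucene's defaults.
class AxiomaticParamsSketch {
  static final boolean DEFAULT_DISCOUNT_OVERLAPS = true; // placeholder
  static final float DEFAULT_S = 0.25f;                   // placeholder
  static final int DEFAULT_QUERY_LEN = 1;                 // placeholder
  static final float DEFAULT_K = 0.35f;                   // placeholder

  final boolean discountOverlaps;
  final float s;        // growth function hyperparameter
  final int queryLen;   // query length
  final float k;        // primitive weighting function hyperparameter

  // Full constructor: the single place where fields are assigned.
  AxiomaticParamsSketch(boolean discountOverlaps, float s, int queryLen, float k) {
    this.discountOverlaps = discountOverlaps;
    this.s = s;
    this.queryLen = queryLen;
    this.k = k;
  }

  AxiomaticParamsSketch(float s, int queryLen, float k) {
    this(DEFAULT_DISCOUNT_OVERLAPS, s, queryLen, k);
  }

  AxiomaticParamsSketch(float s, int queryLen) {
    this(s, queryLen, DEFAULT_K);
  }

  AxiomaticParamsSketch(float s) {
    this(s, DEFAULT_QUERY_LEN);
  }

  AxiomaticParamsSketch() {
    this(DEFAULT_S);
  }
}
```

Chaining every overload to the full constructor keeps field assignment (and any validation one might add) in one place.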
-
Method Detail
score
public double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document. Subclasses must apply their scoring formula in this method.
- Specified by:
score in class SimilarityBase
- Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
- Returns:
the score.
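The abstract component methods suggest how score can be assembled. The sketch below assumes the components combine as (tf · ln · tfln · idf − gamma) · boost, with negative results clamped to 0; check the Lucene source for your version before relying on the exact combination. Stats is a hypothetical stand-in for BasicStats.

```java
// Sketch only. ASSUMPTION: the components combine as
// (tf * ln * tfln * idf - gamma) * boost, clamped at 0.
// Stats is a hypothetical stand-in for Lucene's BasicStats.
abstract class AxiomaticScoreSketch {

  // Hypothetical subset of corpus-level statistics.
  static final class Stats {
    final long numberOfDocuments;
    final long docFreq;
    final double avgFieldLength;
    final double boost;

    Stats(long numberOfDocuments, long docFreq, double avgFieldLength, double boost) {
      this.numberOfDocuments = numberOfDocuments;
      this.docFreq = docFreq;
      this.avgFieldLength = avgFieldLength;
      this.boost = boost;
    }
  }

  protected abstract double tf(Stats stats, double freq, double docLen);
  protected abstract double ln(Stats stats, double freq, double docLen);
  protected abstract double tfln(Stats stats, double freq, double docLen);
  protected abstract double idf(Stats stats, double freq, double docLen);
  // Expected to be nonzero only for the F3 variants.
  protected abstract double gamma(Stats stats, double freq, double docLen);

  double score(Stats stats, double freq, double docLen) {
    double raw = tf(stats, freq, docLen)
        * ln(stats, freq, docLen)
        * tfln(stats, freq, docLen)
        * idf(stats, freq, docLen)
        - gamma(stats, freq, docLen);
    // Clamp so that a large gamma never yields a negative score.
    return Math.max(0.0, raw * stats.boost);
  }
}
```

In this sketch, a variant that does not use a component returns its neutral value (1 for the multiplied factors, 0 for gamma), so each subclass only implements the parts of the formula it changes.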
-
explain
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Description copied from class: SimilarityBase
Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses that are satisfied with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
- Overrides:
explain in class SimilarityBase
- Parameters:
stats - the corpus level statistics.
freq - the term frequency and its explanation.
docLen - the document length.
- Returns:
the explanation.
-
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. The enclosing explanation already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
- Overrides:
explain in class SimilarityBase
- Parameters:
subs - the list of details of the explanation to extend
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
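The split between the two explain methods resembles a template method: the top-level method builds the summary node, then calls the hook so a subclass can append its own clauses. A minimal sketch with hypothetical ExplNode and ExplainingSketch types (not Lucene's Explanation API), assuming the term-frequency explanation is the first detail:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified explanation node (not Lucene's Explanation class).
final class ExplNode {
  final double value;
  final String description;
  final List<ExplNode> details;

  ExplNode(double value, String description, List<ExplNode> details) {
    this.value = value;
    this.description = description;
    this.details = List.copyOf(details);
  }
}

// Sketch of the two explain methods: a top-level builder plus a hook.
abstract class ExplainingSketch {
  abstract double score(double freq, double docLen);

  abstract String name();

  // Hook: subclasses append clauses for their formula; default adds nothing.
  protected void explain(List<ExplNode> subs, double freq, double docLen) {}

  // Builds "score(name, freq=...), computed from:" with the freq
  // explanation first, followed by whatever the hook appended.
  ExplNode explain(ExplNode freq, double docLen) {
    List<ExplNode> subs = new ArrayList<>();
    subs.add(freq);
    explain(subs, freq.value, docLen);
    return new ExplNode(
        score(freq.value, docLen),
        "score(" + name() + ", freq=" + freq.value + "), computed from:",
        subs);
  }
}
```

An axiomatic subclass would use the hook to append its tf, ln, tfln, idf and gamma explanations.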
-
toString
public abstract String toString()
Name of the axiomatic method.
- Specified by:
toString in class SimilarityBase
-
tf
protected abstract double tf(BasicStats stats, double freq, double docLen)
Computes the term frequency component.
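As an illustration, one term frequency component that grows sublinearly with the term count is 1 + ln(1 + ln(freq)). The sketch below assumes this form, which is associated with some of the axiomatic variants (an assumption here, not taken from the Lucene source); TfSketch is a hypothetical name.

```java
// Sketch of a tf component. ASSUMPTION: doubly logarithmic growth
// 1 + ln(1 + ln(freq)); not copied from Lucene.
final class TfSketch {
  // Assumes freq is 0 (term absent) or at least 1 (term present).
  static double tf(double freq) {
    if (freq <= 0.0) {
      return 0.0; // an absent term contributes nothing
    }
    return 1.0 + Math.log(1.0 + Math.log(freq));
  }
}
```

Here tf(1) is exactly 1, and each tenfold increase in freq adds less than the previous one: tf(10) ≈ 2.19 and tf(100) ≈ 2.72.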
-
ln
protected abstract double ln(BasicStats stats, double freq, double docLen)
Computes the document length component.
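Here "ln" names the length normalization component, not the natural logarithm. A minimal sketch, assuming a pivoted-normalization style factor (avgdl + s) / (avgdl + docLen · s); the class name and parameter names are illustrative, not Lucene's.

```java
// Sketch of a document length component ("ln" = length normalization).
// ASSUMPTION: factor (avgdl + s) / (avgdl + docLen * s).
final class LnSketch {
  static double ln(double docLen, double avgFieldLength, double s) {
    // Equals 1 for a one-token document and shrinks as docLen grows.
    return (avgFieldLength + s) / (avgFieldLength + docLen * s);
  }
}
```

With avgFieldLength = 100 and s = 0.5, a one-token document gets a factor of 1, a document of average length gets 100.5 / 150 = 0.67, and longer documents are penalized further.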
-
tfln
protected abstract double tfln(BasicStats stats, double freq, double docLen)
Computes the mixed term frequency and document length component.
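A mixed component lets term frequency and document length interact in a single expression. The sketch below assumes a BM25-like saturation freq / (freq + s + s · docLen / avgdl); TflnSketch and its parameters are illustrative names, not Lucene's.

```java
// Sketch of a mixed tf/length component.
// ASSUMPTION: BM25-like saturation freq / (freq + s + s * docLen / avgdl).
final class TflnSketch {
  static double tfln(double freq, double docLen, double avgFieldLength, double s) {
    // Saturates below 1 as freq grows; longer documents get less credit.
    return freq / (freq + s + s * docLen / avgFieldLength);
  }
}
```

For freq = 2, s = 0.5 and a document of average length, tfln = 2 / 3; the value stays below 1 however large freq grows, and a longer document receives less credit for the same count.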
-
idf
protected abstract double idf(BasicStats stats, double freq, double docLen)
Computes the inverse document frequency component.
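The subclass names suggest two families of inverse document frequency components, EXP and LOG. A minimal sketch, assuming the EXP variants raise (N + 1) / df to the power k and the LOG variants take its natural logarithm; IdfSketch and its method names are hypothetical.

```java
// Sketch of two idf components.
// ASSUMPTION: EXP = ((N + 1) / df)^k, LOG = ln((N + 1) / df).
final class IdfSketch {
  static double idfExp(long numberOfDocuments, long docFreq, double k) {
    return Math.pow((numberOfDocuments + 1.0) / docFreq, k);
  }

  static double idfLog(long numberOfDocuments, long docFreq) {
    // Since docFreq <= numberOfDocuments, the ratio exceeds 1 and the log is positive.
    return Math.log((numberOfDocuments + 1.0) / docFreq);
  }
}
```

With N = 99 documents, a term occurring in 10 of them gets idfExp = sqrt(10) ≈ 3.16 for k = 0.5 and idfLog = ln 10 ≈ 2.30; rarer terms score higher under both.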
-
gamma
protected abstract double gamma(BasicStats stats, double freq, double docLen)
Computes the gamma component (used only by AxiomaticF3EXP and AxiomaticF3LOG).
-
tfExplain
protected abstract Explanation tfExplain(BasicStats stats, double freq, double docLen)
Explains the score of the term frequency component for a single document.
- Parameters:
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
- Returns:
Explanation of how the tf component was computed
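A component explanation pairs a computed value with a description and the inputs it used. The sketch below uses a hypothetical ComponentExpl node rather than Lucene's Explanation class, and assumes the illustrative tf form 1 + ln(1 + ln(freq)). The property it shows is that the explained value comes from the same function as the score component, so the two cannot drift apart.

```java
import java.util.List;

// Hypothetical explanation node (not Lucene's Explanation class).
final class ComponentExpl {
  final double value;
  final String description;
  final List<ComponentExpl> details;

  ComponentExpl(double value, String description, List<ComponentExpl> details) {
    this.value = value;
    this.description = description;
    this.details = List.copyOf(details);
  }
}

// Sketch of a tfExplain that reuses the tf computation itself.
final class TfExplainSketch {
  // ASSUMPTION: illustrative tf form 1 + ln(1 + ln(freq)).
  static double tf(double freq) {
    return freq <= 0.0 ? 0.0 : 1.0 + Math.log(1.0 + Math.log(freq));
  }

  static ComponentExpl tfExplain(double freq) {
    ComponentExpl freqDetail = new ComponentExpl(
        freq, "freq, number of occurrences of term in the document", List.of());
    return new ComponentExpl(
        tf(freq), "tf, computed as 1 + ln(1 + ln(freq)) from:", List.of(freqDetail));
  }
}
```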
-
lnExplain
protected abstract Explanation lnExplain(BasicStats stats, double freq, double docLen)
Explains the score of the document length component for a single document.
- Parameters:
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
- Returns:
Explanation of how the ln component was computed
-
tflnExplain
protected abstract Explanation tflnExplain(BasicStats stats, double freq, double docLen)
Explains the score of the mixed term frequency and document length component for a single document.
- Parameters:
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
- Returns:
Explanation of how the tfln component was computed
-
idfExplain
protected abstract Explanation idfExplain(BasicStats stats, double freq, double docLen)
Explains the score of the inverse document frequency component for a single document.
- Parameters:
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
- Returns:
Explanation of how the idf component was computed