Class TrecContentSource
- java.lang.Object
  - org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
    - org.apache.lucene.benchmark.byTask.feeds.ContentSource
      - org.apache.lucene.benchmark.byTask.feeds.TrecContentSource
All Implemented Interfaces:
Closeable, AutoCloseable
public class TrecContentSource extends ContentSource
Implements a ContentSource over the TREC collection.
Supports the following configuration parameters (on top of ContentSource):
- work.dir - specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
- docs.dir - specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
- trec.doc.parser - specifies the TrecDocParser class to use for parsing the TREC documents' content (default=TrecGov2Parser).
- html.parser - specifies the HTMLParser class to use for parsing the HTML parts of the TREC documents' content (default=DemoHTMLParser).
- content.source.encoding - if not specified, ISO-8859-1 is used.
- content.source.excludeIteration - if true, do not append the iteration number to docname.
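The parameters above can be sketched as plain Java properties. This is a minimal illustration, not the benchmark's own code: the class TrecConfigSketch and its two methods are hypothetical, and the rule that a relative "docs.dir" is resolved against "work.dir" is an assumption drawn from the parameter descriptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper class; it does not exist in Lucene.
public class TrecConfigSketch {

    /** Builds properties using the parameter names documented above. */
    public static Properties exampleProperties() {
        Properties props = new Properties();
        props.setProperty("work.dir", "work");   // documented default
        props.setProperty("docs.dir", "trec");   // documented default (relative path)
        props.setProperty("content.source.encoding", "ISO-8859-1");
        props.setProperty("content.source.excludeIteration", "true");
        return props;
    }

    /**
     * Assumed resolution rule: an absolute docs.dir is used as-is,
     * a relative one is resolved against work.dir.
     */
    public static Path resolveDocsDir(Properties props) {
        Path docsDir = Paths.get(props.getProperty("docs.dir", "trec"));
        if (docsDir.isAbsolute()) {
            return docsDir;
        }
        return Paths.get(props.getProperty("work.dir", "work")).resolve(docsDir);
    }

    public static void main(String[] args) {
        // Prints the directory that would be scanned for TREC files under the assumed rule.
        System.out.println(resolveDocsDir(exampleProperties()));
    }
}
```

In an actual benchmark run these values would normally be supplied through the benchmark configuration rather than built by hand.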
Field Summary
Fields
Modifier and Type | Field | Description
static String | DOC |
static String | DOCNO |
static String | NEW_LINE | separator between lines in the buffer
static String | TERMINATING_DOC |
static String | TERMINATING_DOCNO |
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
encoding, forever, logStep, verbose
Constructor Summary
Constructors
Constructor | Description
TrecContentSource() |
-
Method Summary
Modifier and Type | Method | Description
void | close() | Called when reading from this content source is no longer required.
DocData | getNextDocData(DocData docData) | Returns the next DocData from the content source.
Date | parseDate(String dateStr) |
void | resetInputs() | Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
void | setConfig(Config config) | Sets the Config for this content source.
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
Field Detail
-
DOCNO
public static final String DOCNO
- See Also:
- Constant Field Values
-
TERMINATING_DOCNO
public static final String TERMINATING_DOCNO
- See Also:
- Constant Field Values
-
DOC
public static final String DOC
- See Also:
- Constant Field Values
-
TERMINATING_DOC
public static final String TERMINATING_DOC
- See Also:
- Constant Field Values
-
NEW_LINE
public static final String NEW_LINE
Separator between lines in the buffer.
Method Detail
-
close
public void close() throws IOException
Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Specified by: close in class ContentItemsSource
- Throws: IOException
-
getNextDocData
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class: ContentSource
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
- Specified by: getNextDocData in class ContentSource
- Throws: NoMoreDataException, IOException
-
resetInputs
public void resetInputs() throws IOException
Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it's important to call super.resetInputs in case you override this method.
- Overrides: resetInputs in class ContentItemsSource
- Throws: IOException
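The NOTE about calling super.resetInputs can be illustrated with a small self-contained sketch. The classes below are hypothetical stand-ins, not the Lucene classes: CountingSource only mimics the "bytes and items since the last reset" bookkeeping described above, and the checked IOException of the real method is omitted for brevity.

```java
// Hypothetical stand-in classes; they only imitate the reset contract described above.
public class ResetInputsSketch {

    /** Stand-in for the base class: tracks bytes and items generated since the last reset. */
    public static class CountingSource {
        private long bytesSinceReset;
        private long itemsSinceReset;

        public void addItem(long numBytes) {
            itemsSinceReset++;
            bytesSinceReset += numBytes;
        }

        public long getItemsCount() { return itemsSinceReset; }

        public long getBytesCount() { return bytesSinceReset; }

        /** Default behavior: clear the per-iteration counters. */
        public void resetInputs() {
            bytesSinceReset = 0;
            itemsSinceReset = 0;
        }
    }

    /** Stand-in for a subclass that has input state of its own to rewind. */
    public static class FileBackedSource extends CountingSource {
        private int nextFile;

        public void readNextFile(long numBytes) {
            nextFile++;
            addItem(numBytes);
        }

        public int getNextFile() { return nextFile; }

        @Override
        public void resetInputs() {
            super.resetInputs(); // keeps the base-class counters consistent
            nextFile = 0;        // then rewinds the subclass's own state
        }
    }

    public static void main(String[] args) {
        FileBackedSource source = new FileBackedSource();
        source.readNextFile(10);
        source.readNextFile(10);
        source.resetInputs();
        System.out.println(source.getItemsCount() + " items, next file index " + source.getNextFile());
    }
}
```

If the override skipped the super call, the subclass would rewind its file index while the counters kept accumulating across iterations.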
-
setConfig
public void setConfig(Config config)
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
- Overrides: setConfig in class ContentItemsSource