Class MergePolicy
- java.lang.Object
  - org.apache.lucene.index.MergePolicy
Direct Known Subclasses:
FilterMergePolicy, LogMergePolicy, NoMergePolicy, TieredMergePolicy
public abstract class MergePolicy extends Object
Expert: a MergePolicy determines the sequence of primitive merge operations.

Whenever the segments in an index have been altered by IndexWriter, IndexWriter invokes findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext) to give the MergePolicy a chance to pick merges that are now required. Such an alteration can be the addition of a newly flushed segment, the addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade. This method returns a MergePolicy.MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.forceMerge is called, it calls findForcedMerges(SegmentInfos, int, Map, MergeContext) and the MergePolicy should then return the necessary merges.

Note that the policy can return more than one merge at a time. In this case, if the writer is using SerialMergeScheduler, the merges will be run sequentially, but if it is using ConcurrentMergeScheduler they will be run concurrently.

The default MergePolicy is TieredMergePolicy.

WARNING: This API is experimental and might change in incompatible ways in the next release.
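Plain `java.util.concurrent` executors can illustrate the difference between serial and concurrent scheduling. This is a standalone analogy, not Lucene code. The class `SchedulerSketch` and its methods are hypothetical, each merge is modeled as a `Runnable`, and the pool size is an arbitrary assumption.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Standalone analogy (hypothetical names, not Lucene code): a "merge
// specification" is modeled as a list of independent Runnable merge tasks.
final class SchedulerSketch {

    // Runs each merge to completion before starting the next,
    // similar in spirit to SerialMergeScheduler.
    static void runSerially(List<Runnable> merges) {
        for (Runnable merge : merges) {
            merge.run();
        }
    }

    // Hands every merge to a thread pool so several may run at once,
    // similar in spirit to ConcurrentMergeScheduler. Returns true if all
    // merges finished within the timeout.
    static boolean runConcurrently(List<Runnable> merges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable merge : merges) {
            pool.submit(merge);
        }
        pool.shutdown(); // accept no new tasks; already submitted merges keep running
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In both cases the policy returns the same set of merges. Only the scheduler decides whether they overlap in time.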
-
Nested Class Summary
Nested Classes
- static class MergePolicy.MergeAbortedException: Thrown when a merge was explicitly aborted because IndexWriter.abortMerges() was called.
- static interface MergePolicy.MergeContext: This interface represents the current context of the merge selection process.
- static class MergePolicy.MergeException: Exception thrown if there are any problems while executing a merge.
- static class MergePolicy.MergeSpecification: A MergeSpecification instance provides the information necessary to perform multiple merges.
- static class MergePolicy.OneMerge: OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment.
- static class MergePolicy.OneMergeProgress: Progress and state for an executing merge.
-
Field Summary
Fields
- protected static long DEFAULT_MAX_CFS_SEGMENT_SIZE: Default max segment size in order to use compound file system.
- protected static double DEFAULT_NO_CFS_RATIO: Default ratio for compound file system usage.
- protected long maxCFSSegmentSize: If the size of the merged segment exceeds this value then it will not use compound file format.
- protected double noCFSRatio: If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
Constructor Summary
Constructors
- protected MergePolicy(): Creates a new merge policy instance.
- protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize): Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize.
-
Method Summary
Methods
- protected boolean assertDelCount(int delCount, SegmentCommitInfo info): Asserts that the delCount for this SegmentCommitInfo is valid.
- abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext): Determine what set of merge operations is necessary in order to expunge all deletes from the index.
- abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext): Determine what set of merge operations is necessary in order to merge down to at most (<=) the specified segment count.
- MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext): Identifies merges that we want to execute (synchronously) on commit.
- MergePolicy.MergeSpecification findMerges(CodecReader... readers): Define the set of merge operations to perform on provided codec readers in IndexWriter.addIndexes(CodecReader...).
- abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext): Determine what set of merge operations are now necessary on the index.
- double getMaxCFSSegmentSizeMB(): Returns the largest size allowed for a compound file segment.
- double getNoCFSRatio(): Returns the current noCFSRatio.
- protected boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext): Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
- boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier): Returns true if the segment represented by the given CodecReader should be kept even if it is fully deleted.
- protected long maxFullFlushMergeSize(): Return the maximum size of segments to be included in full-flush merges by the default implementation of findFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext).
- protected void message(String message, MergePolicy.MergeContext mergeContext): Print a debug message to MergePolicy.MergeContext's infoStream.
- int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier): Returns the number of deletes that a merge would claim on the given segment.
- protected String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos): Builds a String representation of the given SegmentCommitInfo instances.
- void setMaxCFSSegmentSizeMB(double v): If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled.
- void setNoCFSRatio(double noCFSRatio): If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled.
- protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext): Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
- boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext): Returns true if a new segment (regardless of its origin) should use the compound file format.
- protected boolean verbose(MergePolicy.MergeContext mergeContext): Returns true if the info-stream is in verbose mode.
-
Field Detail
-
DEFAULT_NO_CFS_RATIO
protected static final double DEFAULT_NO_CFS_RATIO
Default ratio for compound file system usage. Set to 1.0, meaning the compound file system is always used.
- See Also:
Constant Field Values
-
DEFAULT_MAX_CFS_SEGMENT_SIZE
protected static final long DEFAULT_MAX_CFS_SEGMENT_SIZE
Default max segment size in order to use compound file system. Set to Long.MAX_VALUE.
- See Also:
Constant Field Values
-
noCFSRatio
protected double noCFSRatio
If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
maxCFSSegmentSize
protected long maxCFSSegmentSize
If the size of the merged segment exceeds this value then it will not use compound file format.
-
Constructor Detail
-
MergePolicy
protected MergePolicy()
Creates a new merge policy instance.
-
MergePolicy
protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize)
Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize. This ctor should be used by subclasses using different defaults than the MergePolicy.
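The following standalone model shows how a subclass can supply different compound-file defaults through the two-argument constructor. It is not Lucene's source. The names `PolicyBaseSketch`, `SmallCfsPolicySketch` and `DefaultPolicySketch` are hypothetical, and the 0.1 ratio is only an illustrative value.

```java
// Standalone model (hypothetical names, not Lucene code) of a base policy whose
// protected constructors let subclasses pick their own compound-file defaults.
abstract class PolicyBaseSketch {
    // Mirrors the documented defaults: ratio 1.0 and no size cap.
    static final double DEFAULT_NO_CFS_RATIO = 1.0;
    static final long DEFAULT_MAX_CFS_SEGMENT_SIZE = Long.MAX_VALUE;

    protected double noCFSRatio;
    protected long maxCFSSegmentSize;

    // No-arg constructor: falls back to the base-class defaults.
    protected PolicyBaseSketch() {
        this(DEFAULT_NO_CFS_RATIO, DEFAULT_MAX_CFS_SEGMENT_SIZE);
    }

    // Two-arg constructor: lets a subclass supply its own defaults.
    protected PolicyBaseSketch(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize) {
        this.noCFSRatio = defaultNoCFSRatio;
        this.maxCFSSegmentSize = defaultMaxCFSSegmentSize;
    }
}

// Hypothetical subclass that only wants CFS for merged segments up to 10% of the index.
final class SmallCfsPolicySketch extends PolicyBaseSketch {
    SmallCfsPolicySketch() {
        super(0.1, DEFAULT_MAX_CFS_SEGMENT_SIZE);
    }
}

// Hypothetical subclass that keeps the base-class defaults.
final class DefaultPolicySketch extends PolicyBaseSketch {}
```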
-
Method Detail
-
findMerges
public abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
mergeTrigger - the event that triggered the merge
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
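The following toy selection routine illustrates the return contract: a list of merges when work is needed, null otherwise. This is not TieredMergePolicy's algorithm. Segments are reduced to byte sizes, each merge is a list of segment indices, and `ToyFindMerges`, `mergeFactor` and `smallSegmentBytes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (not Lucene's algorithm): a "merge specification" is a list of
// merges, each merge a list of segment indices. Returns null when nothing is needed.
final class ToyFindMerges {

    static List<List<Integer>> findMerges(long[] segmentSizes, int mergeFactor, long smallSegmentBytes) {
        // Collect the indices of "small" segments.
        List<Integer> small = new ArrayList<>();
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < smallSegmentBytes) {
                small.add(i);
            }
        }
        if (small.size() < mergeFactor) {
            return null; // not enough small segments yet: no merge necessary
        }
        // Merge the mergeFactor smallest segments into one new segment.
        small.sort(Comparator.comparingLong(i -> segmentSizes[i]));
        List<List<Integer>> spec = new ArrayList<>();
        spec.add(new ArrayList<>(small.subList(0, mergeFactor)));
        return spec;
    }
}
```

For example, with sizes `{100, 5, 7, 900, 3}`, `mergeFactor = 3` and a 10-byte threshold, the single proposed merge is segments `[4, 1, 2]`, ordered smallest first.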
-
findMerges
public MergePolicy.MergeSpecification findMerges(CodecReader... readers) throws IOException
Define the set of merge operations to perform on provided codec readers in IndexWriter.addIndexes(CodecReader...).

The merge operation is required to convert provided readers into segments that can be added to the writer. This API can be overridden in custom merge policies to control the concurrency for addIndexes. The default implementation creates a single merge operation for all provided readers (lowest concurrency). Creating a merge for each reader would provide the highest level of concurrency possible with the configured merge scheduler.
- Parameters:
readers - CodecReader(s) to merge into the main index
- Throws:
IOException
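The following standalone sketch contrasts the two groupings described above. The generic type `R` stands in for `CodecReader`, and `AddIndexesGrouping` with its methods are hypothetical helpers. They are not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: two ways to group addIndexes readers into merges
// (R stands in for CodecReader; names are hypothetical).
final class AddIndexesGrouping {

    // One merge containing every reader: lowest concurrency (the documented default).
    static <R> List<List<R>> singleMerge(List<R> readers) {
        List<List<R>> merges = new ArrayList<>();
        merges.add(new ArrayList<>(readers));
        return merges;
    }

    // One merge per reader: the merge scheduler may run all of them in parallel.
    static <R> List<List<R>> mergePerReader(List<R> readers) {
        List<List<R>> merges = new ArrayList<>();
        for (R reader : readers) {
            List<R> single = new ArrayList<>();
            single.add(reader);
            merges.add(single);
        }
        return merges;
    }
}
```

A policy could also pick an intermediate point, such as fixed-size batches, to trade concurrency against the number of resulting segments.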
-
findForcedMerges
public abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to merge down to at most (<=) the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
segmentInfos - the total set of segments in the index
maxSegmentCount - requested maximum number of segments in the index
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; otherwise, it was a segment produced by a cascaded merge.
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
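The segment-count arithmetic works as follows: merging k segments into one removes k - 1 of them. Reaching maxSegmentCount from n segments therefore requires merging n - maxSegmentCount + 1 segments. The simplified sketch below (hypothetical `ToyForcedMerge`, not Lucene's algorithm) picks the smallest ones. It ignores `segmentsToMerge` and any size limits a real policy would apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified sketch (not Lucene's algorithm): merge the smallest segments
// into one so that at most maxSegmentCount segments remain.
final class ToyForcedMerge {

    static List<Integer> findForcedMerge(long[] segmentSizes, int maxSegmentCount) {
        if (maxSegmentCount < 1) {
            throw new IllegalArgumentException("maxSegmentCount must be >= 1");
        }
        int excess = segmentSizes.length - maxSegmentCount;
        if (excess <= 0) {
            return null; // already at or below the requested segment count
        }
        // Sort segment indices by size, smallest first.
        Integer[] order = new Integer[segmentSizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong(i -> segmentSizes[i]));
        // Merging excess + 1 segments into one removes exactly excess segments.
        return new ArrayList<>(Arrays.asList(order).subList(0, excess + 1));
    }
}
```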
-
findForcedDeletesMerges
public abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to expunge all deletes from the index.
- Parameters:
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
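The toy sketch below selects segments for delete expunging by looking only at each segment's deleted-document percentage. The class `ToyForcedDeletes` and the `pctAllowed` threshold are hypothetical. With `pctAllowed = 0.0` it picks every segment that has any deletes, which matches the "expunge all deletes" wording.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (hypothetical names): choose segments whose deleted-document
// percentage exceeds pctAllowed; returns null when none qualify.
final class ToyForcedDeletes {

    static List<Integer> findForcedDeletesMerge(int[] maxDocs, int[] delCounts, double pctAllowed) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < maxDocs.length; i++) {
            if (maxDocs[i] <= 0) {
                continue; // empty segment: nothing to expunge
            }
            double pctDeleted = 100.0 * delCounts[i] / maxDocs[i];
            if (pctDeleted > pctAllowed) {
                candidates.add(i);
            }
        }
        return candidates.isEmpty() ? null : candidates;
    }
}
```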
-
findFullFlushMerges
public MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Identifies merges that we want to execute (synchronously) on commit. By default, this will return natural merges whose segments are all less than the max segment size for full flushes.

Any merges returned here will make IndexWriter.commit(), IndexWriter.prepareCommit() or IndexWriter.getReader(boolean, boolean) block until the merges complete or until LiveIndexWriterConfig.getMaxFullFlushMergeWaitMillis() has elapsed. This may be used to merge small segments that have just been flushed, reducing the number of segments in the point-in-time snapshot. If a merge does not complete in the allotted time, it will continue to execute and will eventually finish and apply to future point-in-time snapshots, but it will not be reflected in the current one.

If a MergePolicy.OneMerge in the returned MergePolicy.MergeSpecification includes a segment already included in a registered merge, then IndexWriter.commit() or IndexWriter.prepareCommit() will throw an IllegalStateException. Use MergePolicy.MergeContext.getMergingSegments() to determine which segments are currently registered to merge.
- Parameters:
mergeTrigger - the event that triggered the merge (COMMIT or GET_READER).
segmentInfos - the total set of segments in the index (while preparing the commit)
mergeContext - the MergeContext to find the merges on, which should be used to determine which segments are already in a registered merge (see MergePolicy.MergeContext.getMergingSegments()).
- Throws:
IOException
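The following model shows the filtering step described above. It keeps only natural merges whose segments are all below the full-flush size limit and skips any merge that touches a segment already registered to merge. The model is pure and has no Lucene dependency. Segments are names, sizes come from a map, `naturalMerges` stands for the output of regular selection, and `mergingSegments` stands for what `getMergingSegments()` would report. `FullFlushFilter` is a hypothetical name, and this is not the library's default implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Standalone model (hypothetical names, not Lucene code) of filtering natural
// merges down to those that are safe and cheap enough to run on commit.
final class FullFlushFilter {

    static List<List<String>> filter(List<List<String>> naturalMerges, Map<String, Long> sizes,
                                     long maxFullFlushMergeSize, Set<String> mergingSegments) {
        List<List<String>> accepted = new ArrayList<>();
        for (List<String> merge : naturalMerges) {
            boolean ok = true;
            for (String segment : merge) {
                // Reject merges with a segment that is too large, or one already in a
                // registered merge (which would make commit throw IllegalStateException).
                if (sizes.get(segment) >= maxFullFlushMergeSize || mergingSegments.contains(segment)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                accepted.add(merge);
            }
        }
        return accepted.isEmpty() ? null : accepted;
    }
}
```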
-
useCompoundFile
public boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true iff the size of the given mergedInfo is less than or equal to getMaxCFSSegmentSizeMB() and the size is less than or equal to TotalIndexSize * getNoCFSRatio(); otherwise it returns false.
- Throws:
IOException
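The documented default rule can be written as a single boolean expression. This standalone model uses plain byte counts instead of SegmentInfos. It assumes 1 MB = 1024 * 1024 bytes when converting the MB limit, and the name `CompoundFileRule` is hypothetical.

```java
// Standalone model of the documented rule (not Lucene's source):
// merged segment <= max CFS size AND merged segment <= total index size * noCFSRatio.
final class CompoundFileRule {

    static boolean useCompoundFile(long mergedSizeBytes, long totalIndexSizeBytes,
                                   double maxCFSSegmentSizeMB, double noCFSRatio) {
        double maxBytes = maxCFSSegmentSizeMB * 1024 * 1024; // assumed MB-to-bytes conversion
        return mergedSizeBytes <= maxBytes
            && mergedSizeBytes <= totalIndexSizeBytes * noCFSRatio;
    }
}
```

For example, with a 200 MB index and `noCFSRatio = 0.1`, a 10 MB merged segment qualifies (10 <= 20). A 30 MB segment does not.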
-
size
protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
- Throws:
IOException
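Pro-rating by live documents means scaling the raw size by 1 - delCount / maxDoc. The sketch below is a standalone illustration with a hypothetical class name. It assumes `delCount` is the number of deleted documents and treats a segment with no documents as un-prorated.

```java
// Standalone sketch: scale a segment's byte size by its fraction of live documents.
final class ProratedSize {

    static long size(long sizeInBytes, int maxDoc, int delCount) {
        if (maxDoc <= 0) {
            return sizeInBytes; // no documents: nothing to pro-rate
        }
        double delRatio = (double) delCount / maxDoc;
        return (long) (sizeInBytes * (1.0 - delRatio));
    }
}
```

For example, a 1000-byte segment with 25 of its 100 documents deleted counts as 750 bytes.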
-
maxFullFlushMergeSize
protected long maxFullFlushMergeSize()
Return the maximum size of segments to be included in full-flush merges by the default implementation offindFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext).
-
assertDelCount
protected final boolean assertDelCount(int delCount, SegmentCommitInfo info)
Asserts that the delCount for this SegmentCommitInfo is valid.
-
isMerged
protected final boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
- Throws:
IOException
-
getNoCFSRatio
public double getNoCFSRatio()
Returns the current noCFSRatio.
- See Also:
setNoCFSRatio(double)
-
setNoCFSRatio
public void setNoCFSRatio(double noCFSRatio)
If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
-
getMaxCFSSegmentSizeMB
public double getMaxCFSSegmentSizeMB()
Returns the largest size (in MB) allowed for a compound file segment.
-
setMaxCFSSegmentSizeMB
public void setMaxCFSSegmentSizeMB(double v)
If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled. Set this to Double.POSITIVE_INFINITY (default) and noCFSRatio to 1.0 to always use CFS regardless of merge size.
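The default of Double.POSITIVE_INFINITY corresponds to the Long.MAX_VALUE byte default only if the MB-to-bytes conversion saturates. The following sketch shows one way to write that conversion. It assumes 1 MB = 1024 * 1024 bytes and rejects negative sizes. `CfsSizeLimit` and `mbToBytes` are hypothetical names, not the library's setter.

```java
// Sketch (hypothetical names): convert an MB limit to a saturating byte limit.
final class CfsSizeLimit {

    static long mbToBytes(double mb) {
        if (mb < 0.0) {
            throw new IllegalArgumentException("size must be >= 0, got " + mb);
        }
        double bytes = mb * 1024 * 1024; // assumed 1 MB = 1024 * 1024 bytes
        // Saturate so that Double.POSITIVE_INFINITY maps to Long.MAX_VALUE.
        return bytes > Long.MAX_VALUE ? Long.MAX_VALUE : (long) bytes;
    }
}
```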
-
keepFullyDeletedSegment
public boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) throws IOException
Returns true if the segment represented by the given CodecReader should be kept even if it is fully deleted. This is useful, for instance, for testing, or if the merge policy implements retention policies for soft deletes.
- Throws:
IOException
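The following sketch implements a hypothetical time-based retention rule. It is not how Lucene's own soft-delete retention works. A `Supplier` stands in for `IOSupplier<CodecReader>`, so the segment's data is loaded only when retention is enabled. The class name, the timestamp array and the cutoff parameter are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical retention rule: keep a fully deleted segment if it still holds
// any document at or after a cutoff. Data is loaded lazily via the supplier.
final class RetentionSketch {

    static boolean keepFullyDeletedSegment(boolean retentionEnabled,
                                           Supplier<long[]> docTimestamps,
                                           long retainFromMillis) {
        if (!retentionEnabled) {
            return false; // decided without ever opening the segment
        }
        for (long ts : docTimestamps.get()) {
            if (ts >= retainFromMillis) {
                return true;
            }
        }
        return false;
    }
}
```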
-
numDeletesToMerge
public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) throws IOException
Returns the number of deletes that a merge would claim on the given segment. By default, this method returns the sum of the del count on disk and the pending delete count. However, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.

Soft deletes allow deletes to survive across merges in order to control when the soft-deleted data is claimed.
- Parameters:
info - the segment info that identifies the segment
delCount - the number of deleted documents for this segment
readerSupplier - a supplier that allows obtaining a CodecReader for this segment
- Throws:
IOException
- See Also:
IndexWriter.softUpdateDocument(Term, Iterable, Field...), IndexWriterConfig.setSoftDeletesField(String)
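The override pattern can be sketched as follows: the base behaviour claims every delete, while a soft-delete-aware subclass subtracts deletes it wants to carry over. This is a standalone model, not Lucene code. `DeleteCounter`, `RetainingDeleteCounter` and `retainedSoftDeletes` are hypothetical, and `delCount` is assumed to already include pending deletes.

```java
// Standalone model (hypothetical names, not Lucene code).
class DeleteCounter {
    int numDeletesToMerge(int delCount) {
        return delCount; // default: the merge claims every deleted document
    }
}

final class RetainingDeleteCounter extends DeleteCounter {
    private final int retainedSoftDeletes; // hypothetical: soft deletes still under retention

    RetainingDeleteCounter(int retainedSoftDeletes) {
        this.retainedSoftDeletes = retainedSoftDeletes;
    }

    @Override
    int numDeletesToMerge(int delCount) {
        // Retained soft deletes are carried over to the merged segment, so they are not claimed.
        return Math.max(0, super.numDeletesToMerge(delCount) - retainedSoftDeletes);
    }
}
```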
-
segString
protected final String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos)
Builds a String representation of the given SegmentCommitInfo instances.
-
message
protected final void message(String message, MergePolicy.MergeContext mergeContext)
Print a debug message to MergePolicy.MergeContext's infoStream.
-
verbose
protected final boolean verbose(MergePolicy.MergeContext mergeContext)
Returns true if the info-stream is in verbose mode.
- See Also:
message(String, MergeContext)
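A common logging pattern is to check `verbose` before building an expensive debug string, such as one produced by `segString`. This pattern is an assumption and is not stated by the documentation above. The standalone model below mimics the three helpers with hypothetical names and no Lucene dependency. It counts `segString` calls so you can see that the string is not built when output is disabled.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model (hypothetical names, not Lucene code) of verbose-guarded logging.
final class PolicyLogSketch {
    private final boolean verbose;
    final List<String> output = new ArrayList<>();
    int segStringCalls = 0; // how often the "expensive" description was built

    PolicyLogSketch(boolean verbose) {
        this.verbose = verbose;
    }

    boolean verbose() {
        return verbose;
    }

    void message(String message) {
        if (verbose) {
            output.add(message);
        }
    }

    String segString(List<String> segments) {
        segStringCalls++;
        return String.join(" ", segments);
    }

    // Call-site pattern: test verbose() first so segString() is skipped when quiet.
    void logCandidates(List<String> segments) {
        if (verbose()) {
            message("candidate merge: " + segString(segments));
        }
    }
}
```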
-