Class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>

java.lang.Object
org.apache.sedona.core.joinJudgement.KnnJoinIndexJudgement<T,U>
Type Parameters:
T - extends Geometry - the type of geometries in the left set
U - extends Geometry - the type of geometries in the right set
All Implemented Interfaces:
Serializable, org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>

public class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry> extends Object implements org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>, Serializable
This class is responsible for performing a K-nearest neighbors (KNN) join operation using a spatial index. It extends the JudgementBase class and implements the FlatMapFunction2 interface.
See Also:
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final org.apache.spark.util.LongAccumulator
     
    protected final org.apache.spark.util.LongAccumulator
     
    protected final org.apache.spark.util.LongAccumulator
     
    protected final org.apache.spark.util.LongAccumulator
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
    Constructor for the KnnJoinIndexJudgement class.
  • Method Summary

    Modifier and Type
    Method
    Description
    Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
    call(Iterator<T> queryShapes, Iterator<U> objectShapes)
    This method performs the KNN join operation.
    Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
    This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
    Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
    This method performs the KNN join operation using the broadcast query geometries.
    static <U extends org.locationtech.jts.geom.Geometry, T extends org.locationtech.jts.geom.Geometry>
    double
    distance(U key, T value, DistanceMetric distanceMetric)
     
    static double
    distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
    This method calculates the distance between two geometries using the specified distance metric.
    static org.locationtech.jts.index.strtree.ItemDistance
     
    static org.locationtech.jts.index.strtree.ItemDistance
    This method returns the ItemDistance object based on the specified distance metric.
    protected boolean
    hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nest loop join.
    protected boolean
    hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.
    protected void
    Looks up the extent of the current partition.
    protected void
    log(String message, Object... params)
     
    protected org.apache.commons.lang3.tuple.Pair<U,T>
    nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nest loop join.
    protected org.apache.commons.lang3.tuple.Pair<U,T>
    nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • buildCount

      protected final org.apache.spark.util.LongAccumulator buildCount
    • streamCount

      protected final org.apache.spark.util.LongAccumulator streamCount
    • resultCount

      protected final org.apache.spark.util.LongAccumulator resultCount
    • candidateCount

      protected final org.apache.spark.util.LongAccumulator candidateCount
  • Constructor Details

    • KnnJoinIndexJudgement

      public KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
      Constructor for the KnnJoinIndexJudgement class.
      Parameters:
      k - the number of nearest neighbors to find
      distanceMetric - the distance metric to use
      broadcastQueryObjects - the broadcast geometries on queries
      broadcastObjectsTreeIndex - the broadcast spatial index on objects
      buildCount - accumulator for the number of geometries processed from the build side
      streamCount - accumulator for the number of geometries processed from the stream side
      resultCount - accumulator for the number of join results
      candidateCount - accumulator for the number of candidate matches
  • Method Details

    • call

      public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes) throws Exception
      This method performs the KNN join operation. It iterates over the geometries in the stream side and uses the spatial index to find the k nearest neighbors for each geometry. The method returns an iterator over the join results.
      Specified by:
      call in interface org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T extends org.locationtech.jts.geom.Geometry>,Iterator<U extends org.locationtech.jts.geom.Geometry>,org.apache.commons.lang3.tuple.Pair<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>>
      Parameters:
      queryShapes - iterator over the geometries in the query side
      objectShapes - iterator over the geometries in the object side
      Returns:
      an iterator over the join results
      Throws:
      Exception - if the spatial index is not of type STRtree
    • callUsingBroadcastObjectIndex

      public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
      This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
      Parameters:
      queryShapes - iterator over the geometries in the query side
      Returns:
      an iterator over the join results
    • callUsingBroadcastQueryList

      public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
      This method performs the KNN join operation using the broadcast query geometries.
      Parameters:
      objectShapes - iterator over the geometries in the object side
      Returns:
      an iterator over the join results
    • distanceByMetric

      public static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
      This method calculates the distance between two geometries using the specified distance metric.
      Parameters:
      queryGeom - the query geometry
      candidateGeom - the candidate geometry
      distanceMetric - the distance metric to use
      Returns:
      the distance between the two geometries
    • getItemDistanceByMetric

      public static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
      This method returns the ItemDistance object based on the specified distance metric.
      Parameters:
      distanceMetric - the distance metric to use
      Returns:
      the ItemDistance object
    • distance

      public static <U extends org.locationtech.jts.geom.Geometry, T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric)
    • getItemDistance

      public static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric)
    • initPartition

      protected void initPartition()
      Looks up the extent of the current partition. If found, `match` method will activate the logic to avoid emitting duplicate join results from multiple partitions.

      Must be called before processing a partition. Must be called from the same instance that will be used to process the partition.

    • hasNextBase

      protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
      Iterator model for the index-based join. It checks if there is a next match and populate it to the result.
      Parameters:
      spatialIndex -
      streamShapes -
      buildLeft -
      Returns:
    • hasNextBase

      protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
      Iterator model for the nest loop join. It checks if there is a next match and populate it to the result.
      Parameters:
      buildShapes -
      streamShapes -
      Returns:
    • nextBase

      protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
      Iterator model for the index-based join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
      Parameters:
      spatialIndex -
      streamShapes -
      buildLeft -
      Returns:
    • nextBase

      protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
      Iterator model for the nest loop join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
      Parameters:
      buildShapes -
      streamShapes -
      Returns:
    • log

      protected void log(String message, Object... params)