<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->
# SparkReceiverIO

SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) (`org.apache.spark.streaming.receiver.Receiver`) as an unbounded source.

## Prerequisites

SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) (Spark version 2.4). A Receiver must meet two requirements:

1. The Spark Receiver must implement the [HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java) interface.
2. Records must have a numeric field that represents the record offset. *Example:* the `RecordId` field for Salesforce Receivers and the `vid` field for Hubspot Receivers.
   For more details, see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class in the CDAP plugins examples.
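
The offset contract behind these two prerequisites can be illustrated with a small, dependency-free sketch. `OffsetAware` below is a hypothetical local stand-in for Beam's `HasOffset` (assumed here to declare `setStartOffset(Long)` and a default `getEndOffset()`; check the linked source before relying on this), and `CountingReceiverSketch` is an invented class, not a real Spark `Receiver`. It only shows a receiver resuming from a given start offset and emitting records that carry a numeric `vid` field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Beam's HasOffset, defined locally so the sketch compiles alone. */
interface OffsetAware {
  /** Sets the inclusive offset from which reading should start. */
  void setStartOffset(Long offset);

  /** Returns the exclusive end offset; unbounded by default in this sketch. */
  default Long getEndOffset() {
    return Long.MAX_VALUE;
  }
}

/** Hypothetical receiver-like class; a real one would extend Spark's Receiver. */
class CountingReceiverSketch implements OffsetAware {
  private long startOffset = 0L;

  @Override
  public void setStartOffset(Long offset) {
    // Treat a missing offset as "start from the beginning".
    this.startOffset = (offset == null) ? 0L : offset;
  }

  /** Produces `count` JSON-like records, each carrying its own offset in the `vid` field. */
  List<String> receive(int count) {
    List<String> records = new ArrayList<>();
    for (long i = startOffset; i < startOffset + count; i++) {
      records.add("{\"vid\": " + i + "}");
    }
    return records;
  }
}
```

Calling `setStartOffset(5L)` and then `receive(3)` yields records with `vid` values 5, 6, and 7. A real implementation would extend `org.apache.spark.streaming.receiver.Receiver` and hand records to Spark via `store(...)` instead of returning a list.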

## Adding support for a new Spark Receiver

To add SparkReceiverIO support for a new Spark `Receiver`, perform the following steps:

1. Publish the Spark Receiver to the Maven Central repository (see [Sonatype publishing guidelines](https://central.sonatype.org/publish/)). *Example:* [Hubspot CDAP plugin Maven repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0).
2. Add the Spark Receiver Maven dependency to the `build.gradle` file. *Example:* ``implementation "io.cdap:hubspot-plugins:1.0.0"``.
3. Implement a function that defines how to get the `Long` offset from a record of the Spark Receiver.
   *Example:* see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class in the CDAP plugins examples.
4. Construct a `ReceiverBuilder` object, specifying the record type that you want to read (e.g. `String`) as a type parameter and passing your Spark `Receiver` class (from the dependency in step 2). *Example:*
   ```java
   ReceiverBuilder<String, HubspotReceiver> receiverBuilder =
       new ReceiverBuilder<>(HubspotReceiver.class).withConstructorArgs();
   ```
5. Use your Spark `Receiver` with SparkReceiverIO by passing the `getOffsetFn` (from step 3) and the `ReceiverBuilder` (from step 4). *Example:*
   ```java
   SparkReceiverIO.Read<V> reader =
       SparkReceiverIO.<V>read()
           .withGetOffsetFn(getOffsetFn)
           .withSparkReceiverBuilder(receiverBuilder);
   ```
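
As a rough illustration of step 3, the sketch below extracts a numeric `vid` offset from a flat JSON record held in a `String`. It is not the `GetOffsetUtils` implementation: the class name `GetOffsetSketch`, the regex-based parsing, and the fallback to `0L` when the field is missing are all assumptions made for this example. To stay dependency-free it uses `java.util.function.Function`; in a pipeline the function passed to `withGetOffsetFn` would presumably need to be Beam's serializable function type instead.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper showing one way to derive a Long offset from a record. */
class GetOffsetSketch {
  // Matches "vid": 123 or "vid": "123" anywhere in a flat JSON string.
  private static final Pattern VID = Pattern.compile("\"vid\"\\s*:\\s*\"?(\\d+)\"?");

  /** Returns the value of the `vid` field, or 0 if the record is null or has no such field. */
  static final Function<String, Long> GET_OFFSET_FN =
      record -> {
        if (record == null) {
          return 0L;
        }
        Matcher m = VID.matcher(record);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
      };
}
```

For example, `GetOffsetSketch.GET_OFFSET_FN.apply("{\"vid\": 42}")` returns `42`. A production version would more likely use a JSON parser than a regular expression, since a regex cannot handle nested or escaped content reliably.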

To learn more, see the CDAP Streaming plugins [complete examples](https://github.com/apache/beam/tree/master/examples/java/cdap), which use Spark Receivers.

## Dependencies

To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver`.

```xml
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-sparkreceiver</artifactId>
    <version>...</version>
</dependency>
```

## Documentation

The documentation and usage examples are maintained in the Javadoc for the [SparkReceiverIO class](src/main/java/org/apache/beam/sdk/io/sparkreceiver/SparkReceiverIO.java).