<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# SparkReceiverIO

SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) `org.apache.spark.streaming.receiver.Receiver` as an unbounded source.

## Prerequisites

SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) (Spark version 2.4), subject to two requirements:

1. The corresponding Spark Receiver must implement the [HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java) interface.
2. Records must have a numeric field that represents the record offset. *Example:* the `RecordId` field for Salesforce and the `vid` field for Hubspot Receivers.
   For more details, see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class in the CDAP plugins examples.
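
As an illustration of the second prerequisite, the following is a minimal sketch of an offset-extraction function for records that arrive as JSON strings carrying a numeric `vid` field, as in the Hubspot example. The class name `OffsetExtractor`, the regular expression, and the choice to return `0` when the field is missing are all assumptions made for this sketch; they are not taken from Beam or from `GetOffsetUtils`, and a real implementation would more likely use a proper JSON parser.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: pulls a numeric offset out of a JSON record.
public class OffsetExtractor {
  // Assumption: the offset is stored in a numeric "vid" field, e.g. {"vid": 42, ...}.
  private static final Pattern VID_FIELD = Pattern.compile("\"vid\"\\s*:\\s*(\\d+)");

  /** Returns the value of the "vid" field, or 0 if the record is null or has no such field. */
  public static Long getOffset(String record) {
    if (record == null) {
      return 0L;
    }
    Matcher matcher = VID_FIELD.matcher(record);
    return matcher.find() ? Long.parseLong(matcher.group(1)) : 0L;
  }

  public static void main(String[] args) {
    // Plain java.util.function.Function keeps the sketch free of Beam dependencies.
    Function<String, Long> getOffsetFn = OffsetExtractor::getOffset;
    System.out.println(getOffsetFn.apply("{\"vid\": 42, \"name\": \"example\"}"));
  }
}
```

In a pipeline, the same logic would be supplied as the `getOffsetFn` passed to `withGetOffsetFn`. A regular expression is used here only to keep the sketch dependency-free; it does not handle nested objects or a `vid` key that appears inside a string value.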

## Adding support for a new Spark Receiver

To add SparkReceiverIO support for a new Spark `Receiver`, perform the following steps:

1. Publish the Spark Receiver to the Maven Central repository (see [Sonatype publishing guidelines](https://central.sonatype.org/publish/)). *Example:* [Hubspot CDAP plugin Maven repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0).
2. Add the Spark Receiver Maven dependency to the `build.gradle` file. *Example:* ``implementation "io.cdap:hubspot-plugins:1.0.0"``.
3. Implement a function that defines how to get the `Long offset` from a record of the Spark Receiver.
   *Example:* see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class in the CDAP plugins examples.
4. Construct a `ReceiverBuilder` object by passing the class of the records that you want to read (e.g. `String`) and your Spark `Receiver` class (from the dependency added in step 2). *Example:*
   ```java
   // String is the record type; HubspotReceiver is the Spark Receiver class.
   ReceiverBuilder<String, HubspotReceiver> receiverBuilder =
       new ReceiverBuilder<>(HubspotReceiver.class).withConstructorArgs();
   ```
5. Use your Spark `Receiver` with SparkReceiverIO by passing the `getOffsetFn` (from step 3) and the `ReceiverBuilder` (from step 4). *Example:*
   ```java
   // V is the record type read from the Receiver.
   SparkReceiverIO.Read<V> reader =
       SparkReceiverIO.<V>read()
           .withGetOffsetFn(getOffsetFn)
           .withSparkReceiverBuilder(receiverBuilder);
   ```

To learn more, check out the CDAP Streaming plugins [complete examples](https://github.com/apache/beam/tree/master/examples/java/cdap), where Spark Receivers are used.

## Dependencies

To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver`.

```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-sparkreceiver</artifactId>
  <version>...</version>
</dependency>
```

## Documentation

The documentation and usage examples are maintained in the JavaDoc for the [SparkReceiverIO class](src/main/java/org/apache/beam/sdk/io/sparkreceiver/SparkReceiverIO.java).