<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# CdapIO
CdapIO provides I/O transforms for [CDAP](https://cdap.io/) plugins.

## What is CDAP?

[CDAP](https://cdap.io/) is an application platform for building and managing data applications in hybrid and multi-cloud environments.
It enables developers, business analysts, and data scientists to use a visual rapid development environment and utilize common patterns,
data, and application abstractions to accelerate the development of data applications, addressing a broader range of real-time and batch use cases.

[CDAP plugins](https://github.com/data-integrations) types:
- Batch source
- Batch sink
- Streaming source

To learn more about CDAP plugins please see [io.cdap.cdap.api.annotation.Plugin](https://javadoc.io/static/io.cdap.cdap/cdap-api/6.7.2/io/cdap/cdap/api/annotation/Plugin.html) and [Data Integrations](https://github.com/data-integrations) plugins repository.

## CDAP Batch plugins support in CDAP IO

CdapIO supports CDAP Batch plugins based on Hadoop [InputFormat](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html) and [OutputFormat](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/OutputFormat.html).
CDAP batch plugin support is implemented using [HadoopFormatIO](https://beam.apache.org/documentation/io/built-in/hadoop/).

CdapIO currently supports the following CDAP Batch plugins by referencing the CDAP plugin class:
* [Hubspot Batch Source](https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/batch/HubspotBatchSource.java)
* [Hubspot Batch Sink](https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/sink/batch/HubspotBatchSink.java)
* [Salesforce Batch Source](https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/SalesforceBatchSource.java)
* [Salesforce Batch Sink](https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/sink/batch/SalesforceBatchSink.java)
* [ServiceNow Batch Source](https://github.com/data-integrations/servicenow-plugins/blob/develop/src/main/java/io/cdap/plugin/servicenow/source/ServiceNowSource.java)
* [Zendesk Batch Source](https://github.com/data-integrations/zendesk/blob/develop/src/main/java/io/cdap/plugin/zendesk/source/batch/ZendeskBatchSource.java)

This means that each of these plugins can be used by passing its class, for example:
``CdapIO.withCdapPluginClass(HubspotBatchSource.class)``

### Requirements for CDAP Batch plugins

A CDAP Batch plugin must be based on a Hadoop `InputFormat` (for sources) or `OutputFormat` (for sinks) implementation.

### How to add support for a new CDAP Batch plugin

To add CdapIO support for a new CDAP Batch [Plugin](src/main/java/org/apache/beam/sdk/io/cdap/Plugin.java), perform the following steps:
1. Find the CDAP plugin artifacts in the Maven Central repository. *Example:* [Hubspot plugin Maven repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0). *Note:* To add a custom CDAP plugin, please follow [Sonatype publishing guidelines](https://central.sonatype.org/publish/).
2. Add the CDAP plugin Maven dependency to the `build.gradle` file. *Example:* ``implementation "io.cdap:hubspot-plugins:1.0.0"``.
3. Use the CDAP batch plugin with CdapIO in one of two ways:
   1. Using the `Plugin.createBatch()` method. Pass the CDAP plugin class together with the matching `InputFormat` (or `OutputFormat`) and `InputFormatProvider` (or `OutputFormatProvider`) classes to CdapIO. *Example:*
      ```
      CdapIO.withCdapPlugin(
        Plugin.createBatch(
          EmployeeBatchSource.class,
          EmployeeInputFormat.class,
          EmployeeInputFormatProvider.class));
      ```
   2. Using `MappingUtils`.
      1. Navigate to the [MappingUtils](src/main/java/org/apache/beam/sdk/io/cdap/MappingUtils.java) class.
      2. Locate the `getPluginClassByName()` method.
      3. Add code that maps the CDAP plugin class to its `Input/Output Format` and `FormatProvider` classes.
         *Example:*
         ```
         if (pluginClass.equals(EmployeeBatchSource.class)) {
           return Plugin.createBatch(pluginClass,
             EmployeeInputFormat.class,
             EmployeeInputFormatProvider.class);
         }
         ```
      4. After these steps, you can use the CDAP plugin by class name like this: ``CdapIO.withCdapPluginClass(EmployeeBatchSource.class)``

To learn more, please check out [complete examples](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap).

## CDAP Streaming plugins support in CDAP IO

CdapIO supports CDAP Streaming plugins based on [Apache Spark Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html).
CDAP streaming plugin support is implemented using [SparkReceiverIO](https://github.com/apache/beam/tree/master/sdks/java/io/sparkreceiver).

### Requirements for CDAP Streaming plugins

1. A CDAP Streaming plugin must be based on a Spark `Receiver`.
2. A CDAP Streaming plugin must support working with offsets.
   1. The corresponding Spark Receiver must implement the [HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java) interface.
   2. Records must have a numeric field that represents the record offset. *Example:* the `RecordId` field for Salesforce and the `vid` field for Hubspot plugins.
      For more details, please see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class from the examples.

### How to add support for a new CDAP Streaming plugin

To add CdapIO support for a new CDAP Streaming SparkReceiver [Plugin](src/main/java/org/apache/beam/sdk/io/cdap/Plugin.java), perform the following steps:
1. Find the CDAP plugin artifacts in the Maven Central repository. *Example:* [Hubspot plugin Maven repository](https://mvnrepository.com/artifact/io.cdap/hubspot-plugins/1.0.0). *Note:* To add a custom CDAP plugin, please follow [Sonatype publishing guidelines](https://central.sonatype.org/publish/).
2. Add the CDAP plugin Maven dependency to the `build.gradle` file. *Example:* ``implementation "io.cdap:hubspot-plugins:1.0.0"``.
3. Implement a function that defines how to get the `Long` offset from a record of the CDAP plugin.
   *Example:* see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class from the examples.
4. Use the CDAP streaming plugin with CdapIO in one of two ways:
   1. Using the `Plugin.createStreaming()` method. Pass the CDAP plugin class, the matching `getOffsetFn` (from step 3), and the Spark `Receiver` class to CdapIO.
      *Example:*
      ```
      CdapIO.withCdapPlugin(
        Plugin.createStreaming(
          HubspotStreamingSource.class,
          offsetFnForHubspot,
          HubspotReceiver.class));
      ```
   2. Using `MappingUtils`.
      1. Navigate to the [MappingUtils](src/main/java/org/apache/beam/sdk/io/cdap/MappingUtils.java) class.
      2. Locate the `getPluginClassByName()` method.
      3. Add code that maps the CDAP plugin class to its `getOffsetFn` function and Spark `Receiver` class.
         *Example:*
         ```
         if (pluginClass.equals(HubspotStreamingSource.class)) {
           return Plugin.createStreaming(pluginClass,
             getOffsetFnForHubspot(),
             HubspotReceiver.class);
         }
         ```
      4. After these steps, you can use the CDAP plugin by class name like this: ``CdapIO.withCdapPluginClass(HubspotStreamingSource.class)``

To learn more, please check out [complete examples](https://github.com/apache/beam/tree/master/examples/java/cdap).

## Dependencies

To use CdapIO, please add a dependency on `beam-sdks-java-io-cdap`.

```maven
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-cdap</artifactId>
    <version>...</version>
</dependency>
```

## Documentation

The documentation and usage examples are maintained in JavaDoc for [CdapIO.java](src/main/java/org/apache/beam/sdk/io/cdap/CdapIO.java).
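
As a rough illustration of the offset function from step 3 of the streaming instructions, the sketch below extracts a numeric field such as `vid` from a record and returns it as a `Long`. It rests on several assumptions that are not part of CdapIO: the record is taken to be a JSON string, the field is located with a regular expression rather than a JSON parser, and `java.util.function.Function` stands in for the function type that `Plugin.createStreaming()` expects. The names `OffsetFnSketch` and `offsetFnForField` are hypothetical. See `GetOffsetUtils` in the examples for the project's own version.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, dependency-free sketch of an offset function for streaming records. */
public class OffsetFnSketch {

  /**
   * Builds a function that reads a non-negative integer field from a JSON-like
   * record string and returns it as the record offset.
   * Returns 0 when the record is null or the field is not found.
   */
  public static Function<String, Long> offsetFnForField(String fieldName) {
    // Matches "fieldName": 12345 with optional whitespace around the colon.
    Pattern pattern =
        Pattern.compile("\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*(\\d+)");
    return record -> {
      if (record == null) {
        return 0L;
      }
      Matcher matcher = pattern.matcher(record);
      return matcher.find() ? Long.parseLong(matcher.group(1)) : 0L;
    };
  }

  public static void main(String[] args) {
    // "vid" is the offset field the requirements section names for Hubspot records.
    Function<String, Long> offsetFn = offsetFnForField("vid");
    System.out.println(offsetFn.apply("{\"vid\": 4201, \"email\": \"a@example.com\"}")); // prints 4201
    System.out.println(offsetFn.apply("{\"email\": \"b@example.com\"}")); // prints 0
  }
}
```

Falling back to offset 0 for records without the field is a design choice of this sketch; a real pipeline may prefer to fail fast or to use a proper JSON parser.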