<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# Go SDK

The Apache Beam Go SDK is an implementation of the Beam Model in the [Go Programming Language](https://go.dev/).
It is based on this initial [design](https://s.apache.org/beam-go-sdk-design-rfc).

## How to run the examples

**Prerequisites**: to use Google Cloud sources and sinks (the default for
most examples), follow the
[Dataflow runner setup](https://beam.apache.org/documentation/runners/dataflow/). You can
verify that the setup works by running the corresponding Java example.

The examples are normal Go programs and are most easily run directly.
They are parameterized by Go flags.
For example, to run wordcount on the Go direct runner:

```
$ pwd
[...]/sdks/go
$ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
[{6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
[{10: KV<int,string>/GW/KV<int[varintz],bytes>}]
2018/03/21 09:39:03 Pipeline:
2018/03/21 09:39:03 Nodes: {1: []uint8/GW/bytes}
{2: string/GW/bytes}
{3: string/GW/bytes}
{4: string/GW/bytes}
{5: string/GW/bytes}
{6: KV<string,int>/GW/KV<bytes,int[varintz]>}
{7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}
{8: KV<string,int>/GW/KV<bytes,int[varintz]>}
{9: string/GW/bytes}
{10: KV<int,string>/GW/KV<int[varintz],bytes>}
{11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: string/GW/bytes}]
3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: string/GW/bytes}]
4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: string/GW/bytes}]
5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: string/GW/bytes}]
6: ParDo [In(Main): T <- {5: string/GW/bytes}] -> [Out: KV<T,int> -> {6: KV<string,int>/GW/KV<bytes,int[varintz]>}]
7: CoGBK [In(Main): KV<string,int> <- {6: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: CoGBK<string,int> -> {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}]
8: Combine [In(Main): int <- {7: CoGBK<string,int>/GW/CoGBK<bytes,int[varintz]>}] -> [Out: KV<string,int> -> {8: KV<string,int>/GW/KV<bytes,int[varintz]>}]
9: ParDo [In(Main): KV<string,int> <- {8: KV<string,int>/GW/KV<bytes,int[varintz]>}] -> [Out: string -> {9: string/GW/bytes}]
10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV<int,T> -> {10: KV<int,string>/GW/KV<int[varintz],bytes>}]
11: CoGBK [In(Main): KV<int,string> <- {10: KV<int,string>/GW/KV<int[varintz],bytes>}] -> [Out: CoGBK<int,string> -> {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}]
12: ParDo [In(Main): CoGBK<int,string> <- {11: CoGBK<int,string>/GW/CoGBK<int[varintz],bytes>}] -> []
2018/03/21 09:39:03 Reading from gs://apache-beam-samples/shakespeare/kinglear.txt
2018/03/21 09:39:04 Writing to /tmp/result.txt
```

The debugging output is currently quite verbose and likely to change. In this case, the
output is a local file:

```
$ head /tmp/result.txt
while: 2
darkling: 1
rail'd: 1
ford: 1
bleed's: 1
hath: 52
Remain: 1
disclaim: 1
sentence: 1
purse: 6
```
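
As a rough illustration of what "parameterized by Go flags" means in the run above, the sketch below parses an `--output` flag with the standard library `flag` package. It is not the wordcount source: the `options` type, the `parseOptions` helper, and the `input.txt` default are hypothetical names chosen for this example, and no Beam packages are involved.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the values read from the command line.
// The type and its fields are illustrative, not taken from the Beam examples.
type options struct {
	Input  string
	Output string
}

// parseOptions parses args (without the program name) using the
// standard flag package and rejects a missing --output value.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	var o options
	// "input.txt" is a hypothetical default used only in this sketch.
	fs.StringVar(&o.Input, "input", "input.txt", "file to read from")
	fs.StringVar(&o.Output, "output", "", "file to write to (required)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.Output == "" {
		return options{}, fmt.Errorf("no output provided")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("reading from %s, writing to %s\n", o.Input, o.Output)
}
```

The `flag` package treats `-output=...` and `--output=...` as equivalent, which is why the double-dash form used in this README works with ordinary Go flags.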

To run wordcount on the Dataflow runner:

```
$ go run examples/wordcount/wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT> --region=<YOUR_GCP_REGION> --staging_location=<YOUR_GCS_LOCATION>/staging --worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION> --output=<YOUR_GCS_LOCATION>/output
```

In this case, the output is a GCS file:

```
$ gsutil cat <YOUR_GCS_LOCATION>/output* | head
Blanket: 1
blot: 1
Kneeling: 3
cautions: 1
appears: 4
Deserved: 1
nettles: 1
OSWALD: 53
sport: 3
Crown'd: 1
```
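
Both runs above produce lines of the form `word: count` in no particular order. If you want to inspect the result programmatically, a minimal sketch along the following lines may help. It assumes every non-empty line ends in `: <integer>`, as in the samples shown; `wordCount`, `parseCounts`, and `topN` are hypothetical helpers written for this example and are not part of the SDK.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// wordCount is one parsed "word: count" line.
type wordCount struct {
	Word  string
	Count int
}

// parseCounts parses text made of "word: count" lines, skipping blank lines.
func parseCounts(text string) ([]wordCount, error) {
	var out []wordCount
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// Split on the last ": " so a word containing a colon still parses.
		i := strings.LastIndex(line, ": ")
		if i < 0 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.Atoi(line[i+2:])
		if err != nil {
			return nil, fmt.Errorf("bad count in line %q: %v", line, err)
		}
		out = append(out, wordCount{Word: line[:i], Count: n})
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return out, nil
}

// topN returns up to n entries with the highest counts, without
// modifying the input slice. Ties keep their original order.
func topN(counts []wordCount, n int) []wordCount {
	sorted := append([]wordCount(nil), counts...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Count > sorted[j].Count
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	// A few lines copied from the sample output above.
	sample := "Blanket: 1\nKneeling: 3\nOSWALD: 53\nappears: 4\n"
	counts, err := parseCounts(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, wc := range topN(counts, 2) {
		fmt.Printf("%s: %d\n", wc.Word, wc.Count) // prints "OSWALD: 53" then "appears: 4"
	}
}
```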

See [BUILD.md](./BUILD.md) for how to build Go code in general. See the
[container documentation](https://beam.apache.org/documentation/runtime/environments/#building-container-images) for how to build and push the Go SDK harness container image.

## Issues

Please use the [`sdk-go`](https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Asdk-go) label for any bugs or feature requests.

## Contributing to the Go SDK

### New to developing Go?

https://tour.golang.org : The Go Tour teaches the basics of the language interactively, with no installation required.

https://github.com/campoy/go-tooling-workshop is a great start on learning good (optional) development tools for Go.

### Developing Go Beam SDK on GitHub

The Go SDK uses Go Modules for dependency management, so it's as simple as cloning
the repo, making the necessary changes, and running tests.

To execute all unit tests for the SDK, run `go test ./...` from the `<beam root>/sdks/go` directory.

To test your change as Jenkins would execute it from a PR, run the following from the
beam root directory:
 * `./gradlew :sdks:go:goTest` executes the unit tests.
 * `./gradlew :sdks:go:test:ulrValidatesRunner` validates the SDK against the Portable Python runner.
 * `./gradlew :sdks:go:test:flinkValidatesRunner` validates the SDK against the Flink runner.

Follow the [contribution guide](https://beam.apache.org/contribute/contribution-guide/#code) to create branches and submit pull requests as normal.