github.com/apache/beam/sdks/v2@v2.48.2/typescript/README-dev.md

<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# TypeScript Beam SDK

This is the start of a fully functioning JavaScript (actually, TypeScript) SDK.
There are two distinct aims with this SDK:

1. Tap into the large community of JavaScript developers, which is relatively
underserved by existing data processing frameworks, with a native SDK targeting
this language.

1. Develop a new SDK that can serve both as a proof of concept and as a
reference, highlighting the (relative) ease of porting Beam to new languages,
a differentiating feature of Beam and Dataflow.

To accomplish this, we lean heavily on the portability framework.
For example, we make heavy use of cross-language transforms,
in particular for IOs.
In addition, the direct runner is simply an extension of the worker suitable
for running on portable runners such as the ULR (Universal Local Runner), which
transfers directly to running on production runners such as Dataflow and Flink.
We hope the target audience will not be put off by running other-language
code encapsulated in Docker images.

## Getting started

To install and test the TypeScript SDK from source, you will need `npm` and
`python`. Other requirements can be installed by `npm` later on.

(**Note** that Python is a requirement because it is used to orchestrate Beam
functionality.)

1. First, clone the Beam repository and go to the `typescript` directory.
```
git clone https://github.com/apache/beam
cd beam/sdks/typescript/
```

2. Execute a local install of the necessary packages:

```
npm install
```

3. Then run `npm run build` to transpile the TypeScript files into JS files.

### Development workflows

All of the development workflows (build, test, lint, clean, etc.) are defined in
`package.json` and can be run with `npm` commands (e.g. `npm run build`).

### Running a pipeline

The `wordcount.ts` file defines a parameterizable pipeline that can be run
against different runners. You can run it from the transpiled `.js` file
like so:

```
node dist/src/apache_beam/examples/wordcount.js ${PARAMETERS}
```
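Here `${PARAMETERS}` stands for options passed as `--name=value` flags, as in the runner-specific commands below. The example has its own argument handling; purely to illustrate the shape of those options, here is a minimal sketch that turns such flags into a plain object. `parseFlags` is a hypothetical helper, not part of the SDK, and it skips things a real parser handles (space-separated values, type coercion, repeated flags).

```typescript
// Hypothetical helper, not part of the Beam SDK: parses "--name=value"
// flags (like those passed to wordcount.js) into a plain options object.
function parseFlags(argv: string[]): Record<string, string> {
  const options: Record<string, string> = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // ignore positional arguments
    const eq = arg.indexOf("="); // split on the first "=" only
    if (eq === -1) {
      options[arg.slice(2)] = "true"; // bare flag such as "--verbose"
    } else {
      options[arg.slice(2, eq)] = arg.slice(eq + 1);
    }
  }
  return options;
}

// In a real script the input would be process.argv.slice(2).
const opts = parseFlags([
  "--runner=dataflow",
  "--project=my-project",
  "--tempLocation=gs://my-bucket/wordcount-js/temp",
  "--region=us-central1",
]);
console.log(opts.runner); // dataflow
```

Splitting on the first `=` only keeps values such as `gs://` paths intact even if they were to contain further `=` characters.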

To run locally:

```
node dist/src/apache_beam/examples/wordcount.js --runner=direct
```

To run against Flink, where the local infrastructure is automatically
downloaded and set up:

```
node dist/src/apache_beam/examples/wordcount.js --runner=flink
```

To run on Dataflow:

```
node dist/src/apache_beam/examples/wordcount.js \
    --runner=dataflow \
    --project=${PROJECT_ID} \
    --tempLocation=gs://${GCS_BUCKET}/wordcount-js/temp \
    --region=${REGION}
```
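Whichever runner you choose, the pipeline performs the same computation. As a rough in-memory approximation with no Beam dependency, the sketch below assumes the example lowercases each line, splits it on runs of non-letter characters, and counts each resulting word; check `wordcount.ts` for the exact transforms. `countWords` is a hypothetical name, and this sketch additionally drops the empty strings that splitting can produce.

```typescript
// Hypothetical in-memory analogue of the wordcount pipeline; not Beam code.
// Mirrors lowercase (map) -> split into words (flatMap) -> count per word.
function countWords(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const lower = line.toLowerCase(); // "map": normalize case
    // "flatMap": split on runs of non-letters; skip empty fragments
    for (const word of lower.split(/[^a-z]+/)) {
      if (word === "") continue;
      counts.set(word, (counts.get(word) ?? 0) + 1); // count per element
    }
  }
  return counts;
}

const counts = countWords([
  "In the beginning God created the heaven and the earth.",
  "And the earth was without form, and void;",
]);
console.log(counts.get("the")); // 4
```

In a Beam pipeline these steps would instead be transforms over a `PCollection`, executed by the chosen runner rather than as a loop on a single machine.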

## TODO

This SDK is a work in progress. In January 2022 we developed the ability to
construct and run basic pipelines (including external transforms and running
on a portable runner), but the following big-ticket items remain.

* Containerization

  * Actually use worker threads for multiple bundles
    (it is unclear whether this is a large benefit, as the need is mitigated by
    using sibling workers).

* API

  * There are several TODOs of minor features or design decisions to finalize.

    * Consider using (or supporting) 2-arrays rather than {key, value} objects
      for KVs.

    * Force the second argument of map/flatMap to be an Object, which would lead
      to a less confusing API (vs. Array.map) and clean up the implementation.
      Also add a [do]Filter, and possibly a [do]Reduce?

    * Move away from using classes.

  * Advanced features like state, timers, and splittable DoFns (SDF).

* Other

  * Relative vs. absolute imports, possibly via setting a base URL with a
    `jsconfig.json`.

  * More/better tests, including tests of illegal/unsupported use.

  * Set channel options like `grpc.max_{send,receive}_message_length` as we
    do in other SDKs.

  * Reduce use of `any`.

    * Could use `unknown` in its place where the type is truly unknown.

    * It would be nice to enforce this, perhaps by re-enabling
      `noImplicitAny: true` in tsconfig, if we can get the generated proto
      files to be ignored.

  * Enable a linter like ESLint and fix at least the low-hanging fruit.
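For the `any`/`unknown` items above, the practical difference is that a value typed `unknown` must be narrowed before it can be used, so the compiler rejects unchecked operations that `any` would silently allow. A minimal standalone illustration (`describeValue` is a hypothetical function, not SDK code):

```typescript
// With `value: any`, a call like value.toUpperCase() would compile even when
// value is a number and only fail at runtime. With `unknown`, each use must
// be preceded by a check that narrows the type.
function describeValue(value: unknown): string {
  if (typeof value === "string") {
    return `string of length ${value.length}`; // narrowed to string
  }
  if (typeof value === "number") {
    return `number ${value.toFixed(1)}`; // narrowed to number
  }
  if (Array.isArray(value)) {
    return `array of ${value.length} elements`; // narrowed to any[]
  }
  return typeof value; // e.g. "object" for null, "undefined" for undefined
}

console.log(describeValue("beam")); // string of length 4
console.log(describeValue(2.5)); // number 2.5
console.log(describeValue([1, 2, 3])); // array of 3 elements
console.log(describeValue(null)); // object
```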

There is probably more; there are many TODOs littered throughout the code.
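For the KV item under API above, converting between the two representations is mechanical. The sketch below assumes the `{key, value}` object shape described in that item; the type and function names (`KVObject`, `KVTuple`, `toTuple`, `toObject`) are hypothetical. One argument for 2-arrays is that they already match what JavaScript built-ins such as `Object.entries` and the `Map` constructor use.

```typescript
// Hypothetical types illustrating the two KV encodings discussed above.
type KVObject<K, V> = { key: K; value: V };
type KVTuple<K, V> = [K, V];

function toTuple<K, V>(kv: KVObject<K, V>): KVTuple<K, V> {
  return [kv.key, kv.value];
}

function toObject<K, V>([key, value]: KVTuple<K, V>): KVObject<K, V> {
  return { key, value };
}

const asObjects: KVObject<string, number>[] = [
  { key: "a", value: 1 },
  { key: "b", value: 2 },
];

// 2-arrays plug directly into built-ins such as the Map constructor...
const asMap = new Map(asObjects.map(toTuple));
console.log(asMap.get("b")); // 2

// ...and Object.entries already yields [key, value] pairs.
const roundTrip = Object.entries({ c: 3 }).map(toObject);
console.log(roundTrip); // [ { key: 'c', value: 3 } ]
```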

## Development

### Getting started

Install Node.js, and then run the following from within `sdks/typescript`:

```
npm install
```

### Running tests

```
npm test
```

### Style

We have adopted Prettier, which can be run with:

```
npx prettier --write .
```