<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# TypeScript Beam SDK

This is the start of a fully functioning JavaScript (actually, TypeScript) SDK.
There are two distinct aims with this SDK:

1. Tap into the large (and relatively underserved, by existing data processing
   frameworks) community of JavaScript developers with a native SDK targeting this language.

2. Develop a new SDK which can serve both as a proof of concept and reference
   that highlights the (relative) ease of porting Beam to new languages,
   a differentiating feature of Beam and Dataflow.

To accomplish this, we lean heavily on the portability framework.
For example, we make heavy use of cross-language transforms,
in particular for IOs.
In addition, the direct runner is simply an extension of the worker suitable
for running on portable runners such as the ULR, which will directly transfer
to running on production runners such as Dataflow and Flink.
The target audience should hopefully not be put off by running other-language
code encapsulated in docker images.

## Getting started

To install and test the TypeScript SDK from source, you will need `npm` and
`python`. Other requirements can be installed by `npm` later on.

(**Note** that Python is a requirement as it is used to orchestrate Beam
functionality.)

1. Clone the Beam repository and go to the `typescript` directory.

   ```
   git clone https://github.com/apache/beam
   cd beam/sdks/typescript/
   ```

2. Execute a local install of the necessary packages:

   ```
   npm install
   ```

3. Then run `npm run build` to transpile TypeScript files into JS files.

### Development workflows

All of the development workflows (build, test, lint, clean, etc.) are defined in
`package.json` and can be run with `npm` commands (e.g. `npm run build`).

### Running a pipeline

The `wordcount.ts` file defines a parameterizable pipeline that can be run
against different runners. You can run it from the transpiled `.js` file
like so:

```
node dist/src/apache_beam/examples/wordcount.js ${PARAMETERS}
```

To run locally:

```
node dist/src/apache_beam/examples/wordcount.js --runner=direct
```

To run against Flink, where the local infrastructure is automatically
downloaded and set up:

```
node dist/src/apache_beam/examples/wordcount.js --runner=flink
```

To run on Dataflow:

```
node dist/src/apache_beam/examples/wordcount.js \
    --runner=dataflow \
    --project=${PROJECT_ID} \
    --tempLocation=gs://${GCS_BUCKET}/wordcount-js/temp --region=${REGION}
```

## TODO

This SDK is a work in progress. In January 2022 we developed the ability to
construct and run basic pipelines (including external transforms and running
on a portable runner), but the following big-ticket items remain.

* Containerization

  * Actually use worker threads for multiple bundles
    (unsure if this is a large benefit, mitigated using sibling workers).

* API

  * There are several TODOs of minor features or design decisions to finalize.

    * Consider using (or supporting) 2-arrays rather than `{key, value}` objects
      for KVs.

    * Force the second argument of map/flatMap to be an Object, which would lead
      to a less confusing API (vs. Array.map) and clean up the implementation.
      Also add a [do]Filter, and possibly a [do]Reduce?

    * Move away from using classes.

  * Advanced features like state, timers, and SDF.

* Other

  * Relative vs. absolute imports, possibly via setting a base URL with a
    `jsconfig.json`.

  * More/better tests, including tests of illegal/unsupported use.

  * Set channel options like `grpc.max_{send,receive}_message_length` as we
    do in other SDKs.

  * Reduce use of `any`.

    * Could use `unknown` in its place where the type is truly unknown.

    * It would be nice to enforce this, perhaps by re-enabling
      `noImplicitAny: true` in tsconfig, if we can get the generated proto
      files to be ignored.

  * Enable a linter like eslint and fix at least the low-hanging fruit.

There is probably more; there are many TODOs littered throughout the code.

## Development

### Getting started

Install Node.js, and then from within `sdks/typescript` run:

```
npm install
```

### Running tests

```
npm test
```

### Style

We have adopted prettier, which can be run with:

```
npx prettier --write .
```
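
For the TODO item about reducing the use of `any`, the following is a minimal sketch of what replacing `any` with `unknown` could look like. All function names here (`lengthWithAny`, `lengthWithUnknown`, `isKV`) are hypothetical and are not part of the SDK. They only illustrate that `unknown` forces a value to be narrowed before it is used.

```typescript
// Hypothetical helpers for illustration only; none of these exist in the SDK.

// With `any`, the compiler accepts every property access, so a wrong
// argument is only noticed at runtime (here it would yield `undefined`).
function lengthWithAny(x: any): number {
  return x.length;
}

// With `unknown`, the value has to be narrowed before it can be used.
function lengthWithUnknown(x: unknown): number {
  if (typeof x === "string" || Array.isArray(x)) {
    return x.length;
  }
  throw new TypeError("expected a string or an array");
}

// A user-defined type guard for the {key, value} object shape.
function isKV(x: unknown): x is { key: unknown; value: unknown } {
  return typeof x === "object" && x !== null && "key" in x && "value" in x;
}
```

A caller that receives an `unknown` element can check `isKV(element)` first and then read `element.key` without a cast, whereas the `any` version compiles for every input.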