[Table of contents](../README.md#table-of-contents)

This document was written in February and March 2017.

# Konnectors

:warning: **Note:** this documentation is outdated. It is kept for historical
reasons and shows the design and trade-offs we made initially. A lot of things
have changed since, so don't expect things to still be the same.

## What do we want?

[Konnectors](https://github.com/cozy-labs/konnectors) is an application for Cozy
v2 that fetches data from different web sites and services, and saves it into a
Cozy. The 50+ connectors represent a lot of work from the community, so we want
to port it to Cozy v3. There will be 2 parts:

- My Accounts, a client-side app, that will let the user configure her
  accounts and choose when to start the import of data (see
  [the architecture doc](https://github.com/cozy-labs/konnectors/blob/development/docs/client-side-architecture.md)).
- Konnectors, a worker for the [job service](../jobs.md), with the code to
  import data from the web sites.

## Security

### The risks

Konnectors is not just a random application. It's a very good target for attacks
on Cozy because of these characteristics:

- It runs on the server, where there is no Content Security Policy or firewall
  to protect the stack.
- It has access to the Internet, by design.
- It is written in nodejs, with a lot of dependencies where it is easy to hide
  malicious code.
- It is a collection of connectors written by a lot of people. We welcome
  these contributions, but it also means that we have to accept that we
  can't review all the contributions in depth.

#### Access to couchdb

The stack has the admin credentials of couchdb. If rogue code can read its
configuration file or intercept connections between the stack and couchdb, it
will have access to couchdb with the admin credentials, and can do anything on
couchdb.

#### Access to the stack

An attacker can try to take advantage of konnectors to access the stack. It can
target port 6060, used by the stack to manage the cozy instances. Or it can
use its privileged position for timing attacks on passwords.

#### Spying on other connectors

A rogue connector may try to spy on other connectors to pick up the credentials
for external web sites. It can be done by reading their environment variables or
[ptracing](https://en.wikipedia.org/wiki/Ptrace) them.

#### DoS

A connector can use a lot of CPU or RAM, or generate a lot of disk I/O, to cause
a denial of service on the server. The connector can also remove files on the
server to make konnectors stop working.

#### Exploiting the CPU or the bandwidth

The resources of the server can be seen as valuable: the CPU can be used for
bitcoin mining, and the bandwidth can be used for a DDoS on an external target.

#### Sending spam

A connector can abuse the configured SMTP server to send spam.

#### Be root

[Row hammer](https://en.wikipedia.org/wiki/Row_hammer) can be a way to gain root
access on a server.

### Possible measures

#### Permissions

We can forbid the konnectors to speak directly with couchdb, and make them go
through the stack instead. We can then use the [permissions](../permissions.md)
to restrict what each konnector can do with the cozy-stack.

#### ignore-scripts for npm/yarn

Npm and yarn can execute scripts defined in package.json when installing nodejs
dependencies. We can use the
[`ignore-scripts`](https://docs.npmjs.com/misc/config#ignore-scripts) option to
disable this behaviour.

#### Forbid addons in nodejs

Nodejs can require [addons](https://nodejs.org/api/addons.html), i.e. C/C++
compiled libraries. I've found no flag to disable the installation of such
modules for npm/yarn, and no flag for nodejs to prevent loading them. We can try
to detect and remove such modules just after the installation of node modules.
They should have a `.node` extension.

**Note**: not having a compiler on the server is not enough. Npm can install
precompiled modules.

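The detection step could look like the following sketch. It is only an
illustration of the idea, not code from the stack: the function names
`findNativeAddons` and `removeNativeAddons` are made up for this example, and it
assumes that scanning the install directory for the `.node` extension is enough
to catch add-ons.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every entry whose name ends in `.node` below `dir`.
// Directories are walked; symbolic links are reported but never followed,
// so a link cannot lead the scan outside of the install directory.
function findNativeAddons(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNativeAddons(fullPath));
    } else if (entry.name.endsWith('.node')) {
      found.push(fullPath);
    }
  }
  return found;
}

// Delete the add-ons found below `dir` and return the removed paths.
function removeNativeAddons(dir) {
  const addons = findNativeAddons(dir);
  for (const file of addons) {
    fs.unlinkSync(file);
  }
  return addons;
}
```

It would be called on the `node_modules` directory right after the install
step, before any konnector code is run.
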
#### vm/sandbox for Nodejs

[vm2](https://github.com/patriksimek/vm2) is a sandbox that can run untrusted
code with only a whitelist of Node's built-in modules.

#### Mock net

We can mock the net module of nodejs to add some restrictions on what it can do.
For example, we can check that it only does http/https, and block connections
to localhost:6060. It is only effective if the konnector has no way to start a
new node process.

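As a rough sketch of the port check only (the http/https restriction is not
covered), a wrapper around `net.connect` could refuse the forbidden target
before opening a socket. The names `isBlockedTarget` and `guardedConnect` and
the list of loopback host names are assumptions made for this example.

```javascript
const net = require('net');

// Hypothetical policy: refuse local connections to the admin port of the stack.
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '::1']);
const BLOCKED_LOCAL_PORTS = new Set([6060]);

function isBlockedTarget(host, port) {
  // net.connect() targets localhost when no host is given.
  const target = (host || 'localhost').toLowerCase();
  return LOCAL_HOSTS.has(target) && BLOCKED_LOCAL_PORTS.has(Number(port));
}

// Wrapper for the `options` form of net.connect(): it throws instead of
// opening a socket when the target is forbidden.
function guardedConnect(options, connectListener) {
  if (isBlockedTarget(options.host, options.port)) {
    const host = options.host || 'localhost';
    throw new Error(`connection to ${host}:${options.port} is forbidden`);
  }
  return net.connect(options, connectListener);
}
```

A check on names like this one is easy to bypass (other loopback addresses, a
DNS name that resolves to the host), which is one more reason to combine it
with network-level rules.
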
#### Timeout

If a konnector takes too long, it should be killed after a timeout. This implies
that the cozy-stack supervises the konnectors.

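To illustrate the supervision idea (in Node here, whereas the stack itself
would do it in its own language), the sketch below runs a command and kills it
when it exceeds a time budget. `runWithTimeout` is a name invented for this
example; it relies on the `timeout` and `killSignal` options of
`child_process.spawnSync`.

```javascript
const { spawnSync } = require('child_process');

// Run `command` and kill it with SIGKILL if it is still alive after
// `timeoutMs` milliseconds. The call blocks until the child has exited.
function runWithTimeout(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    timeout: timeoutMs,
    killSignal: 'SIGKILL',
    encoding: 'utf8',
  });
  // spawnSync reports an expired timeout through `result.error`.
  const timedOut = Boolean(result.error) && result.error.code === 'ETIMEDOUT';
  return {
    timedOut,
    status: result.status,
    signal: result.signal,
    stdout: result.stdout,
  };
}
```

A real supervisor would rather be asynchronous, so that one slow konnector does
not block the others; the synchronous call just keeps the example short.
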
#### Chroot

[Chroot](https://en.wikipedia.org/wiki/Chroot) is a UNIX syscall that makes an
application see only a part of the file-system. In particular, we can remove
access to `/proc` and `/sys` by not mounting them, and limit access to `/dev` to
just `/dev/null`, `/dev/zero`, and `/dev/random` by symlinking them.

#### Executing as another user

We can create UNIX users that will just serve to execute the konnectors, and
nothing else. It's a nice way to give more isolation, but it means that we have
to find a way to execute the konnectors: either run the cozy-stack as root, or
have a daemon that launches the konnectors.

#### Ulimit & Prlimit

[ulimit](http://ss64.com/bash/ulimit.html) provides control over the resources
available to the shell and to processes started by it, on systems that allow
such control. It can be used to limit the number of processes (protection
against fork bombs), or the memory that can be used.

[prlimit](http://man7.org/linux/man-pages/man1/prlimit.1.html) can do the same
for just one command (technically, for a new session, not the current one).

#### Linux namespaces

One feature Linux provides here is namespaces. There are a bunch of different
kinds:

- in a pid namespace you become PID 1 and then your children are other
  processes. All the other programs are gone
- in a networking namespace you can run programs on any port you want without
  it conflicting with what’s already running
- in a mount namespace you can mount and unmount filesystems without it
  affecting the host filesystem. So you can have a totally different set of
  devices mounted (usually less).

It turns out that making namespaces is totally easy! You can just run a program
called [unshare](http://man7.org/linux/man-pages/man1/unshare.1.html).

Source:
[What even is a container, by Julia Evans](https://jvns.ca/blog/2016/10/10/what-even-is-a-container/)

#### Cgroups

[cgroups](https://en.wikipedia.org/wiki/Cgroups) (abbreviated from control
groups) is a Linux kernel feature that limits, accounts for, and isolates the
resource usage (CPU, memory, disk I/O, network, etc.) of a collection of
processes.

#### Seccomp BPF

Seccomp BPF is an extension to seccomp that allows filtering of system calls
using a configurable policy implemented using Berkeley Packet Filter rules.

#### Isolation in a docker

[Isode](https://github.com/tjanczuk/isode) is a 3-year-old project that aims to
isolate nodejs apps in docker containers. A possibility would be to follow this
path and isolate the konnectors inside docker.

Docker is a real burden for administrators. And its command line options often
change from one version to another, making it difficult to deploy something
reliable for self-hosted users. So we will try to avoid it.

Isolation in docker containers is mostly a combination of Linux Namespaces,
Cgroups, and Seccomp BPF. There are other options built on those (see below).

#### Rkt

[Rkt](https://coreos.com/rkt/) is a security-minded, standard-based container
engine. It is similar to Docker, but Docker needs a running daemon whereas rkt
can be launched from the command line with no daemon.

#### NsJail / FireJail

[NsJail](https://google.github.io/nsjail/) and
[FireJail](https://firejail.wordpress.com/) are two tools that use Linux
Namespaces and Seccomp BPF to reduce the risks of running untrusted applications
on Linux. FireJail seems to be more suited for graphical apps, and NsJail for
networking services.

#### NaCl / ZeroVM

[ZeroVM](http://www.zerovm.org/) is an open source virtualization technology
that is based on the Chromium
[Native Client](https://en.wikipedia.org/wiki/Google_Native_Client) (NaCl)
project. ZeroVM creates a secure and isolated execution environment which can
run a single thread or application. But NaCl is
[no longer maintained](https://bugs.chromium.org/p/chromium/issues/detail?id=239656#c160)
and ZeroVM has some severe limitations, so it won't be used.

## Konnector isolation study

The short list of tools that will be tested to isolate connectors is Rkt and
NsJail, which on paper best fulfill our needs.

### NsJail

NsJail is a lightweight process isolation tool, making use of Linux namespaces
and seccomp-bpf syscall filters. It is not a container tool like docker. Its
features are quite extensive regarding isolation. The
[README](https://github.com/google/nsjail) gives the full list of available
options. Although it is hosted on Google's GitHub, it is not an official Google
tool.

NsJail:

- is easy to install: just a `make` away with standard build tools
- offers a full list of isolation tools
- is lightly documented: the only documentation is `nsjail -h` (also available
  on the main GitHub page) and it is quite cryptic for a non-sysadmin like me.
  I could not find any help with any search engine. Some examples are available
  to run a bash in an isolated process and they work, but I could not run a
  full nodejs (only `nodejs -v` worked)
- needs a full nodejs installed on the host for the konnectors
- is still actively maintained

### Rkt

Rkt is very similar to docker. It can even directly run docker images from the
docker registry, which gives us a lot of existing images to use, even if we want
to be able to use other languages than node. For example, we could also have a
container dedicated to weboob, and another container could use phantomjs or
casper, without forcing self-hosted users to go through complicated
installation procedures.

Rkt:

- is easy to install: debian and rpm packages are available, as well as an
  archlinux community package: https://github.com/coreos/rkt/releases
- has network isolation like docker
- offers CPU and memory limitation, and seccomp isolation (but the set of rules
  to use is beyond my understanding)
- is well [documented](https://coreos.com/rkt/docs/latest/), with complete man
  pages, but it is not as well known as docker, so there is not much to find
  outside the official documentation
- can use docker images directly or can convert them to one runnable aci file
  with one simple cli command (`rkt export`)
- is in active development but relatively stable regarding core features
- has container images that can easily be signed, and the signature is checked
  by default when running a container

I managed to run a nodejs container with just the following commands:

    rkt run --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt list # to get the container uuid
    rkt export --app=nodeslim <uuid> nodeslim.aci
    rkt run --insecure-options=image --interactive nodeslim.aci -- -v # to run node -v in the new container

Note: the `--insecure-options` param disables the check of the image signature,
to ease the demonstration.

### Choice

The best choice would be Rkt for its ease of use (which is good for
contribution), its wide range of isolation features, and its access to the big
docker ecosystem without being a burden for the host administrator. Note: the
limitations of NsJail I saw might be due to my lack of knowledge regarding
system administration.

### Proposed use of rkt regarding connectors

#### Installation

As stated before, rkt is easy to install. It may also be possible to make it
available in the cozy-stack docker image but I did not test it (TODO).

#### Image creation

To create an ACI image file, you just need to run a docker image once:

    rkt run --uuid-file-save=$PWD/uuid --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt export --app=nodeslim `cat uuid` nodeslim.aci && rm uuid

The node:slim image weighs 84M at this time. The node:alpine image also exists
and is way lighter (19M), but I had DNS problems with it, and alpine can cause
some nasty bugs that are difficult to track down.

#### Running a connector

A path is dedicated to running the konnectors, with a predefined list of node
packages available (the net module could be mocked with special limitations to
block some urls).

A script will run the node container, giving as an option the script to launch.
The path is mounted inside the container. The following script does just that:

    #!/usr/bin/env bash
    rm -rf ./container_dir
    cp -r ./container_dir_template ./container_dir
    # each environment variable is passed with its own --set-env flag
    rkt run --net=host --set-env=CREDENTIAL=value --set-env=COZY_URL=url --uuid-file-save=$PWD/uuid --volume data,kind=host,source=$PWD/container_dir --insecure-options=image nodeslim.aci --cpu=100m --memory=128M --name rktnode --mount volume=data,target=/usr/src/app --exec node -- /usr/src/app/$1 $2 &
    # the container handles the communication with the stack by itself
    sleep 60
    rkt stop --force --uuid-file=uuid
    rkt rm --uuid-file=uuid
    rm -rf ./container_dir

This script can be run like this:

    ./rkt.sh mynewkonnector.js

provided that the mynewkonnector.js file is available in the
container_dir_template directory.

Cons: access to ports 5984 and 6060 and to the SMTP server must be forbidden.

The limitation of time, CPU and memory will avoid most DoS attacks (to my
knowledge). For memory use, I still don't see a way to prevent excessive use
of swap by the container. To prevent the connectors from listening to each
other, they should be run in containers with different uids.

#### Solution to limit access of the container to 5984 and 6060 ports + SMTP

The container must be started in bridged mode. With that, the container still
has access to localhost, but through a specific IP address visible with
ifconfig. That way, the host can have iptables rules to forbid access to the
specified ports from the bridge.

To connect a container in bridge mode:

On the host, create the file /etc/rkt/net.d/10-containers.conf

    {
        "name": "bridge",
        "type": "bridge",
        "bridge": "rkt-bridge-nat",
        "ipMasq": true,
        "isGateway": true,
        "ipam": {
            "type": "host-local",
            "subnet": "10.2.0.0/24",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ]
        }
    }

and run your container with the `--net=bridge` option. That way, a new interface
is available in the container and gives you access to the host.

## Konnector install and run details

### Install

The konnectors will be installed with git clone in the .cozy_konnectors
directory, which is in the VFS (like the apps at the moment).

The installation of a konnector may be triggered when the user says they want to
use it. The resulting repository is then kept for each run of the konnector. The
user may then be given the possibility to upgrade the konnector to the latest
version, if any.

To update a given konnector, a `git pull` command is run on the konnector.

### Details about running a konnector

To run a given konnector, the stack will copy this connector into a "run"
directory, which is not in the VFS. This directory will be given to the rocket
container as the current working directory, with full read and write access on
it. This is where the container will put its logs and any temp files needed.
There will also be cozy-client.js and the shared libraries in a lib directory
inside this directory. The lib directory will be the content of the
[current server lib directory](https://github.com/cozy-labs/konnectors/tree/master/server/lib).

The konnector will be run with the following environment variables:

- `COZY_CREDENTIALS`: the response to the OAuth request, as a json string
- `COZY_URL`: to know which instance is running the konnector
- `COZY_FIELDS`: a json string with all the values from the account
  associated with the konnector
- `COZY_PARAMETERS`: an optional json string associated with the application,
  used to parameterize a konnector based on a common set of code

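On the konnector side, reading these variables could look like the sketch
below. The helper names `parseJsonVar` and `readKonnectorEnv` are invented for
the example, and treating `COZY_URL` as mandatory while the other variables
fall back to a default is an assumption, not something decided in this
document.

```javascript
// Parse a JSON environment variable, returning `fallback` when it is unset.
function parseJsonVar(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') {
    return fallback;
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(`${name} is not valid JSON: ${err.message}`);
  }
}

// Gather what the stack passes to the konnector through its environment.
function readKonnectorEnv(env = process.env) {
  if (!env.COZY_URL) {
    throw new Error('COZY_URL is required');
  }
  return {
    url: env.COZY_URL,
    credentials: parseJsonVar(env, 'COZY_CREDENTIALS', null),
    fields: parseJsonVar(env, 'COZY_FIELDS', {}),
    parameters: parseJsonVar(env, 'COZY_PARAMETERS', {}),
  };
}
```

A konnector would call `readKonnectorEnv()` once at startup and fail early if
the environment is incomplete or malformed.
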
At the end of the konnector execution (or on timeout), the logs are read from
the log.txt file and added to the konnector's own log file (in the VFS), and the
run directory is then destroyed.

## Multi-account handling

This section is about allowing the user to use one account for multiple
konnectors. The design keeps the following constraints in mind:

- The migration path must be as easy as possible
- The development and maintenance of konnectors must also be as easy as
  possible

### New doctype : io.cozy.accounts

A new doctype will have to be created to keep konnector accounts
independently from each konnector. The one once used by the email application
seems to be a good candidate: io.cozy.accounts

Here is an example document with this doctype:

```
{
  _id: "ojpiojpoij",
  name: "user decided name for the account",
  accountType: "google",
  login: "mylogin",
  password: "123456"
}
```

Any attribute needed for the account may be added: email, etc.

### Updates needed in existing application and konnectors

CRUD manipulation of io.cozy.accounts and linking them with konnectors will be
handled by the "my accounts" client application.

Each konnector also needs to declare a new field in the "fields" attribute,
which will be the type of account, related to the accountType field in the new
account docType.

Ex:

```
module.exports = baseKonnector.createNew({
  name: 'Trainline',
  vendorLink: 'www.captaintrain.com',
  category: 'transport',
  color: {
    hex: '#48D5B5',
    css: '#48D5B5'
  },
  fields: {
    login: {
      type: 'text'
    },
    password: {
      type: 'password'
    },
    folderPath: {
      type: 'folder',
      advanced: true
    },
    accountType: "trainline"
  },
  dataType: ['bill'],
  models: [Bill],
  fetchOperations: [
    ...
  ]
})
```

With this new field, which will also appear in the io.cozy.konnectors docType,
the "my accounts" client application will be able to propose existing accounts
of the right type when activating a new konnector.

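The selection made by the client application could be as simple as the
following sketch, which keeps the accounts whose `accountType` matches the one
declared in the konnector's `fields`. The function name `accountsForKonnector`
is an assumption made for the example.

```javascript
// Return the existing accounts that can be proposed for `konnector`,
// i.e. those with the account type declared in the konnector's fields.
function accountsForKonnector(konnector, accounts) {
  const wanted = konnector.fields.accountType;
  return accounts.filter((account) => account.accountType === wanted);
}
```

With the Trainline konnector above and the example account of type "google",
the result would be empty, and the user would be asked to create a new account.
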
### Migration path

For the migration of existing, activated konnectors in V2, the type of account
for each konnector will have to be indicated in a V2 "my accounts" application
update. After that, it will be possible to create the accounts associated with
each activated konnector and link the konnectors to these accounts in a
migration script.

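A very rough sketch of what such a migration script could compute is given
below. The shape of the V2 konnector documents (`slug`, `accountType`,
`fieldValues`) and the function name `migrateKonnectorAccounts` are purely
hypothetical; only the target account attributes come from the example
io.cozy.accounts document.

```javascript
// `konnectors` is the list of activated V2 konnectors, with a hypothetical
// shape: { slug, accountType, fieldValues: { login, password, ... } }.
// Returns the accounts to create and the konnector-to-account links.
function migrateKonnectorAccounts(konnectors) {
  const accounts = [];
  const links = [];
  konnectors.forEach((konnector, index) => {
    const account = {
      ...konnector.fieldValues,
      _id: `account-${index + 1}`, // placeholder id for the sketch
      name: konnector.slug,
      accountType: konnector.accountType,
    };
    accounts.push(account);
    links.push({ konnector: konnector.slug, account: account._id });
  });
  return { accounts, links };
}
```
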
## Study on konnectors installation on VFS

The VFS is slow, and installing npm packages on it will cause performance
problems. We are trying to find a solution to handle that.

We found 3 possible solutions:

- Install the konnector on the VFS as a tar.gz file with all the dependencies
  included by the konnector developer
    - advantages: easy for the konnector developer, as fast as a cp,
      no need for a compiled version of the konnector source, no duplication
      of code in the repo
    - drawbacks: the sources are not readable in the files application, which
      makes it more complicated to study the konnector source; not really
      nice; could still take a lot of space on the VFS
- Use webpack with the `target: node` option to make a node bundle of the
  dependencies
    - advantages: the konnector itself stays in clear on the VFS
    - drawbacks: forces a compilation of the sources, and then the konnector
      developer has to keep the source and the bundle in sync in the git repo;
      forces konnector developers to use webpack
- Install the npm dependencies with yarn in an immutable cache (`--cache-folder`
  option) in a directory like `deps-${konnector-git-sha1}`, not in the VFS
    - advantages: easier for the konnector developer, no particular
      dependency handling, no mandatory compilation, just a package.json in
      the git repository, the cache can be shared by instances
    - drawbacks: node-only solution, maybe more work on the cozy-stack side

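For the third solution, one possible reading is that the stack derives a
directory name from the git sha1 of the konnector and passes it to yarn. The
sketch below only builds the path and the argument list; the function names,
the base directory, and the extra `--production` and `--ignore-scripts` flags
are assumptions made for the example.

```javascript
const path = require('path');

// Build the per-version dependency directory, e.g. <baseDir>/deps-<sha1>.
function depsDirFor(baseDir, gitSha1) {
  if (!/^[0-9a-f]{40}$/.test(gitSha1)) {
    throw new Error(`not a git sha1: ${gitSha1}`);
  }
  return path.join(baseDir, `deps-${gitSha1}`);
}

// Arguments for a `yarn` invocation that uses this directory as its cache.
function yarnInstallArgs(baseDir, gitSha1) {
  return [
    'install',
    '--production',
    '--ignore-scripts',
    '--cache-folder',
    depsDirFor(baseDir, gitSha1),
  ];
}
```

Because the directory name depends only on the sha1, two instances running the
same version of a konnector would end up sharing the same directory.
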
## TODO

- [x] How to install and update the konnectors?
- [x] Are the konnectors installed once per server or per instance (in the VFS
      like client-side apps)?
- [x] One git repository with all the konnectors (like now), or one repo per
      konnector? Same question for package.json
- [ ] What API to list the konnectors for My Accounts?
- [ ] What workflow for developing a konnector?
- [ ] How to test konnectors?
- [x] How are the locales managed? Declared in manifest.konnector
- [x] Which version of nodejs? Last LTS version bundled in a rocket container
- [ ] Do we keep coffeescript? Or move every konnector to ES2017?
    - 28 konnectors in coffee
    - 22 konnectors in JS
- [ ] What about weboob?
- [ ] What roadmap for transforming the konnectors-v2 into konnectors-v3?
- [x] What format for the konnectors manifest?
- [x] What permissions for a konnector?
- [ ] For konnectors that import files, how can we let the user select a
      folder and have an associated permission for the konnector in this
      folder (and not anywhere else on the virtual file system)?
- [ ] Can we associate the data retrieved by a konnector with a "profile"? The
      goal is to allow a client-side app to have a permission on this profile
      and be able to read all the data fetched by a given konnector (or tied
      to an account).
- [ ] How is the data exported/synchronized by a "push" konnector logged?
- [x] Analyze the konnectors node_modules
    - no compiled modules currently
    - 28 dependencies that install 65 MB for 271 modules in production
    - 71 dependencies that install 611 MB for 858 modules with dev dependencies
- [ ] How are the accounts persisted?
- [x] How is a konnector executed? In particular, how are the credentials
      given to the konnector?
- [ ] What should a konnector expose (data, functions, etc)?
    - https://github.com/cozy-labs/konnectors/issues/695
- [ ] How can we support konnectors with OAuth?