[Table of contents](../README.md#table-of-contents)

This document was written in February and March 2017.

# Konnectors

:warning: **Note:** this documentation is outdated. It is kept for historical
reasons and shows the design and trade-offs we made initially. But a lot
of things have changed since, so don't expect things to still be the same.

## What do we want?

[Konnectors](https://github.com/cozy-labs/konnectors) is an application for Cozy
v2 that fetches data from different websites and services, and saves them into a
Cozy. The 50+ connectors represent a lot of work from the community, so we want
to port it to Cozy v3. There will be 2 parts:

- My Accounts, a client-side app that will let the user configure her accounts
  and choose when to start the import of data (see
  [the architecture doc](https://github.com/cozy-labs/konnectors/blob/development/docs/client-side-architecture.md)).
- Konnectors, a worker for the [job service](../jobs.md), with the code to import
  data from the websites.

## Security

### The risks

Konnectors is not just a random application. It's a very good target for attacks
on Cozy because of these specificities:

- It runs on the server, where there is no Content Security Policy or firewall
  to protect the stack.
- It has access to the Internet, by design.
- It is written in nodejs, with a lot of dependencies where it is easy to hide
  malicious code.
- It is a collection of connectors written by a lot of people. We welcome
  these contributions, but it also means that we must take into account that we
  can't review all of them in depth.

#### Access to couchdb

The stack has the admin credentials of couchdb.
If rogue code can read the stack's
configuration file or intercept connections between the stack and couchdb, it
will have access to couchdb with the admin credentials, and can do anything on
couchdb.

#### Access to the stack

An attacker can try to take advantage of konnectors to access the stack. It can
target port 6060, used by the stack to manage the cozy instances. Or it can
use its privileged position for timing attacks on passwords.

#### Spying on other connectors

A rogue connector may try to spy on other connectors to steal the credentials for
external websites. It can do so by reading their environment variables or by
[ptracing](https://en.wikipedia.org/wiki/Ptrace) them.

#### DoS

A connector can use a lot of CPU or RAM, or generate a lot of disk I/O, to cause a
denial of service on the server. The connector can also remove files on the server to
make konnectors stop working.

#### Exploiting the CPU or the bandwidth

The resources of the server can be seen as valuable: the CPU can be used for
bitcoin mining, and the bandwidth for a DDoS of an external target.

#### Sending spam

A connector can take advantage of the configured SMTP server to send spam.

#### Becoming root

[Row hammer](https://en.wikipedia.org/wiki/Row_hammer) can be a way to gain root
access on a server.

### Possible measures

#### Permissions

We can forbid the konnectors from talking directly to couchdb, and make them go
through the stack instead. Then the [permissions](../permissions.md) can restrict
what each konnector can do with the cozy-stack.

#### ignore-scripts for npm/yarn

Npm and yarn can execute scripts defined in package.json when installing nodejs
dependencies. We can use the
[`ignore-scripts`](https://docs.npmjs.com/misc/config#ignore-scripts) option to
disable this behaviour.
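
As an illustration, here is a minimal JavaScript sketch of how the component that installs a konnector could always pass this option. The helper names `installArgs` and `installDependencies` are hypothetical. The only thing taken from npm and yarn (classic) is the `--ignore-scripts` command-line flag, which both accept.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: the command used to install a konnector's dependencies
// with package lifecycle scripts (preinstall, install, postinstall...) disabled.
function installArgs(manager) {
  if (manager !== "npm" && manager !== "yarn") {
    throw new Error(`unsupported package manager: ${manager}`);
  }
  // Both npm and yarn (classic) accept --ignore-scripts on the command line.
  return { command: manager, args: ["install", "--ignore-scripts"] };
}

// Hypothetical helper: run that command in the konnector directory.
function installDependencies(konnectorDir, manager = "yarn") {
  const { command, args } = installArgs(manager);
  const result = spawnSync(command, args, { cwd: konnectorDir, stdio: "inherit" });
  if (result.error) throw result.error; // e.g. the package manager is not installed
  if (result.status !== 0) {
    throw new Error(`${command} install exited with status ${result.status}`);
  }
}
```

With npm, you can get the same effect by setting `ignore-scripts=true` in an `.npmrc` file. This only blocks the scripts declared by the installed packages. It does not restrict what the konnector code does once it runs.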

#### Forbid addons in nodejs

Nodejs can require [addons](https://nodejs.org/api/addons.html), i.e. compiled
C/C++ libraries. I've found no flag to disable the installation of such modules
for npm/yarn, and no flag for nodejs to prevent loading them. We can try to
detect and remove such modules just after the installation of node modules. They
should have a `.node` extension.

**Note**: not having a compiler on the server is not enough. Npm can install
precompiled modules.

#### vm/sandbox for Nodejs

[vm2](https://github.com/patriksimek/vm2) is a sandbox that can run untrusted
code with access restricted to an allowed list of Node's built-in modules.

#### Mock net

We can mock the net module of nodejs to add some restrictions on what it can do.
For example, we can check that it does only http/https, and block connections
to localhost:6060. It is only effective if the konnector has no way to start a
new node process.

#### Timeout

If a konnector takes too long, it should be killed after a timeout. This implies
that the cozy-stack supervises the konnectors.

#### Chroot

[Chroot](https://en.wikipedia.org/wiki/Chroot) is a UNIX syscall that makes an
application see only a part of the file-system. In particular, we can remove
access to `/proc` and `/sys` by not mounting them, and limit access to `/dev` to
just `/dev/null`, `/dev/zero`, and `/dev/random` by symlinking them.

#### Executing as another user

We can create UNIX users that will only serve to execute the konnectors, and
nothing else. It's a nice way to add isolation, but it means that we have
to find a way to execute the konnectors: either run the cozy-stack as root, or
have a daemon that launches the konnectors.
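
The *Timeout* and *Executing as another user* measures both end up in whatever process launches the konnectors. Here is a minimal JavaScript sketch, assuming a Node.js supervisor. The cozy-stack itself is written in Go, so this only illustrates the idea. Node's `child_process.spawnSync` accepts `timeout`, `killSignal`, `uid`, `gid` and `env` options, which cover three needs: killing after a timeout, dropping privileges, and starting from a clean environment. `runKonnector` is a hypothetical name.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical supervisor helper: run a konnector, kill it once `timeoutMs`
// is exceeded and, when uid/gid are given, run it as that dedicated UNIX user
// (this needs a privileged supervisor, e.g. one running as root).
function runKonnector(command, args, { timeoutMs, uid, gid, env = {} } = {}) {
  const options = {
    timeout: timeoutMs,    // undefined means no timeout
    killSignal: "SIGKILL", // a konnector cannot catch or ignore SIGKILL
    env,                   // only the variables given here, not the supervisor's own
    encoding: "utf8",
  };
  if (uid !== undefined) options.uid = uid;
  if (gid !== undefined) options.gid = gid;

  const result = spawnSync(command, args, options);
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  if (result.error && !timedOut) throw result.error; // e.g. ENOENT, EPERM
  return { timedOut, status: result.status, signal: result.signal, stdout: result.stdout };
}
```

A real supervisor would rather use the asynchronous `spawn`, so that it is not blocked while a konnector runs. The `uid`/`gid` options only work if the supervisor is allowed to switch users, which is exactly the trade-off described above.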

#### Ulimit & Prlimit

[ulimit](http://ss64.com/bash/ulimit.html) provides control over the resources
available to the shell and to processes started by it, on systems that allow
such control. It can be used to limit the number of processes (protection
against fork bombs), or the memory that can be used.

[prlimit](http://man7.org/linux/man-pages/man1/prlimit.1.html) can do the same
for just one command (technically, for a new session, not the current one).

#### Linux namespaces

One feature Linux provides here is namespaces. There are a bunch of different
kinds:

- in a pid namespace you become PID 1 and then your children are other
  processes. All the other programs are gone.
- in a networking namespace you can run programs on any port you want without
  it conflicting with what’s already running.
- in a mount namespace you can mount and unmount filesystems without it
  affecting the host filesystem. So you can have a totally different set of
  devices mounted (usually fewer).

It turns out that making namespaces is totally easy! You can just run a program
called [unshare](http://man7.org/linux/man-pages/man1/unshare.1.html).

Source:
[What even is a container, by Julia Evans](https://jvns.ca/blog/2016/10/10/what-even-is-a-container/)

#### Cgroups

[cgroups](https://en.wikipedia.org/wiki/Cgroups) (abbreviated from control
groups) is a Linux kernel feature that limits, accounts for, and isolates the
resource usage (CPU, memory, disk I/O, network, etc.) of a collection of
processes.

#### Seccomp BPF

Seccomp BPF is an extension to seccomp that allows filtering of system calls
using a configurable policy implemented using Berkeley Packet Filter rules.

#### Isolation in docker

[Isode](https://github.com/tjanczuk/isode) is a 3-year-old project that aims to
isolate nodejs apps in docker containers.
A possibility would be to follow this
path and isolate the konnectors inside docker.

Docker is a real burden for administrators. And its command-line options often
change from one version to another, making it difficult to deploy something
reliable for self-hosted users. So we will try to avoid it.

Isolation in docker containers is mostly a combination of Linux Namespaces,
Cgroups, and Seccomp BPF. Other tools build on the same primitives (see below).

#### Rkt

[Rkt](https://coreos.com/rkt/) is a security-minded, standards-based container
engine. It is similar to Docker, but Docker needs a running daemon whereas rkt
can be launched from the command line with no daemon.

#### NsJail / FireJail

[NsJail](https://google.github.io/nsjail/) and
[FireJail](https://firejail.wordpress.com/) are two tools that use Linux
Namespaces and Seccomp BPF to reduce the risks of running untrusted applications
on Linux. FireJail seems to be better suited for graphical apps, and NsJail for
networking services.

#### NaCl / ZeroVM

[ZeroVM](http://www.zerovm.org/) is an open source virtualization technology
that is based on the Chromium
[Native Client](https://en.wikipedia.org/wiki/Google_Native_Client) (NaCl)
project. ZeroVM creates a secure and isolated execution environment which can
run a single thread or application. But NaCl is
[no longer maintained](https://bugs.chromium.org/p/chromium/issues/detail?id=239656#c160)
and ZeroVM has some severe limitations, so it won't be used.

## Konnector isolation study

The short list of tools that will be tested to isolate connectors is Rkt and
NsJail, which on paper better fulfill our needs.

### NsJail

NsJail is a lightweight process isolation tool, making use of Linux namespaces
and seccomp-bpf syscall filters. It is not a container tool like docker. Its
features are quite extensive regarding isolation.
The
[README](https://github.com/google/nsjail) gives the full list of available
options. Although it is hosted in Google's GitHub organization, it is not an
official Google tool.

NsJail:

- is easy to install: just a `make` away with standard build tools
- offers a full set of isolation tools
- is lightly documented: the only documentation is `nsjail -h` (also available
  on the main GitHub page), and it is quite cryptic for a non-sysadmin like me. I
  could not find any help through search engines. Some examples are available to
  run bash in an isolated process and they work, but I could not run a full
  nodejs (only `nodejs -v` worked)
- requires a full nodejs installed on the host to run the konnectors
- is still actively maintained

### Rkt

Rkt is very similar to docker. It can even directly run docker images from the
docker registry, which gives us a lot of existing images to use, even if we want
to be able to use languages other than node. For example, we could also have a
container dedicated to weboob, and another container could use phantomjs or
casper, without forcing self-hosted users through complicated installation
procedures.

Rkt:

- is easy to install: debian and rpm packages are available, as well as an
  archlinux community package: https://github.com/coreos/rkt/releases
- has network isolation like docker
- offers CPU and memory limitation, and seccomp isolation (but the set of rules
  to use is beyond my understanding)
- is well [documented](https://coreos.com/rkt/docs/latest/), with complete man
  pages, but it is not as well known as docker, so there is not much to find
  outside the official documentation
- can use docker images directly or convert them to a runnable aci file
  with one simple cli command (`rkt export`)
- is in active development but relatively stable regarding core features
- lets container images be easily signed, and the signature is checked by
  default when running a container

I managed to run a nodejs container with just the following commands:

    rkt run --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt list # to get the container uuid
    rkt export --app=nodeslim <uuid> nodeslim.aci
    rkt run --insecure-options=image --interactive nodeslim.aci -- -v # to run node -v in the new container

Note: the `--insecure-options` parameter skips the check of the image signature,
to ease the demonstration.

### Choice

The best choice would be Rkt, for its ease of use (which is good for
contribution), its wide range of isolation features, and its access to the big
docker ecosystem without being a burden for the host administrator. Note: the
limitations of NsJail I saw might be due to my lack of knowledge of system
administration.

### Proposed use of rkt regarding connectors

#### Installation

As stated before, rkt is easy to install. It may also be possible to make it
available in the cozy-stack docker image, but I did not test it (TODO).

#### Image creation

To create an ACI image file, you just need to run a docker image once:

    rkt run --uuid-file-save=$PWD/uuid --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt export --app=nodeslim `cat uuid` nodeslim.aci && rm uuid

The node:slim image weighs 84M at this time. The node:alpine image also exists
and is way lighter (19M), but I had DNS problems with it, and alpine can
cause some nasty bugs that are difficult to track down.

#### Running a connector

A dedicated path will be used to run the konnectors, with a predefined list of
node packages available (the net module could be mocked with special
limitations to block some URLs).

A script will run the node container, passing it the script to launch as an
option. The path is mounted inside the container. The following script does
just that:

    #!/usr/bin/env bash
    rm -rf ./container_dir
    cp -r ./container_dir_template ./container_dir
    rkt run --net=host "--environment=CREDENTIAL=value;COZY_URL=url" --uuid-file-save=$PWD/uuid --volume data,kind=host,source=$PWD/container_dir --insecure-options=image nodeslim.aci --cpu=100m --memory=128M --name rktnode --mount volume=data,target=/usr/src/app --exec node -- /usr/src/app/$1 $2 &
    # the container will handle the communication with the stack by itself
    sleep 60
    rkt stop --force --uuid-file=uuid
    rkt rm --uuid-file=uuid
    rm -rf ./container_dir

This script can be run like this:

    ./rkt.sh mynewkonnector.js

if the mynewkonnector.js file is available in the container_dir_template
directory.

Cons: access to ports 5984 and 6060, and to the SMTP server, must be forbidden.

The limitation of time, CPU and memory will prevent most DoS attacks (to my
knowledge). For memory, I still don't see a way to prevent the container from
using too much swap. To prevent the connectors from spying on each other, they
should be run in containers with different uids.

#### Solution to limit access of the container to ports 5984 and 6060 + SMTP

The container must be started in bridged mode. With that, the container still
has access to localhost, but through a specific IP address visible with ifconfig.
That way, the host can have iptables rules that forbid access to specified ports
from the bridge.
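
Here is a sketch of such rules, generated by a small JavaScript helper so that an administrator can review them before applying them. `bridgeFirewallRules` is a hypothetical name, and its defaults are assumptions to adapt:

- the bridge interface is `rkt-bridge-nat`, the name used in the rkt network configuration for bridge mode;
- CouchDB listens on 5984;
- the stack admin port is 6060;
- the SMTP server uses the usual ports 25, 465 and 587.

Rules go into `INPUT` for services on the host itself, and into `FORWARD` for services reached through the bridge's NAT, such as a remote SMTP relay.

```javascript
// Hypothetical helper: iptables commands rejecting TCP connections coming from
// the rkt bridge towards the given ports. Defaults are assumptions: bridge
// "rkt-bridge-nat", CouchDB on 5984, stack admin port 6060, SMTP on 25/465/587.
function bridgeFirewallRules(bridge = "rkt-bridge-nat", ports = [5984, 6060, 25, 465, 587]) {
  const rules = [];
  for (const port of ports) {
    // INPUT: services running on the host itself (reached through the bridge IP).
    rules.push(`iptables -I INPUT -i ${bridge} -p tcp --dport ${port} -j REJECT`);
    // FORWARD: services on other machines reached through NAT (e.g. an SMTP relay).
    rules.push(`iptables -I FORWARD -i ${bridge} -p tcp --dport ${port} -j REJECT`);
  }
  return rules;
}

// Print the rules for review instead of applying them automatically.
for (const rule of bridgeFirewallRules()) {
  console.log(rule);
}
```

Each string is a plain `iptables` command. `-I` inserts the rule at the top of the chain, so it takes precedence over existing `ACCEPT` rules.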

To connect a container in bridge mode:

On the host, create the file `/etc/rkt/net.d/10-containers.conf`:

    {
      "name": "bridge",
      "type": "bridge",
      "bridge": "rkt-bridge-nat",
      "ipMasq": true,
      "isGateway": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.2.0.0/24",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    }

and run your container with the `--net=bridge` option. That way, a new interface
is available in the container and gives you access to the host.

## Konnector install and run details

### Install

The konnectors will be installed with git clone (like the apps at the moment) in
the .cozy_konnectors directory, which is in the VFS.

The konnector installation may be triggered when the user chooses to use the
konnector. The resulting repository is then kept for each run of the konnector.
The user may then be offered the possibility to upgrade the konnector to the
latest version, if any.

To update a given konnector, a `git pull` command is run on the konnector.

### Details about running a konnector

To run a given konnector, the stack will copy this connector into a "run"
directory, which is not in the VFS. This directory will be given to the rocket
container as the current working directory, with full read and write access on
it. This is where the container will put its logs and any temporary files it
needs. There will also be cozy-client.js and the shared libraries in a lib
directory inside this directory. The lib directory will have the content of the
[actual server lib directory](https://github.com/cozy-labs/konnectors/tree/master/server/lib).
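
Here is a minimal JavaScript sketch of this preparation step, and of the cleanup done once the run is over. It assumes Node.js 16.7 or later for `fs.cpSync`. `prepareRunDir` and `collectLogsAndCleanup` are hypothetical names. Two further choices are made only for the example and are not part of the design: skipping the `.git` directory of the cloned konnector, and using a temporary directory under `os.tmpdir()`.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: create a fresh "run" directory outside the VFS, copy the
// konnector into it (without its .git directory) and the shared libraries into lib/.
function prepareRunDir(konnectorDir, sharedLibDir) {
  const runDir = fs.mkdtempSync(path.join(os.tmpdir(), "konnector-run-"));
  fs.cpSync(konnectorDir, runDir, {
    recursive: true,
    filter: (src) => path.basename(src) !== ".git",
  });
  fs.cpSync(sharedLibDir, path.join(runDir, "lib"), { recursive: true });
  return runDir;
}

// Hypothetical helper: once the konnector has finished (or was killed), read
// its log.txt, then destroy the run directory.
function collectLogsAndCleanup(runDir) {
  const logFile = path.join(runDir, "log.txt");
  const logs = fs.existsSync(logFile) ? fs.readFileSync(logFile, "utf8") : "";
  fs.rmSync(runDir, { recursive: true, force: true });
  return logs;
}
```

The returned path would then be mounted in the container as its working directory. The logs returned by the cleanup step are what gets appended to the konnector's own log file in the VFS.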

The konnector will be run with the following environment variables:

- `COZY_CREDENTIALS`: the response to the OAuth request, as a json string
- `COZY_URL`: to know which instance is running the konnector
- `COZY_FIELDS`: a json string with all the values from the account
  associated with the konnector
- `COZY_PARAMETERS`: an optional json string associated with the application,
  used to parameterize a konnector based on a common set of code

At the end of the konnector execution (or at the timeout), the logs are read
from the log.txt file and added to the konnector's own log file (in the VFS), and
the run directory is then destroyed.

## Multi-account handling

This section is about allowing the user to use one account for multiple
konnectors. It is designed with the following constraints in mind:

- The migration path must be as easy as possible
- The development and maintenance of konnectors must also be as easy as
  possible

### New doctype: io.cozy.accounts

A new doctype will have to be created to keep konnector accounts
independent from each konnector. The one once used by the email application
seems to be a good candidate: io.cozy.accounts

Here is an example document with this doctype:

```
{
  _id: "ojpiojpoij",
  name: "user decided name for the account",
  accountType: "google",
  login: "mylogin",
  password: "123456"
}
```

Any attribute needed for the account may be added: email, etc.

### Updates needed in existing applications and konnectors

CRUD manipulation of io.cozy.accounts and linking them with konnectors will be
handled by the "my accounts" client application.

Each konnector also needs to declare a new field in the "fields" attribute,
which will be the type of account, related to the accountType field in the new
account docType.
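
To show how these pieces could fit together, here is a minimal JavaScript sketch of two hypothetical helpers:

- one, used by the "my accounts" app, proposes the existing io.cozy.accounts documents whose `accountType` matches the one a konnector declares in its `fields`;
- the other turns the chosen account into the environment variables listed earlier.

Which account attributes go into `COZY_FIELDS` is an assumption here: everything except the CouchDB `_id` and `_rev`.

```javascript
// Hypothetical helper for the "my accounts" app: keep the io.cozy.accounts
// documents whose accountType matches the one declared by the konnector.
function accountsForKonnector(accounts, konnector) {
  const wanted = konnector.fields && konnector.fields.accountType;
  if (!wanted) return []; // konnector without an account type: nothing to propose
  return accounts.filter((account) => account.accountType === wanted);
}

// Hypothetical helper: build the environment of a konnector run from the
// selected account. The COZY_FIELDS content is an assumption (all attributes
// except the CouchDB metadata).
function konnectorEnv({ cozyUrl, account, credentials, parameters }) {
  const { _id, _rev, ...fields } = account;
  const env = { COZY_URL: cozyUrl, COZY_FIELDS: JSON.stringify(fields) };
  if (credentials !== undefined) env.COZY_CREDENTIALS = JSON.stringify(credentials);
  if (parameters !== undefined) env.COZY_PARAMETERS = JSON.stringify(parameters);
  return env;
}

// Usage with documents shaped like the examples in this section.
const accounts = [
  { _id: "a1", name: "perso", accountType: "google", login: "mylogin", password: "123456" },
  { _id: "a2", name: "trains", accountType: "trainline", login: "mylogin", password: "123456" },
];
const trainline = { name: "Trainline", fields: { login: { type: "text" }, accountType: "trainline" } };
const [account] = accountsForKonnector(accounts, trainline);
const env = konnectorEnv({ cozyUrl: "https://alice.cozy.example", account });
```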

For example, the Trainline konnector would declare:

```
module.exports = baseKonnector.createNew({
  name: 'Trainline',
  vendorLink: 'www.captaintrain.com',
  category: 'transport',
  color: {
    hex: '#48D5B5',
    css: '#48D5B5'
  },
  fields: {
    login: {
      type: 'text'
    },
    password: {
      type: 'password'
    },
    folderPath: {
      type: 'folder',
      advanced: true
    },
    accountType: 'trainline'
  },
  dataType: ['bill'],
  models: [Bill],
  fetchOperations: [
    ...
  ]
})
```

With this new field, which will also appear in the io.cozy.konnectors docType,
the "my accounts" client application will be able to propose existing accounts
of the right type when activating a new konnector.

### Migration path

For the migration of existing, activated konnectors in V2, the type of account
for each konnector will have to be indicated in an update of the V2 "my
accounts" application. After that, a migration script will be able to create the
accounts associated with each activated konnector and link the konnectors to
these accounts.

## Study on konnectors installation on VFS

The VFS is slow, and installing npm packages on it will cause some performance
problems. We are trying to find a solution to handle that.

We found 3 possible solutions:

- Install the konnector on the VFS as tar.gz files with all the dependencies
  included by the konnector developer
  - advantages: easy for the konnector developer, as fast as a `cp`,
    no need for a compiled version of the konnector source, no duplication
    of code in the repo
  - drawbacks: the sources are not readable in the files application, which
    makes it more complicated to study the konnector source (not really nice),
    and it could still take a lot of space on the VFS
- Use webpack with the `target: node` option to make a node bundle of the
  dependencies
  - advantages: the konnector itself stays in clear on the VFS
  - drawbacks: forces a compilation of the sources, and then a sync between
    the source and the bundle in the git repo by the konnector developer;
    forces konnector developers to use webpack
- Install the npm dependencies with yarn in an immutable cache (`--cache-folder`
  option) in a directory like `deps-${konnector-git-sha1}`, not in the VFS
  - advantages: easier for the konnector developer, no particular
    dependency handling, no mandatory compilation, just a package.json in
    the git repository, the cache can be shared by instances
  - drawbacks: node-only solution, maybe more work on the cozy-stack side

## TODO

- [x] How to install and update the konnectors?
- [x] Are the konnectors installed once per server or per instance (in the VFS
  like client-side apps)?
- [x] One git repository with all the konnectors (like now), or one repo per
  konnector? Same question for package.json
- [ ] What API to list the konnectors for My Accounts?
- [ ] What workflow for developing a konnector?
- [ ] How to test konnectors?
- [x] How are the locales managed? Declared in manifest.konnector
- [x] Which version of nodejs? The latest LTS version, bundled in a rocket
  container
- [ ] Do we keep coffeescript? Or move every konnector to ES2017?
  - 28 konnectors in coffee
  - 22 konnectors in JS
- [ ] What about weboob?
- [ ] What roadmap for transforming the konnectors-v2 into konnectors-v3?
- [x] What format for the konnectors manifest?
- [x] What permissions for a konnector?
- [ ] For konnectors that import files, how can we let the user select a
  folder and have an associated permission for the konnector in this
  folder (and not anywhere else on the virtual file system)?
- [ ] Can we associate the data retrieved by a konnector with a "profile"? The
  goal is to allow a client-side app to have a permission on this profile and
  be able to read all the data fetched by a given konnector (or tied to
  an account)?
- [ ] How are the data exported/synchronized by a "push" konnector logged?
- [x] Analyze the konnectors node_modules
  - no compiled modules currently
  - 28 dependencies that install 65 MB for 271 modules in production
  - 71 dependencies that install 611 MB for 858 modules with dev dependencies
- [ ] How are the accounts persisted?
- [x] How is a konnector executed? In particular, how are the credentials
  given to the konnector?
- [ ] What should a konnector expose (data, functions, etc.)?
  - https://github.com/cozy-labs/konnectors/issues/695
- [ ] How can we support konnectors with OAuth?