[Table of contents](../README.md#table-of-contents)

This document was written in February and March 2017.

# Konnectors

:warning: **Note:** this documentation is outdated. It is kept for historical
reasons and shows the design and trade-offs we initially made. A lot of things
have changed since then, so don't expect them to still be the same.

## What do we want?

[Konnectors](https://github.com/cozy-labs/konnectors) is an application for Cozy
v2 that fetches data from different web sites and services, and saves it into a
Cozy. The 50+ connectors represent a lot of work from the community, so we want
to port it to Cozy v3. There will be 2 parts:

-   My Accounts, a client-side app that will let the user configure her
    accounts and choose when to start the import of data (see
    [the architecture doc](https://github.com/cozy-labs/konnectors/blob/development/docs/client-side-architecture.md)).
-   Konnectors, a worker for the [job service](../jobs.md), with the code to import
    data from the web sites.

## Security

### The risks

Konnectors is not just a random application. It's a very good target for attacks
on Cozy because of these specificities:

-   It runs on the server, where there is no Content Security Policy or firewall
    to protect the stack.
-   It has access to the Internet, by design.
-   It is written in nodejs, with a lot of dependencies where it is easy to hide
    malicious code.
-   It is a collection of connectors written by a lot of people. We welcome
    these contributions, but it also means that we have to accept that we
    can't review all of them in depth.

#### Access to couchdb

The stack has the admin credentials of couchdb. If rogue code can read its
configuration file or intercept connections between the stack and couchdb, it
will have access to couchdb with the admin credentials, and can do anything on
couchdb.

#### Access to the stack

An attacker can try to use konnectors to access the stack. It can target port
6060, used by the stack to manage the cozy instances. Or it can use its
privileged position for timing attacks on passwords.

#### Spying on other connectors

A rogue connector may try to spy on other connectors to steal their credentials
for external web sites. It can be done by reading their environment variables or
[ptracing](https://en.wikipedia.org/wiki/Ptrace) them.

#### DoS

A connector can use a lot of CPU or RAM, or generate a lot of disk I/O, to cause
a denial of service on the server. The connector can also remove files on the
server to make konnectors stop working.

#### Exploiting the CPU or the bandwidth

The resources of the server can be seen as valuable: the CPU can be used for
bitcoin mining, and the bandwidth can be used for a DDoS against an external
target.

#### Sending spam

A connector can abuse the configured SMTP server to send spam.

#### Becoming root

[Row hammer](https://en.wikipedia.org/wiki/Row_hammer) can be a way to gain root
access on a server.

### Possible measures

#### Permissions

We can forbid the konnectors from talking directly to couchdb, and make them go
through the stack instead. We can then use the [permissions](../permissions.md)
to restrict what each konnector can do with the cozy-stack.

#### ignore-scripts for npm/yarn

Npm and yarn can execute scripts defined in package.json when installing nodejs
dependencies. We can use the
[`ignore-scripts`](https://docs.npmjs.com/misc/config#ignore-scripts) option to
disable this behaviour.

#### Forbid addons in nodejs

Nodejs can require [addons](https://nodejs.org/api/addons.html), i.e. C/C++
compiled libraries. I've found no flag to disable the installation of such
modules for npm/yarn, and no flag for nodejs to prevent loading them. We can try
to detect and remove such modules just after the installation of node modules.
They should have a `.node` extension.

**Note**: not having a compiler on the server is not enough. Npm can install
precompiled modules.

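The detection step could look like the following minimal sketch. It assumes the dependencies have been installed in a directory such as `node_modules`; the function name `findNativeAddons` is hypothetical and not part of any existing tool.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every file ending in ".node" under `dir`.
// Symbolic links are not followed, which avoids loops.
function findNativeAddons(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNativeAddons(full));
    } else if (entry.isFile() && path.extname(entry.name) === '.node') {
      found.push(full);
    }
  }
  return found;
}

// Hypothetical usage, right after the installation of node modules:
// for (const addon of findNativeAddons('./node_modules')) fs.unlinkSync(addon);
```

Relying on the file extension is only a heuristic: it catches the usual layout of compiled addons, but it would not detect a binary that has been renamed.
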
#### vm/sandbox for Nodejs

[vm2](https://github.com/patriksimek/vm2) is a sandbox that can run untrusted
code with an allow-list of Node's built-in modules.

#### Mock net

We can mock the net module of nodejs to add some restrictions on what it can do.
For example, we can check that it only does http/https, and block connections
to localhost:6060. It is only effective if the konnector has no way to start a
new node process.

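As an illustration, the check that a mocked net layer would apply before opening a connection can be sketched as a pure function. The function name, the list of local host names, and the choice to also block port 5984 (couchdb) are assumptions made for this example.

```javascript
// Hosts treated as "the server itself" in this sketch (an assumption).
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '::1']);
// 6060 is the stack admin port; 5984 (couchdb) is added as an assumption.
const BLOCKED_LOCAL_PORTS = new Set([5984, 6060]);

// Return true when a request to `rawUrl` should be let through.
function isRequestAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (err) {
    return false; // not a parsable absolute URL
  }
  // Only http and https are accepted.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // An empty port means the default port of the scheme.
  const port = url.port ? Number(url.port) : (url.protocol === 'https:' ? 443 : 80);
  // URL keeps the brackets around IPv6 literals, e.g. "[::1]".
  const host = url.hostname.replace(/^\[|\]$/g, '');
  return !(LOCAL_HOSTS.has(host) && BLOCKED_LOCAL_PORTS.has(port));
}
```

This only looks at the literal host name. A real filter would also have to deal with other loopback addresses and with DNS names that resolve to the server.
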
#### Timeout

If a konnector takes too long, it should be killed after a timeout. It implies
that the cozy-stack supervises the konnectors.

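The idea of a supervisor that kills a process after a timeout can be illustrated with Node's `child_process.spawnSync`. This is only a sketch of the principle, not how the cozy-stack does it; the function name `runWithTimeout` is hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Run a command and kill it with SIGKILL if it runs longer than `timeoutMs`.
function runWithTimeout(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    timeout: timeoutMs,
    killSignal: 'SIGKILL',
    encoding: 'utf8',
  });
  // On timeout, spawnSync reports an error whose code is ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status, stdout: result.stdout };
}
```

Since `spawnSync` blocks the caller until the child exits or is killed, a long-running supervisor would rather use an asynchronous variant of the same idea.
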
#### Chroot

[Chroot](https://en.wikipedia.org/wiki/Chroot) is a UNIX syscall that makes an
application see only a part of the file-system. In particular, we can remove
access to `/proc` and `/sys` by not mounting them, and limit access to `/dev` to
just `/dev/null`, `/dev/zero`, and `/dev/random` by symlinking them.

#### Executing as another user

We can create UNIX users that will just serve to execute the konnectors, and
nothing else. It's a nice way to give more isolation, but it means that we have
to find a way to execute the konnectors as these users: either run the
cozy-stack as root, or have a daemon that launches the konnectors.

#### Ulimit & Prlimit

[ulimit](http://ss64.com/bash/ulimit.html) provides control over the resources
available to the shell and to processes started by it, on systems that allow
such control. It can be used to limit the number of processes (protection
against fork bombs), or the memory that can be used.

[prlimit](http://man7.org/linux/man-pages/man1/prlimit.1.html) can do the same
for just one command (technically, for a new session, not the current one).

#### Linux namespaces

One feature Linux provides here is namespaces. There are a bunch of different
kinds:

-   in a pid namespace you become PID 1 and then your children are other
    processes. All the other programs are gone
-   in a networking namespace you can run programs on any port you want without
    it conflicting with what’s already running
-   in a mount namespace you can mount and unmount filesystems without it
    affecting the host filesystem. So you can have a totally different set of
    devices mounted (usually less).

It turns out that making namespaces is totally easy! You can just run a program
called [unshare](http://man7.org/linux/man-pages/man1/unshare.1.html).

Source:
[What even is a container, by Julia Evans](https://jvns.ca/blog/2016/10/10/what-even-is-a-container/)

#### Cgroups

[cgroups](https://en.wikipedia.org/wiki/Cgroups) (abbreviated from control
groups) is a Linux kernel feature that limits, accounts for, and isolates the
resource usage (CPU, memory, disk I/O, network, etc.) of a collection of
processes.

#### Seccomp BPF

Seccomp BPF is an extension to seccomp that allows filtering of system calls
using a configurable policy implemented using Berkeley Packet Filter rules.

#### Isolation in a docker

[Isode](https://github.com/tjanczuk/isode) is a 3-year-old project that aims to
isolate nodejs apps in docker containers. A possibility would be to follow this
path and isolate the konnectors inside docker.

But docker is a real burden for administrators, and its command line options
often change from one version to another, making it difficult to deploy
something reliable for self-hosted users. So we will try to avoid it.

Isolation in docker containers is mostly a combination of Linux Namespaces,
Cgroups, and Seccomp BPF. There are other ways to use those (see below).

#### Rkt

[Rkt](https://coreos.com/rkt/) is a security-minded, standard-based container
engine. It is similar to Docker, but Docker requires a running daemon whereas
rkt can be launched from the command line with no daemon.

#### NsJail / FireJail

[NsJail](https://google.github.io/nsjail/) and
[FireJail](https://firejail.wordpress.com/) are two tools that use Linux
Namespaces and Seccomp BPF to reduce the risks of running untrusted applications
on Linux. FireJail seems to be more suited for graphical apps, and NsJail for
networking services.

#### NaCl / ZeroVM

[ZeroVM](http://www.zerovm.org/) is an open source virtualization technology
that is based on the Chromium
[Native Client](https://en.wikipedia.org/wiki/Google_Native_Client) (NaCl)
project. ZeroVM creates a secure and isolated execution environment which can
run a single thread or application. But NaCl is
[no longer maintained](https://bugs.chromium.org/p/chromium/issues/detail?id=239656#c160)
and ZeroVM has some severe limitations, so it won't be used.

## Konnector isolation study

The short list of tools which will be tested to isolate connectors is Rkt and
NsJail, which on paper best fulfill our needs.

### NsJail

NsJail is a lightweight process isolation tool, making use of Linux namespaces
and seccomp-bpf syscall filters. It is not a container tool like docker. Its
features are quite extensive regarding isolation. The
[README](https://github.com/google/nsjail) gives the full list of available
options. Although it is hosted under the google github organization, it is not
an official google tool.

NsJail:

-   is easy to install: just a `make` away with standard build tools
-   offers a full list of isolation tools
-   is lightly documented: the only documentation is `nsjail -h` (also available
    on the main github page), and it is quite cryptic for a non-sysadmin like
    me. I could not find any help in any search engine. Some examples are
    available to run a bash in an isolated process, and they work, but I could
    not run a full nodejs (only `nodejs -v` worked)
-   needs a full nodejs installed on the host to run the konnectors
-   is still actively maintained

### Rkt

Rkt is very similar to docker. It can even directly run docker images from the
docker registry, which gives us a lot of existing images to use, even if we want
to be able to use other languages than node. For example, we could also have a
container dedicated to weboob, and another container could use phantomjs or
casper, without forcing self-hosted users to go through complicated
installation procedures.

Rkt:

-   is easy to install: debian and rpm packages available, archlinux community
    package: https://github.com/coreos/rkt/releases
-   has network isolation like docker
-   offers CPU and memory limitation, and seccomp isolation (but the set of
    rules to use is beyond my understanding)
-   is well [documented](https://coreos.com/rkt/docs/latest/), with complete man
    pages, but it is not as well known as docker, so there is not much to find
    outside the official documentation
-   can use docker images directly, or can convert them to a runnable aci file
    with one simple cli command (`rkt export`)
-   is in active development but relatively stable regarding core features
-   lets container images be easily signed, and checks the signature by default
    when running a container

I managed to run a nodejs container with just the following commands:

    rkt run --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt list   # to get the container uuid
    rkt export --app=nodeslim <uuid> nodeslim.aci
    rkt run --insecure-options=image --interactive nodeslim.aci -- -v  # to run node -v in the new container

Note: the `--insecure-options` param is there to skip the check of the image
signature, to ease the demonstration.

### Choice

The best choice would be Rkt for its ease of use (which is good for
contribution), its wide range of isolation features, and its access to the big
docker ecosystem without being a burden for the host administrator. Note: the
limitations of NsJail I saw might be due to my lack of knowledge regarding
system administration.

### Proposed use of rkt regarding connectors

#### Installation

As stated before, rkt is easy to install. It may also be possible to make it
available in the cozy-stack docker image, but I did not test it (TODO).

#### Image creation

To create an ACI image file, you just need to run a docker image once:

    rkt run --uuid-file-save=$PWD/uuid --insecure-options=image --interactive docker://node:slim --name nodeslim -- -v
    rkt export --app=nodeslim `cat uuid` nodeslim.aci && rm uuid

The node:slim image weighs 84M at this time. The node:alpine image also exists
and is way lighter (19M), but I had DNS problems with it, and alpine can cause
some nasty bugs that are difficult to track down.

#### Running a connector

A directory is dedicated to running the konnectors, with a predefined list of
node packages available (the net module could be mocked with special
limitations to block some urls).

A script will run the node container, giving as an option the script to launch.
The directory is mounted inside the container. The following script does just
that:

    #!/usr/bin/env bash
    # start from a fresh copy of the template directory
    rm -rf ./container_dir
    cp -r ./container_dir_template ./container_dir
    # the --environment option is quoted so that the shell does not treat ";" as a command separator
    rkt run --net=host "--environment=CREDENTIAL=value;COZY_URL=url" --uuid-file-save="$PWD/uuid" --volume data,kind=host,source="$PWD/container_dir" --insecure-options=image nodeslim.aci --cpu=100m --memory=128M --name rktnode --mount volume=data,target=/usr/src/app --exec node -- "/usr/src/app/$1" $2 &
    # the container will handle itself the communication with the stack
    # let the konnector run for 60 seconds, then stop and remove the container
    sleep 60
    rkt stop --force --uuid-file=uuid
    rkt rm --uuid-file=uuid
    rm -rf ./container_dir

This script can be run like this:

    ./rkt.sh mynewkonnector.js

provided that the mynewkonnector.js file is available in the
container_dir_template directory.

Cons: access to ports 5984 and 6060 and to the SMTP server must be forbidden.

The limitation of time, CPU and memory will avoid most DoS attacks (to my
knowledge). For memory use, I still don't see a way to prevent the excessive use
of swap from the container. To prevent the connectors from listening to each
other, they should be run in containers with different uids.

#### Solution to limit access of the container to 5984 and 6060 ports + SMTP

The container must be started in bridged mode. With that, the container still
has access to localhost, but through a specific IP address visible with
ifconfig. That way, the host can have iptables rules to forbid access to the
specified ports from the bridge.

To connect a container in bridge mode:

On the host, create the file /etc/rkt/net.d/10-containers.conf:

    {
        "name": "bridge",
        "type": "bridge",
        "bridge": "rkt-bridge-nat",
        "ipMasq": true,
        "isGateway": true,
        "ipam": {
            "type": "host-local",
            "subnet": "10.2.0.0/24",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ]
        }
    }

and run your container with the `--net=bridge` option. That way, a new interface
is available in the container and gives you access to the host.

## Konnector install and run details

### Install

The konnectors will be installed in the .cozy_konnectors directory, which is in
the VFS, using git clone (like the apps at the moment).

The installation of a konnector may be triggered when the user says they want to
use it. The resulting repository is then kept for each run of the konnector. The
user may then be given the possibility to upgrade the konnector to the latest
version, if any.

To update a given konnector, a `git pull` command is run on its repository.

### Details about running a konnector

To run a given konnector, the stack will copy this connector in a "run"
directory, which is not in the VFS. This directory will be given to the rkt
container as the current working directory, with full read and write access on
it. This is where the container will put its logs and any temp file needed.
There will also be cozy-client.js and the shared libraries in a lib directory
inside this directory. The lib directory will be the content of the
[current server lib directory](https://github.com/cozy-labs/konnectors/tree/master/server/lib).

The konnector will be run with the following environment variables:

-   `COZY_CREDENTIALS`: the response to the OAuth request, as a JSON string
-   `COZY_URL`: to know which instance is running the konnector
-   `COZY_FIELDS`: a JSON string with all the values from the account
    associated to the konnector
-   `COZY_PARAMETERS`: an optional JSON string associated with the application,
    used to parameterize a konnector based on a common set of code

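On the konnector side, reading these variables could look like the sketch below. The helper names `readJsonEnv` and `loadContext`, the shape of the returned object, and the fallback values are assumptions made for this example.

```javascript
// Parse a JSON environment variable, returning `fallback` when it is unset.
function readJsonEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  return JSON.parse(raw); // throws on malformed JSON, which should fail the run
}

// Build the konnector's runtime context from the variables listed above.
function loadContext(env = process.env) {
  return {
    cozyUrl: env.COZY_URL,
    credentials: readJsonEnv(env, 'COZY_CREDENTIALS', null),
    fields: readJsonEnv(env, 'COZY_FIELDS', {}),
    parameters: readJsonEnv(env, 'COZY_PARAMETERS', {}), // optional variable
  };
}
```

Passing the environment as a parameter, instead of always reading `process.env`, makes the function easy to exercise with fake values.
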
At the end of the konnector execution (or on timeout), the logs are read from
the log.txt file and appended to the konnector's own log file (in the VFS), and
the run directory is then destroyed.

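The end-of-run step can be sketched as follows. This example assumes that both the run directory and the konnector log are plain local paths, whereas the design stores the konnector log in the VFS; the function name `collectLogsAndCleanup` is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Append the run's log.txt to the konnector's log file, then delete the run
// directory. Both paths are plain local paths in this sketch.
function collectLogsAndCleanup(runDir, konnectorLogPath) {
  const runLog = path.join(runDir, 'log.txt');
  if (fs.existsSync(runLog)) {
    fs.appendFileSync(konnectorLogPath, fs.readFileSync(runLog));
  }
  // Remove the run directory even if the konnector left other files in it.
  fs.rmSync(runDir, { recursive: true, force: true });
}
```
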
## Multi-account handling

This section describes how to let the user use one account for multiple
konnectors, with the following constraints in mind:

-   The migration path must be as easy as possible
-   The development and maintenance of konnectors must also be as easy as
    possible

### New doctype: io.cozy.accounts

A new doctype will have to be created to keep konnector accounts independent
from each konnector. The one formerly used by the email application seems to be
a good candidate: io.cozy.accounts

Here is an example document with this doctype:

```
{
    _id: "ojpiojpoij",
    name: "user decided name for the account",
    accountType: "google",
    login: "mylogin",
    password: "123456"
}
```

Any attribute needed for the account may be added: email, etc.

### Updates needed in existing application and konnectors

CRUD manipulation of io.cozy.accounts and linking them with konnectors will be
handled by the "my accounts" client application.

Each konnector also needs to declare a new field in the "fields" attribute,
which will be the type of account, related to the accountType field in the new
account docType.

Example:

```
module.exports = baseKonnector.createNew({
  name: 'Trainline',
  vendorLink: 'www.captaintrain.com',
  category: 'transport',
  color: {
    hex: '#48D5B5',
    css: '#48D5B5'
  },
  fields: {
    login: {
      type: 'text'
    },
    password: {
      type: 'password'
    },
    folderPath: {
      type: 'folder',
      advanced: true
    },
    accountType: "trainline"
  },
  dataType: ['bill'],
  models: [Bill],
  fetchOperations: [
    ...
  ]
})
```

With this new field, which will also appear in the io.cozy.konnectors docType,
the "my accounts" client application will be able to propose existing accounts
of the right type when activating a new konnector.

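The matching between a konnector and the existing accounts could be as simple as the following sketch. It assumes that account documents carry an `accountType` attribute, as in the example document above, and that the konnector exposes it under `fields.accountType`; the function name is hypothetical.

```javascript
// Return the accounts whose `accountType` matches the one declared in the
// konnector's `fields` attribute.
function accountsForKonnector(konnector, accounts) {
  const wanted = konnector.fields && konnector.fields.accountType;
  if (!wanted) return []; // konnector not migrated yet: nothing to propose
  return accounts.filter((account) => account.accountType === wanted);
}

// Example with made-up documents:
const accounts = [
  { _id: 'a1', name: 'Work', accountType: 'google', login: 'mylogin' },
  { _id: 'a2', name: 'Trains', accountType: 'trainline', login: 'mylogin' },
];
const trainline = { name: 'Trainline', fields: { accountType: 'trainline' } };
const proposed = accountsForKonnector(trainline, accounts);
```
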
### Migration path

For the migration of existing, activated konnectors in V2, the type of account
for each konnector will have to be indicated in an update of the V2 "my
accounts" application. After that, it will be possible to create the accounts
associated with each activated konnector and link the konnectors to these
accounts in a migration script.

## Study on konnectors installation on VFS

The VFS is slow, and installing npm packages on it will cause some performance
problems. We are trying to find a solution to handle that.

We found 3 possible solutions:

-   Install the konnector on VFS as tar.gz files with all the dependencies
    included by the konnector developer
    -   advantages: easy for the konnector developer, as performant as a cp,
        no need for a compiled version of the konnector source, no duplication
        of code in the repo
    -   drawbacks: the sources are not readable in the files application, which
        makes it more complicated to study the konnector source, not really
        nice, and it could still take a lot of space on VFS
-   Use webpack with the `target: node` option to make a node bundle of the
    dependencies
    -   advantages: the konnector itself stays in clear on the VFS
    -   drawbacks: forces a compilation of the sources and then a sync between
        the source and bundle in the git repo by the konnector developer,
        forces konnector developers to use webpack
-   Install the npm dependencies with yarn in an immutable cache
    (`--cache-folder` option) in a directory like `deps-${konnector-git-sha1}`,
    not in VFS
    -   advantages: easier for the konnector developer, no particular
        dependency handling, no mandatory compilation, just a package.json in
        the git repository, the cache can be shared by instances
    -   drawbacks: node-only solution, maybe more work on the cozy-stack side

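For the third solution, the naming of the per-revision directory could be sketched like this. The base directory, the validation of the SHA-1, and both function names are assumptions made for this example; only the `deps-${konnector-git-sha1}` pattern and the `--cache-folder` option come from the list above.

```javascript
const path = require('path');

// Build the dependency directory for one konnector revision, following the
// `deps-${konnector-git-sha1}` naming suggested above.
function depsDirFor(baseDir, gitSha1) {
  if (!/^[0-9a-f]{40}$/.test(gitSha1)) {
    throw new Error(`not a full git SHA-1: ${gitSha1}`);
  }
  return path.join(baseDir, `deps-${gitSha1}`);
}

// Arguments for a yarn install that uses this directory as its cache folder.
function yarnInstallArgs(baseDir, gitSha1) {
  return ['install', '--cache-folder', depsDirFor(baseDir, gitSha1)];
}
```

Because the directory name depends only on the git revision, two instances running the same version of a konnector would end up with the same path, which is what allows the cache to be shared.
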
## TODO

-   [x] How to install and update the konnectors?
-   [x] Are the konnectors installed once per server or per instance (in the VFS
        like client-side apps)?
-   [x] One git repository with all the konnectors (like now), or one repo per
        konnector? Same question for package.json
-   [ ] What API to list the konnectors for My Accounts?
-   [ ] What workflow for developing a konnector?
-   [ ] How to test konnectors?
-   [x] How are the locales managed? Declared in manifest.konnector
-   [x] Which version of nodejs? Last LTS version bundled in a rkt container
-   [ ] Do we keep coffeescript? Or move every konnector to ES2017?
    -   28 konnectors in coffee
    -   22 konnectors in JS
-   [ ] What about weboob?
-   [ ] What roadmap for transforming the konnectors-v2 in konnectors-v3?
-   [x] What format for the konnectors manifest?
-   [x] What permissions for a konnector?
-   [ ] For konnectors that import files, how can we let the user select a
        folder and have an associated permission for the konnector in this
        folder (and not anywhere else on the virtual file system)?
-   [ ] Can we associate the data retrieved by a konnector to a "profile"? The
        goal is to allow a client-side app to have a permission on this profile
        and be able to read all the data fetched by a given konnector (or tied
        to an account)?
-   [ ] How is the data exported/synchronized by a "push" konnector logged?
-   [x] Analyze the konnectors node_modules
    -   no compiled modules currently
    -   28 dependencies that install 65 MB for 271 modules in production
    -   71 dependencies that install 611 MB for 858 modules with dev
        dependencies
-   [ ] How are the accounts persisted?
-   [x] How is a konnector executed? In particular, how are the credentials
        given to the konnector?
-   [ ] What should a konnector expose (data, functions, etc.)?
    -   https://github.com/cozy-labs/konnectors/issues/695
-   [ ] How can we support konnectors with OAuth?