# CouchDB Quirks

## Mango indexes

### Exists operator

The
[`$exists` operator](http://docs.couchdb.org/en/stable/api/database/find.html#condition-operators)
can be used with a mango index for the `true` value, but not for the `false`
value. For `false`, a heavier solution is required:
[a partial index](http://docs.couchdb.org/en/stable/api/database/find.html#find-partial-indexes).
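
As an illustration, here is a minimal Python sketch that builds the JSON bodies for such a partial index (`POST /{db}/_index`) and for a query that targets it (`POST /{db}/_find`). The field names `name` and `trashed`, the index name, and the helper functions are hypothetical; only `partial_filter_selector`, `use_index` and `$exists` come from the CouchDB API.

```python
import json


def partial_index_body(indexed_fields, missing_field, name):
    """Body for POST /{db}/_index: index only the documents
    where `missing_field` is absent."""
    return {
        "index": {
            "fields": list(indexed_fields),
            "partial_filter_selector": {missing_field: {"$exists": False}},
        },
        "ddoc": name,
        "name": name,
        "type": "json",
    }


def find_body(selector, missing_field, index_name):
    """Body for POST /{db}/_find: repeat the filter in the selector
    and name the partial index explicitly with `use_index`."""
    full_selector = dict(selector)
    full_selector[missing_field] = {"$exists": False}
    return {"selector": full_selector, "use_index": index_name}


# Hypothetical example: documents that have a name but no `trashed` field.
index_body = partial_index_body(["name"], "trashed", "by-name-not-trashed")
query_body = find_body({"name": {"$gt": None}}, "trashed", "by-name-not-trashed")
print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

The query names the index explicitly because, as far as we know, CouchDB does not pick a partial index on its own.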

### Index selection

CouchDB may accept or refuse to use a mango index for a query, for obscure
reasons. In general, you can follow these two rules of thumb:

1. An index on the fields `foo, bar, baz` can be used only to fetch documents
   where `foo`, `bar`, and `baz` exist. It means that a query that filters only
   on the value of `foo` won't use the mango index, because it could miss a
   document where `foo` has the expected value but `bar` or `baz` is missing. If
   you know that all the documents that you want have the `bar` and `baz`
   fields, you can just add two filters `$exists: true` (one for `bar`, the
   other for `baz`).

2. You should use exactly the same sequence of fields for creating the index and
   for the `sort` operator of the query. If you have an index on `os, browser, ip`
   for the `io.cozy.sessions.logins` doctype, and you want to have all the
   documents for a login from `windows`, sorted by `browser`, you can use the
   index, but you should use `os, browser, ip` for the sort (or at least
   `os, browser`, even if it seems weird to sort on `os` when all the sorted
   documents will have the same value, `windows`). Note that when a request uses
   `use_index`, the results are sorted by default according to this rule. So,
   you can omit the `sort` operator on the query (unless you want the
   `descending` order).
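
The two rules can be sketched as a small Python helper that builds a `_find` body for a given index. The helper name and its parameters are hypothetical, and it assumes that all the wanted documents really have every indexed field (otherwise the added `$exists: true` filters would hide some of them).

```python
def query_for_index(index_fields, equals, sort_on=None, descending=False):
    """Build a POST /{db}/_find body that an index on `index_fields` can serve.

    Rule 1: every indexed field appears in the selector, with
    `$exists: true` for the fields that are not otherwise filtered.
    Rule 2: the sort uses the index fields in the same order, from the
    first one up to the field we actually want to sort on.
    """
    selector = {}
    for field in index_fields:
        if field in equals:
            selector[field] = equals[field]
        else:
            selector[field] = {"$exists": True}
    body = {"selector": selector}
    if sort_on is not None:
        prefix = index_fields[: index_fields.index(sort_on) + 1]
        direction = "desc" if descending else "asc"
        body["sort"] = [{field: direction} for field in prefix]
    return body


# Logins from windows, sorted by browser, with the index on os, browser, ip.
logins_query = query_for_index(
    ["os", "browser", "ip"], {"os": "windows"}, sort_on="browser"
)
print(logins_query)
```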

### Warnings for slow requests

When running a mango query, CouchDB can use an index. But there are also
cases where no index can be used, or where the index is not optimal. Let's
see the different scenarios:

- If CouchDB doesn't use an index, it will respond with a warning, and
  cozy-stack will transform this warning into an error, as developers should
  really avoid this issue.

- If CouchDB can use an index for the selector but not for the sort, it will
  respond with an error, and the cozy-stack will just forward the error.

- If CouchDB can use an index, but still has to look at many more documents in
  the index than what will be in the response (it happens with the `$or` and
  `$in` operators, which should be avoided), CouchDB 3+ will send a warning and
  the cozy-stack will forward the documents and the warning to the client.
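
These scenarios can be summarized by a small decision function. This is only a sketch, not the actual code of the stack: the function name, the returned labels, and the idea of recognizing the missing-index case by looking for the text `matching index found` in the `warning` field are assumptions made for the illustration.

```python
def classify_find_response(status_code, body):
    """Decide what to do with a CouchDB `_find` response.

    `body` is the decoded JSON response. The substring used to detect
    the "no index" warning is an assumption about CouchDB's message.
    """
    if status_code >= 400:
        # e.g. an index exists for the selector but not for the sort
        return "forward_error"
    warning = body.get("warning", "")
    if not warning:
        return "ok"
    if "matching index found" in warning.lower():
        # no index at all: turned into an error for the developer
        return "reject_no_index"
    # index used but not optimal: documents and warning are both sent
    return "forward_with_warning"


print(classify_find_response(200, {"docs": []}))
print(classify_find_response(200, {"docs": [], "warning": "No matching index found"}))
```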

### Comparison of strings

Comparison of strings is done using ICU which implements the Unicode Collation
Algorithm, giving a dictionary sorting of keys. This can give surprising
results if you were expecting ASCII ordering. Note that:

- All symbols sort before numbers and letters (even the “high” symbols like tilde, `0x7e`).
- Differing sequences of letters are compared without regard to case, so `a < aa` but also `A < aa` and `a < AA`.
- Identical sequences of letters are compared with regard to case, with lowercase before uppercase, so `a < A`.
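
To get a feeling for these rules without an ICU library, here is a toy sort key in Python that emulates only the three points above, for ASCII strings. It is not the real Unicode Collation Algorithm, and the function name is hypothetical; it just makes the difference with plain byte ordering visible.

```python
def collation_key(text):
    """Toy approximation of the ordering described above (ASCII only)."""
    primary = []        # compared first, ignoring case
    case_tiebreak = []  # used only when the primary keys are equal
    for ch in text:
        if ch.isalpha():
            rank = 2    # letters come last
        elif ch.isdigit():
            rank = 1    # then numbers
        else:
            rank = 0    # symbols come first
        primary.append((rank, ch.lower()))
        case_tiebreak.append(1 if ch.isupper() else 0)
    return (primary, case_tiebreak)


keys = ["A", "aa", "a", "0", "~", "AA"]
ascii_order = sorted(keys)
dictionary_order = sorted(keys, key=collation_key)
print(ascii_order)       # ['0', 'A', 'AA', 'a', 'aa', '~']
print(dictionary_order)  # ['~', '0', 'a', 'A', 'aa', 'AA']
```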

## Old revisions

CouchDB keeps for each document a list of its revisions (or more exactly a tree,
because of replication and conflicts).

It's possible to ask for the list of the old revisions of a document with
[`GET /db/{docid}?revs_info=true`](http://docs.couchdb.org/en/stable/api/document/common.html#get--db-docid).
It works only if the document has not been deleted. For a deleted document,
[a trick](https://stackoverflow.com/questions/10854883/retrieve-just-deleted-document/10857330#10857330)
is to query the changes feed to know the last revision of the document, and to
recreate the document from this revision.

With an old revision, it's possible to get the content of the document at this
revision with `GET /db/{docid}?rev={rev}` if the database was not compacted. On
CouchDB 2.x, compaction happens automatically on all databases from time to time.
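
As a sketch of how a client could use these two requests, the Python helpers below read the `_revs_info` array of a response and build the path to fetch one revision. The helper names, the database name `mydb` and the sample response are made up for the example; the `status` values (`available`, `missing`, `deleted`) are the ones CouchDB reports.

```python
from urllib.parse import quote


def fetchable_revisions(doc_with_revs_info):
    """Keep the revisions whose body is still stored (not compacted away)."""
    return [
        entry["rev"]
        for entry in doc_with_revs_info.get("_revs_info", [])
        if entry.get("status") == "available"
    ]


def revision_path(db, doc_id, rev):
    """Path of the request that returns the document at a given revision."""
    return f"/{quote(db, safe='')}/{quote(doc_id, safe='')}?rev={quote(rev, safe='')}"


# Hypothetical response to GET /mydb/foo?revs_info=true
sample = {
    "_id": "foo",
    "_rev": "3-ghi",
    "_revs_info": [
        {"rev": "3-ghi", "status": "available"},
        {"rev": "2-def", "status": "missing"},
        {"rev": "1-abc", "status": "available"},
    ],
}
available = fetchable_revisions(sample)
paths = [revision_path("mydb", "foo", rev) for rev in available]
print(paths)
```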

A `purge` operation consists in removing the tombstones of deleted documents.
It is a manual operation, triggered by a
[`POST /db/_purge`](http://docs.couchdb.org/en/stable/api/database/misc.html).
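
The body of this request maps each document ID to the list of revisions to purge. A minimal Python sketch to build it, with a hypothetical helper name and made-up IDs and revisions:

```python
def purge_body(tombstones):
    """Build the JSON body of POST /db/_purge from (doc_id, rev) pairs."""
    body = {}
    for doc_id, rev in tombstones:
        body.setdefault(doc_id, []).append(rev)
    return body


# Hypothetical tombstones (last revision of two deleted documents).
purge_request = purge_body([("foo", "4-jkl"), ("bar", "2-mno")])
print(purge_request)
```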

## Conflicts

It is possible to create a conflict in CouchDB, the same way replication does,
by using `new_edits: false`, but it is not well documented, to say the least. The
most accurate description was in the old wiki, which [no longer
exists](https://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Posting_Existing_Revisions).
Here is a copy of what it said:

> The replicator uses a special mode of \_bulk_docs. The documents it writes to
> the destination database already have revision IDs that need to be preserved for
> the two databases to be in sync (otherwise it would not be possible to tell that
> the two represent the same revision.) To prevent the database from assigning
> them new revision IDs, a "new_edits":false property is added to the JSON request
> body.

> Note that this changes the interpretation of the \_rev parameter in each
> document: rather than being the parent revision ID to be matched against, it's
> the existing revision ID that will be saved as-is into the database. And since
> it's important to retain revision history when adding to the database, each
> document body in this mode should have a \_revisions property that lists its
> revision history; the format of this property is described on the HTTP document
> API. For example:

> `curl -X POST -d '{"new_edits":false,"docs":[{"_id":"person","_rev":"2-3595405","_revisions":{"start":2,"ids":["3595405","877727288"]},"name":"jim"}]}' "$OTHER_DB/_bulk_docs"`

> This command will replicate one of the revisions created above, into a
> separate database `OTHER_DB`. It will have the same revision ID as in `DB`,
> `2-3595405`, and it will be known to have a parent revision with ID
> `1-877727288`. (Even though `OTHER_DB` will not have the body of that revision,
> the history will help it detect conflicts in future replications.)

> As with \_all_or_nothing, this mode can create conflicts; in fact, this is
> where the conflicts created by replication come from.
> In short, it's a `PUT /doc/{id}?new_edits=false` with `_rev` the new revision of
> the document, and `_revisions` the parents of this revision in the revisions
> tree of this document.
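
To make the format of `_rev` and `_revisions` concrete, here is a small Python sketch that rebuilds the body of the `curl` example above from a revision history. The helper name and its parameters are hypothetical.

```python
import json


def replicated_write_body(doc_id, fields, history):
    """Build a `_bulk_docs` body with `new_edits: false`.

    `history` lists the full revisions, newest first,
    e.g. ["2-3595405", "1-877727288"].
    """
    generation = int(history[0].split("-", 1)[0])
    ids = [rev.split("-", 1)[1] for rev in history]
    doc = {
        "_id": doc_id,
        "_rev": history[0],
        "_revisions": {"start": generation, "ids": ids},
    }
    doc.update(fields)
    return {"new_edits": False, "docs": [doc]}


wiki_body = replicated_write_body(
    "person", {"name": "jim"}, ["2-3595405", "1-877727288"]
)
print(json.dumps(wiki_body))
```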

### Conflict example

Here is an example of a CouchDB conflict.

Let's assume the following document with the revision history `[1-abc, 2-def]`
saved in the database:

```
{
  "_id": "foo",
  "_rev": "2-def",
  "bar": "tender",
  "_revisions": {
    "start": 2,
    "ids": [
      "def",
      "abc"
    ]
  }
}
```

The `_revisions` block is returned when passing `revs=true` to the query and
gives all the revision ids, which are the part of each revision after the dash.
For instance, in `2-def`, `2` is called the "generation" and `def` the "id".

We update the document with a `POST /{db}/_bulk_docs` query, with the following
content:

```
{
  "docs": [
    {
      "_id": "foo",
      "_rev": "3-ghi",
      "_revisions": { "start": 3, "ids": ["ghi", "xyz", "abc"] },
      "bar": "racuda"
    }
  ],
  "new_edits": false
}
```

This produces a conflict between `2-def` and `2-xyz`: the former was first saved
in the database, but we forced the latter to be a new child of `1-abc`. Hence, this
document will have two revision branches: `1-abc, 2-def` and `1-abc, 2-xyz, 3-ghi`.
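
The reasoning above can be replayed with a few lines of Python: expand each `_revisions` block into full revisions, then look for the first generation where the two histories differ. The function names are hypothetical, and this only compares two linear histories; it is not how CouchDB stores its revision tree.

```python
def expand_revisions(revisions):
    """Turn a `_revisions` block into full revisions, oldest first."""
    start, ids = revisions["start"], revisions["ids"]
    newest_first = [f"{start - i}-{rev_id}" for i, rev_id in enumerate(ids)]
    return list(reversed(newest_first))


def find_fork(stored_branch, incoming_branch):
    """Return (common ancestor, stored child, incoming child) at the point
    where the two branches diverge, or None if they never diverge."""
    pairs = zip(stored_branch, incoming_branch)
    for depth, (stored, incoming) in enumerate(pairs):
        if stored != incoming:
            ancestor = stored_branch[depth - 1] if depth > 0 else None
            return ancestor, stored, incoming
    return None


stored = expand_revisions({"start": 2, "ids": ["def", "abc"]})
incoming = expand_revisions({"start": 3, "ids": ["ghi", "xyz", "abc"]})
fork = find_fork(stored, incoming)
print(stored)    # ['1-abc', '2-def']
print(incoming)  # ['1-abc', '2-xyz', '3-ghi']
print(fork)      # ('1-abc', '2-def', '2-xyz')
```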

### Sharing

In the [sharing protocol](https://docs.cozy.io/en/cozy-stack/sharing-design/),
we implement this behaviour as we follow the CouchDB replication model. However,
we prevent CouchDB conflicts for files and directories: see [this
explanation](https://docs.cozy.io/en/cozy-stack/sharing-design/#couchdb-conflicts).

## Design docs in \_all_docs

When querying `GET /{db}/_all_docs`, the response includes the design docs. It's
quite difficult to filter them out, particularly when pagination is involved. We
have added an endpoint `GET /data/:doctype/_normal_docs` to the stack to help
client-side applications deal with this.
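
To see why pagination makes the filtering awkward, here is a Python sketch of what a client would have to do on a raw `_all_docs` response. The helper name and the sample page are made up; design docs are recognized by their `_design/` ID prefix.

```python
def normal_rows(all_docs_response):
    """Drop the design docs from an `_all_docs` response."""
    return [
        row
        for row in all_docs_response["rows"]
        if not row["id"].startswith("_design/")
    ]


# Hypothetical page requested with limit=3
page = {
    "total_rows": 10,
    "offset": 0,
    "rows": [
        {"id": "_design/by-name", "key": "_design/by-name", "value": {"rev": "1-aaa"}},
        {"id": "doc-1", "key": "doc-1", "value": {"rev": "1-bbb"}},
        {"id": "doc-2", "key": "doc-2", "value": {"rev": "2-ccc"}},
    ],
}
kept = normal_rows(page)
print(len(page["rows"]), len(kept))  # 3 2
```

The page was requested with a limit of 3 but only 2 documents remain after filtering, so the client would have to fetch more rows and recompute its offsets to fill the page.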