# Experimental: User namespace support

Linux kernel [user namespace support](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) provides additional security by enabling
a process--and therefore a container--to have a unique range of user and
group IDs which are outside the traditional user and group range utilized by
the host system. Potentially the most important security improvement is that,
by default, container processes running as the `root` user will have the expected
administrative privileges (with some restrictions) inside the container but will
effectively be mapped to an unprivileged `uid` on the host.

In this experimental phase, the Docker daemon creates a single daemon-wide mapping
for all containers running on the same engine instance. The mappings
utilize the existing subordinate user and group ID feature available on all modern
Linux distributions. The [`/etc/subuid`](http://man7.org/linux/man-pages/man5/subuid.5.html) and
[`/etc/subgid`](http://man7.org/linux/man-pages/man5/subgid.5.html) files are
read for the user, and optional group, specified to the `--userns-remap`
parameter. If you do not wish to specify your own user and/or group, you can
provide `default` as the value to this flag, and a user will be created on your behalf
and provided subordinate uid and gid ranges. This default user will be named
`dockremap`, and entries will be created for it in `/etc/passwd` and
`/etc/group` using your distro's standard user and group creation tools.

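As a rough illustration of how entries in these files are laid out, the sketch below looks up the ranges recorded for one name. The function name `subid_ranges` and the sample file are hypothetical; the Docker daemon performs this lookup internally, not through a script like this one.

```shell
# Print every "start length" pair recorded for a name in a subuid-style file.
# Usage: subid_ranges NAME [FILE]; FILE defaults to /etc/subuid.
subid_ranges() {
    name=$1
    file=${2:-/etc/subuid}
    # Each entry has the form name:start:length.
    awk -F: -v name="$name" '$1 == name { print $2, $3 }' "$file"
}

# Try it against a throwaway file instead of the real /etc/subuid.
sample=$(mktemp)
printf '%s\n' 'user1:100000:65536' 'dockremap:165536:65536' > "$sample"
subid_ranges dockremap "$sample"   # prints: 165536 65536
rm -f "$sample"
```

The same function can be pointed at `/etc/subgid` to inspect the group ranges.
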
> **Note**: The single mapping per-daemon restriction exists for this experimental
> phase because Docker shares image layers from its local cache across all
> containers running on the engine instance.  Since file ownership must be
> the same for all containers sharing the same layer content, the decision
> was made to map the file ownership on `docker pull` to the daemon's user and
> group mappings so that there is no delay for running containers once the
> content is downloaded--exactly the same performance characteristics as with
> user namespaces disabled.

## Starting the daemon with user namespaces enabled

To enable this experimental user namespace support for a Docker daemon instance,
start the daemon with the aforementioned `--userns-remap` flag, which accepts
values in the following formats:

 - uid
 - uid:gid
 - username
 - username:groupname

If numeric IDs are provided, they are translated back to valid user or group
names so that the subordinate uid and gid information can be read, because
the subordinate ID files are keyed by name, not by ID.  If the numeric IDs
provided do not exist as entries in `/etc/passwd` or `/etc/group`, daemon
startup will fail with an error message.

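The translation from a numeric ID to a name can be pictured with the sketch below. It is only an illustration of the lookup, not the daemon's own code; the function name `uid_to_name` and the sample entries are made up for this example.

```shell
# Resolve a numeric uid to a user name from a passwd-style file.
# Usage: uid_to_name UID [FILE]; returns non-zero when no entry matches.
uid_to_name() {
    uid=$1
    file=${2:-/etc/passwd}
    # passwd fields: name:password:uid:gid:gecos:home:shell
    awk -F: -v uid="$uid" '
        $3 == uid { print $1; found = 1; exit }
        END       { exit !found }
    ' "$file"
}

sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
              'dockremap:x:1001:1001::/home/dockremap:/bin/false' > "$sample"
uid_to_name 1001 "$sample"                         # prints: dockremap
uid_to_name 4242 "$sample" || echo "no such uid"   # prints: no such uid
rm -f "$sample"
```
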
*An example: starting with default Docker user management:*

```
     $ docker daemon --userns-remap=default
```

In this case, Docker will create--or find the existing--user and group
named `dockremap`. If the user is created, and the Linux distribution has
appropriate support, the `/etc/subuid` and `/etc/subgid` files will be populated
with a contiguous range of 65536 subordinate user and group IDs, starting
at an offset based on prior entries in those files.  For example, Ubuntu will
create the following range when an existing user already holds the first
range of 65536 IDs:

```
     $ cat /etc/subuid
     user1:100000:65536
     dockremap:165536:65536
```

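The offset arithmetic in that listing (100000 + 65536 = 165536) can be reproduced with the sketch below. It is a simplification: the real allocation is done by the distribution's user creation tools and may follow different rules. The function name `next_subid_start` and the fallback start of 100000 for an empty file are assumptions taken from the example above.

```shell
# Print the first ID after the highest existing range in a subuid-style file.
# Usage: next_subid_start [FILE]; FILE defaults to /etc/subuid.
next_subid_start() {
    file=${1:-/etc/subuid}
    awk -F: -v first=100000 '
        # Track the end (start + length) of the highest range seen so far.
        { end = $2 + $3; if (end > max) max = end }
        # With no entries, fall back to the assumed first start ID.
        END { print (max > first ? max : first) }
    ' "$file"
}

sample=$(mktemp)
printf '%s\n' 'user1:100000:65536' > "$sample"
next_subid_start "$sample"   # prints: 165536
rm -f "$sample"
```
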
> **Note:** On a fresh Fedora install, we found that we had to `touch` the
> `/etc/subuid` and `/etc/subgid` files to have ranges assigned when users
> were created.  Once these files existed, range assignment on user creation
> worked properly.

If you have a preferred/self-managed user with subordinate ID mappings already
configured, you can provide that username or uid to the `--userns-remap` flag.
If you have a group that doesn't match the username, you may provide the `gid`
or group name as well; otherwise the username will be used as the group name
when querying the system for the subordinate group ID range.

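The fallback from user to group described above can be sketched as follows. The function name `split_remap` is hypothetical, and the sketch does not handle the special `default` value or the numeric-to-name translation.

```shell
# Split a --userns-remap style value into its user and group parts.
# Prints "USER GROUP"; the group falls back to the user when none is given.
split_remap() {
    spec=$1
    user=${spec%%:*}                  # everything before the first colon
    case $spec in
        *:*) group=${spec#*:} ;;      # explicit group after the colon
        *)   group=$user ;;           # no colon: reuse the user name
    esac
    printf '%s %s\n' "$user" "$group"
}

split_remap dockremap          # prints: dockremap dockremap
split_remap myuser:mygroup     # prints: myuser mygroup
```
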
## Detailed information on `subuid`/`subgid` ranges

Because power users may make advanced use of the subordinate ID ranges, this
section describes how the Docker daemon uses the range entries within these
files under the current experimental user namespace support.

The simplest case exists where only one contiguous range is defined for the
provided user or group. In this case, Docker will use that entire contiguous
range for the mapping of host uids and gids to the container process.  This
means that the first ID in the range will be the remapped root user, and the
IDs above that initial ID will map to container IDs 1 through the end of the range.

From the example `/etc/subuid` content shown above, that means the remapped root
user would be uid 165536.

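For a single contiguous range, the translation is a simple offset. The sketch below shows the arithmetic only; the function name `to_host_id` is hypothetical and the numbers come from the `dockremap` entry in the earlier example.

```shell
# Translate a container ID to the host ID for one contiguous range.
# Usage: to_host_id CONTAINER_ID RANGE_START RANGE_LENGTH
to_host_id() {
    cid=$1
    start=$2
    length=$3
    # Container IDs at or beyond the range length have no mapping.
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$length" ]; then
        echo "container id $cid is outside the mapped range" >&2
        return 1
    fi
    echo $((start + cid))
}

to_host_id 0 165536 65536      # prints: 165536 (the remapped root)
to_host_id 1000 165536 65536   # prints: 166536
```
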
If the system administrator has set up multiple ranges for a single user or
group, the Docker daemon will read all the available ranges and use the
following algorithm to create the mapping ranges:

1. The ranges will be sorted by *start ID* ascending.
2. A map will be created from each range, with the container ID starting at 0 for the first range, 0+*range1* length for the second, and so on.  This means that the lowest range start ID will be the remapped root, and all further IDs will map container IDs from 1 up to (but not including) the sum of all range lengths.
3. Range segments beyond the fifth will be ignored, as the kernel ignores any ID maps after five (in `/proc/self/{u,g}id_map`).

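The three steps can be sketched in shell as shown below. This is an illustration under stated assumptions, not the daemon's implementation: the function name `build_id_maps` is hypothetical, and the output format (`container_id host_id length`) is borrowed from the layout of `/proc/self/uid_map`.

```shell
# Build "container_id host_id length" lines for one name from a
# subuid-style file, following the three steps in the list above.
# Usage: build_id_maps NAME [FILE]; FILE defaults to /etc/subuid.
build_id_maps() {
    name=$1
    file=${2:-/etc/subuid}
    # Step 1: select the ranges for NAME and sort them by start ID.
    # Step 2: assign cumulative container IDs, starting at 0.
    # Step 3: emit at most five ranges.
    awk -F: -v name="$name" '$1 == name { print $2, $3 }' "$file" |
        sort -n -k1,1 |
        awk 'NR <= 5 { print next_id + 0, $1, $2; next_id += $2 }'
}

sample=$(mktemp)
printf '%s\n' 'dockremap:300000:1000' \
              'dockremap:100000:65536' \
              'user1:500000:10' > "$sample"
build_id_maps dockremap "$sample"
# prints:
# 0 100000 65536
# 65536 300000 1000
rm -f "$sample"
```

In this sample, host ID 100000 becomes the remapped root, and the second range continues at container ID 65536, which is the length of the first range.
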
## User namespace known restrictions

The following standard Docker features are currently incompatible when
running a Docker daemon with experimental user namespaces enabled:

 - sharing namespaces with the host (`--pid=host`, `--net=host`, etc.)
 - sharing namespaces with other containers (`--net=container:`*other*)
 - a `--read-only` container filesystem (a Linux kernel restriction on remounting a currently mounted filesystem with new flags when inside a user namespace)
 - external (volume/graph) drivers which are unaware of, or incapable of using, daemon user mappings
 - `--privileged` mode containers
 - the lxc execdriver (only the `native` execdriver is enabled to use user namespaces)
 - volume use without pre-arranging proper file ownership in mounted volumes

Additionally, while the `root` user inside a user namespaced container
process has many of the privileges of the administrative root user, the
following operations will fail:

 - Use of `mknod` - permission is denied for device creation by the container root
 - others will be listed here when fully tested