
LogDog
======

LogDog is a high-performance log collection and dissemination platform. It is
designed to collect log data from a large number of cooperative individual
sources and make it available to users and systems for consumption. It is
composed of several services and tools that work cooperatively to provide a
reliable log streaming platform.

Like other LUCI components, LogDog primarily aims to be useful to the
[Chromium](https://www.chromium.org/) project.

LogDog offers several useful features:

* Log data is streamed, and is consequently available the moment that it is
  ingested into the system.
* Flexible hierarchical log namespaces for organization and navigation.
* Recognition of different projects, and application of different ACLs for each
  project.
* Able to stream text, binary data, or records.
* Long-term (possibly indefinite) log data storage and access.
* Log data is sourced from read-forward streams (think files, sockets, etc.).
* Leverages the LUCI Configuration Service for configuration and management.
* Log data is implemented as [protobufs](api/logpb/log.proto).
* The entire platform is written in Go.
* Rich metadata is collected and stored alongside log records.
* Built entirely on scalable platform technologies, targeting Google Cloud
  Platform.
  * Resource requirements scale linearly with log volume.


## APIs

Most applications will interact with a LogDog Coordinator instance via its
[Coordinator Logs API](api/endpoints/coordinator/logs/v1).

Chrome Operations currently runs a LogDog instance serving `*.chromium.org`,
located at logs.chromium.org. You can view its RPC explorer
[here](https://logs.chromium.org/rpcexplorer/services/), or access it from the
command line with the `prpc` tool from depot_tools, e.g.
`prpc show logs.chromium.org`.
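
For programmatic access from Go, the same service can be reached through the
LUCI pRPC client. The following is a minimal, illustrative sketch: it assumes
the generated `Logs` client in the logs/v1 package, anonymous access, and
placeholder project/path values.

```go
// A minimal sketch of calling the Coordinator Logs API over pRPC. The
// constructor name and request fields follow the logs/v1 package; the
// project and stream path are placeholders.
package main

import (
	"context"
	"fmt"
	"net/http"

	"go.chromium.org/luci/grpc/prpc"
	logdog "go.chromium.org/luci/logdog/api/endpoints/coordinator/logs/v1"
)

func main() {
	ctx := context.Background()

	// Anonymous client; authenticated access would wrap http.Client with a
	// LUCI auth transport instead.
	client := logdog.NewLogsPRPCClient(&prpc.Client{
		C:    http.DefaultClient,
		Host: "logs.chromium.org",
	})

	resp, err := client.Get(ctx, &logdog.GetRequest{
		Project: "chromium",
		Path:    "prefix/+/stream/name",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d log entries\n", len(resp.Logs))
}
```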

## Life of a Log Stream

Log streams pass through several layers and states along their path from
generation through archival.

1. **Streaming**: A log stream is being emitted by a **Butler** instance and
   pushed through the **Transport Layer** to the **Collector**.
   1. **Pre-Registration**: The log stream hasn't been observed by a
      **Collector** instance yet, and exists only in the mind of the **Butler**
      and the **Transport** layer.
   1. **Registered**: The log stream has been observed by a **Collector**
      instance and successfully registered with the **Coordinator**. At this
      point, it becomes queryable, listable, and the records that have been
      loaded into **Intermediate Storage** are streamable.
1. **ArchivePending**: One of the following events causes the log stream to be
   recognized as finished and have an archival request dispatched (see the
   sketch after this list). The archival request is submitted to the
   **Archivist** cluster.
   * The log stream's terminal entry is collected, and the terminal index is
     successfully registered with the **Coordinator**.
   * A sufficient amount of time has expired since the log stream's
     registration.
1. **Archived**: An **Archivist** instance has received an archival request for
   the log stream, successfully executed the request according to its
   parameters, and updated the log stream's state with the **Coordinator**.
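
As a rough illustration, the two events that trigger archival reduce to a
single predicate. This is a hypothetical sketch with invented names, not the
Coordinator's actual code:

```go
// Hypothetical sketch of the archival-dispatch predicate; names are invented.
package lifecycle

import "time"

// shouldDispatchArchival reports whether a registered stream should move to
// ArchivePending: either its terminal index has been registered (>= 0), or
// it has been registered for longer than maxAge without terminating.
func shouldDispatchArchival(terminalIndex int64, registered time.Time, maxAge time.Duration) bool {
	return terminalIndex >= 0 || time.Since(registered) > maxAge
}
```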


Most of the lifecycle is hidden from the Logs API endpoint by design. Users
need not distinguish between a stream that is streaming, has archival pending,
or has been archived: they issue the same `Get` requests and receive the same
log stream data.

A user may differentiate between a streaming and a complete log by observing
its terminal index, which will be `< 0` if the log stream is still streaming.
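
Continuing the earlier pRPC sketch, a caller could request the stream's state
along with its logs and inspect the terminal index. The `State` request field
and `TerminalIndex` state field follow the logs/v1 protos, but treat the
details as assumptions:

```go
// Sketch: ask Get to also return stream state, then check the terminal index.
resp, err := client.Get(ctx, &logdog.GetRequest{
	Project: "chromium",
	Path:    "prefix/+/stream/name",
	State:   true, // also return the log stream's state
})
if err != nil {
	panic(err)
}
if resp.State.TerminalIndex < 0 {
	fmt.Println("still streaming")
} else {
	fmt.Println("complete")
}
```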


## Components

The LogDog platform consists of several components:

* [Coordinator](appengine/coordinator), a hosted service which serves log data
  to users and manages the log stream lifecycle.
* [Butler](client/cmd/logdog_butler), which runs on each log stream producing
  system and serves log data to the Collector for consumption.
* [Collector](server/cmd/logdog_collector), a microservice which takes log
  stream data and ingests it into intermediate storage for streaming and
  archival.
* [Archivist](server/cmd/logdog_archivist), a microservice which compacts
  completed log streams and prepares them for long-term storage.

LogDog offers a CLI client to query and view log streams.

Additionally, LogDog is built on several abstract middleware technologies,
including:

* A **Transport**, a layer for the **Butler** to send data to the **Collector**.
* An **Intermediate Storage**, a fast, highly-accessible layer which stores log
  data as it is ingested by the **Collector**, until it can be archived.
* An **Archival Storage**, for cheap long-term file storage.

Log data is sent from the **Butler** through **Transport** to the **Collector**,
which stages it in **Intermediate Storage**. Once the log stream is complete
(or expired), the **Archivist** moves the data from **Intermediate Storage** to
**Archival Storage**, where it will permanently reside.
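
One way to picture these layers is as three narrow interfaces through which
log data flows. The sketch below is hypothetical; the actual interfaces in
this repository are richer:

```go
// Hypothetical sketch of the middleware layers described above.
package middleware

import "context"

// Transport carries serialized Butler output to the Collector.
type Transport interface {
	Send(ctx context.Context, bundle []byte) error
}

// IntermediateStorage holds log entries between ingest and archival. Entries
// are keyed by stream path and log index so they can be streamed in order.
type IntermediateStorage interface {
	Put(ctx context.Context, streamPath string, index int64, entry []byte) error
	Get(ctx context.Context, streamPath string, startIndex int64) ([][]byte, error)
}

// ArchivalStorage is cheap, long-term file storage for completed streams.
type ArchivalStorage interface {
	Write(ctx context.Context, objectName string, data []byte) error
}
```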

The Chromium-deployed LogDog service uses
[Google Cloud Platform](https://cloud.google.com/) for several of the middleware
layers:

* [Google App Engine](https://cloud.google.com/appengine), a scaling application
  hosting service.
* [Cloud Datastore](https://cloud.google.com/datastore/), a powerful
  transactional NoSQL structured data storage system. This is used by the
  Coordinator to store log stream state.
* [Cloud Pub/Sub](https://cloud.google.com/pubsub/), a publish/subscribe
  transport layer. This is used to ferry log data from **Butler** instances to
  **Collector** instances for ingest.
* [Cloud BigTable](https://cloud.google.com/bigtable/), an unstructured
  key/value store. This is used as **intermediate storage** for log stream
  data.
* [Cloud Storage](https://cloud.google.com/storage/), used for long-term log
  stream archival storage.
* [Container Engine](https://cloud.google.com/container-engine/), which manages
  Kubernetes clusters. This is used to host the **Collector** and **Archivist**
  microservices.

Additionally, other LUCI services are used, including:

* [Auth Service](https://github.com/luci/luci-py/tree/master/appengine/auth_service),
  a configurable hosted access control system.
* [Configuration Service](https://github.com/luci/luci-py/tree/master/appengine/config_service),
  a simple repository-based configuration service.

## Instantiation

To instantiate your own LogDog instance, you will need the following
prerequisites:

* A **Configuration Service** instance.
* A Google Cloud Platform project configured with:
  * Datastore
  * A Pub/Sub topic (Butler) and subscription (Collector) for log streaming
    (see the sketch after this list).
  * A Pub/Sub topic (Coordinator) and subscription (Archivist) for archival
    coordination.
  * A Container Engine instance for microservice hosting.
  * A BigTable cluster.
  * A Cloud Storage bucket for archival staging and storage.
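
As an illustration of the Pub/Sub prerequisite, a topic/subscription pair
could be created with the Cloud Pub/Sub Go client. The project, topic, and
subscription names below are placeholders:

```go
// A minimal sketch creating one streaming topic/subscription pair with the
// Cloud Pub/Sub Go client. All names are illustrative placeholders.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	client, err := pubsub.NewClient(ctx, "my-logdog-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Topic the Butler publishes log bundles to.
	topic, err := client.CreateTopic(ctx, "logdog-butler")
	if err != nil {
		log.Fatal(err)
	}

	// Subscription the Collector pulls from.
	if _, err := client.CreateSubscription(ctx, "logdog-collector", pubsub.SubscriptionConfig{
		Topic: topic,
	}); err != nil {
		log.Fatal(err)
	}
}
```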

Optional, compatible components include:

* An **Auth Service** instance to manage authentication. This is necessary if
  something stricter than public read/write is desired.

### Config

The **Configuration Service** must contain a valid service entry for this
LogDog service: a text protobuf matching the schema defined in
[svcconfig/config.proto](api/config/svcconfig/config.proto).

### Coordinator

After deploying the Coordinator to a suitable cloud project, several
configuration parameters must be defined. Visit its settings page at
`https://<your-app>/admin/portal` and configure the following:

* Set the "Configuration Service Settings" to point to the **Configuration
  Service** instance.
* If using timeseries monitoring, update the "Time Series Monitoring Settings".
* If using **Auth Service**, set the "Authorization Settings".

If you are using a BigTable instance outside of your cloud project (e.g.,
staging, dev), you will need to add your BigTable service account JSON to the
service's settings. Currently this cannot be done without a command-line tool.
Hopefully a proper settings page will be added to enable this, or alternatively
Cloud BigTable will be updated to support IAM.