This is a description of the profile.proto format.

# Overview

Profile.proto is a data representation for profile data. It is independent of
the type of data being collected and the sampling process used to collect that
data. On disk, it is represented as a gzip-compressed protocol buffer, described
at src/proto/profile.proto.
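
As a minimal sketch of the on-disk representation, the helper below returns the
serialized Profile message from the raw bytes of a profile file. The function
name `load_profile_bytes` is hypothetical, and passing uncompressed input
through unchanged is an assumption of this sketch, not something the format
description requires. Decoding the protocol buffer itself is out of scope here.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def load_profile_bytes(raw: bytes) -> bytes:
    """Return the serialized Profile message, decompressing it if needed.

    Hypothetical helper: input that does not look like gzip is returned
    unchanged, which is an assumption made for this sketch.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw
```

The returned bytes would then be handed to a protocol buffer decoder generated
from src/proto/profile.proto.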

A profile in this context refers to a collection of samples, each one
representing measurements performed at a certain point in the life of a job. A
sample associates a set of measurement values with a list of locations, commonly
representing the program call stack when the sample was taken.

Tools such as pprof analyze these samples and display this information in
multiple forms, such as identifying hottest locations, building graphical call
graphs or trees, etc.

# General structure of a profile

A profile is represented by a Profile message, which contains the following
fields:

* *sample*: A profile sample, with the values measured and the associated call
  stack as a list of location ids. Samples with identical call stacks can be
  merged by adding their respective values, element by element.
* *location*: A unique place in the program, commonly mapped to a single
  instruction address. It has a unique nonzero id, to be referenced from the
  samples. It contains source information in the form of lines, and a mapping id
  that points to a binary.
* *function*: A program function as defined in the program source. It has a
  unique nonzero id, referenced from the location lines. It contains a
  human-readable name for the function (e.g. a C++ demangled name), a system
  name (e.g. a C++ mangled name), the name of the corresponding source file, and
  other function attributes.
* *mapping*: A binary that is part of the program during the profile
  collection. It has a unique nonzero id, referenced from the locations. It
  includes details on how the binary was mapped during program execution. By
  convention the main program binary is the first mapping, followed by any
  shared libraries.
* *string_table*: All strings in the profile are represented as indices into
  this repeated field. The first string is empty, so index == 0 always
  represents the empty string.
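
The string table convention can be illustrated with a small sketch. The
`StringTable` class and its method names are hypothetical and are not part of
any pprof API. The sketch only shows the idea that every string is stored once
and referenced by index, with index 0 reserved for the empty string.

```python
class StringTable:
    """Hypothetical in-memory model of the Profile.string_table field."""

    def __init__(self):
        # Index 0 must always be the empty string.
        self.strings = [""]
        self._index = {"": 0}

    def intern(self, s: str) -> int:
        """Return the index of s, appending it on first use."""
        idx = self._index.get(s)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(s)
            self._index[s] = idx
        return idx

    def lookup(self, idx: int) -> str:
        """Resolve an index stored in a message back to its string."""
        return self.strings[idx]
```

A producer would call `intern` for every function name, file name, or label
key it writes, and store only the returned index in the message.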

# Measurement values

Measurement values are represented as 64-bit integers. The profile contains an
explicit description of each value represented, using a ValueType message, with
two fields:

* *Type*: A human-readable description of the type semantics. For example “cpu”
  to represent CPU time, “wall” or “time” for wallclock time, or “memory” for
  bytes allocated.
* *Unit*: A human-readable name of the unit represented by the 64-bit integer
  values. For example, it could be “nanoseconds” or “milliseconds” for a time
  value, or “bytes” or “megabytes” for a memory size. If this is just
  representing a number of events, the recommended unit name is “count”.

A profile can represent multiple measurements per sample, but all samples must
have the same number and type of measurements. The actual values are stored in
the Sample.value fields, each one described by the corresponding
Profile.sample_type field.
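
The correspondence between Sample.value and Profile.sample_type can be checked
mechanically. The following is a minimal sketch using simplified stand-in
classes. Unlike the real messages, `ValueType` here holds plain strings rather
than string table indices, and `check_sample_values` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueType:
    # Simplification: plain strings instead of string_table indices.
    type: str
    unit: str


@dataclass
class Sample:
    location_id: List[int]
    value: List[int]


def check_sample_values(sample_types: List[ValueType],
                        samples: List[Sample]) -> List[int]:
    """Return the indexes of samples whose number of values does not
    match the number of declared sample types."""
    expected = len(sample_types)
    return [i for i, s in enumerate(samples) if len(s.value) != expected]
```

For a profile declaring two sample types, a sample carrying a single value
would be reported as inconsistent.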

Some profiles have a uniform period that describes the granularity of the data
collection. For example, a CPU profile may have a period of 100ms, or a memory
allocation profile may have a period of 512 KB. Profiles can optionally describe
such a value in the Profile.period and Profile.period_type fields. The profile
period is meant for human consumption and does not affect the interpretation of
the profiling data.

By convention, the first value in all profiles is the number of samples
collected at this call stack, with unit “count”. Because the profile does not
describe the sampling process beyond the optional period, it must include
unsampled values for all measurements. For example, a CPU profile could have
value[0] == samples, and value[1] == time in milliseconds.
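
To illustrate how samples with identical call stacks combine, here is a minimal
sketch that merges them by element-wise addition of their values. Samples are
modeled as plain `(location_ids, values)` pairs and `merge_samples` is a
hypothetical helper, not a pprof function.

```python
def merge_samples(samples):
    """Merge samples that share the same list of location ids.

    samples: iterable of (location_ids, values) pairs, where values follow
    the convention value[0] == sample count, value[1] == time.
    Returns merged (location_ids, values) pairs in first-seen order.
    """
    merged = {}
    for location_ids, values in samples:
        key = tuple(location_ids)
        if key in merged:
            # Same call stack: add the values element by element.
            merged[key] = [a + b for a, b in zip(merged[key], values)]
        else:
            merged[key] = list(values)
    return [(list(key), values) for key, values in merged.items()]
```

For example, two samples on stack `[1, 2, 3]` with values `[1, 10]` and
`[3, 30]` merge into a single sample with values `[4, 40]`.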

## Locations, functions and mappings

Each sample lists the id of each location where the sample was collected, in
bottom-up order. Each location has an explicit unique nonzero integer id,
independent of its position in the profile, and holds additional information to
identify the corresponding source.

The profile source is expected to perform any adjustment required to the
locations in order to point to the calls in the stack. For example, if the
profile source extracts the call stack by walking back over the program stack,
it must adjust the instruction addresses to point to the actual call
instruction, instead of the instruction that each call will return to.
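
One common way to perform this adjustment, shown here only as a sketch, is to
subtract one from each return address so that it falls inside the preceding
call instruction. This technique is an assumption of the example and is not
prescribed by the format. The sketch also assumes that the stack is listed
leaf first and that the leaf entry is the sampled instruction itself, so it is
left untouched. The name `adjust_return_addresses` is hypothetical.

```python
def adjust_return_addresses(stack):
    """Move return addresses back into their call instructions.

    stack: instruction addresses, leaf first. The leaf is assumed to be
    the sampled program counter; every later entry is assumed to be a
    return address obtained by walking the program stack.
    """
    if not stack:
        return []
    # Subtracting 1 lands inside the call instruction that precedes the
    # return address, which is enough for symbolization purposes.
    return [stack[0]] + [addr - 1 for addr in stack[1:]]
```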

Sources usually generate profiles that fall into these two categories:

* *Unsymbolized profiles*: These only contain instruction addresses, and are to
  be symbolized by a separate tool. It is critical for each location to point to
  a valid mapping, which will provide the information required for
  symbolization. These are used for profiles of compiled languages, such as C++
  and Go.

* *Symbolized profiles*: These contain all the symbol information available for
  the profile. Mappings and instruction addresses are optional for symbolized
  locations. These are used for profiles of interpreted or jitted languages,
  such as Java or Python. Also, the profile format allows the generation of
  mixed profiles, with symbolized and unsymbolized locations.

The symbol information is represented in the repeated lines field of the
Location message. A location has multiple lines if it reflects multiple program
sources, for example if representing inlined call stacks. Lines reference
functions by their unique nonzero id, and the source line number within the
source file listed by the function. A function contains the source attributes
for a function, including its name, source file, etc. Functions include both a
user and a system form of the name, for example to include C++ demangled and
mangled names. For profiles where only a single name exists, both should be set
to the same string.

Mappings are also referenced from locations by their unique nonzero id, and
include all information needed to symbolize addresses within the mapping. They
include similar information to the Linux /proc/self/maps file. Locations
associated with a mapping should have addresses that land between the mapping
start and limit. Also, if available, mappings should include a build id to
uniquely identify the version of the binary being used.
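
The relationship between a location address and its mapping can be sketched as
a simple range lookup. The `Mapping` class below is a simplified stand-in for
the Mapping message (file name and build id are plain strings here rather than
string table indices), `find_mapping` is a hypothetical helper, and treating
the range as half-open, with the limit excluded, is an assumption of this
sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Mapping:
    id: int                # unique nonzero id
    memory_start: int      # first address covered by the mapping
    memory_limit: int      # assumed exclusive upper bound
    filename: str = ""     # simplification: plain string
    build_id: str = ""     # simplification: plain string


def find_mapping(mappings: Sequence[Mapping],
                 address: int) -> Optional[Mapping]:
    """Return the first mapping whose range contains address, or None."""
    for m in mappings:
        if m.memory_start <= address < m.memory_limit:
            return m
    return None
```

A producer of unsymbolized profiles could use such a lookup to set the mapping
id of each location before writing the profile.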

## Labels

Samples optionally contain labels, which are annotations to discriminate samples
with identical locations. For example, a label can be used on a malloc profile
to indicate allocation size, so two samples on the same call stack with sizes
2MB and 4MB do not get merged into a single sample with two allocations and a
size of 6MB.

Labels can be string-based or numeric. They are represented by the Label
message, with a key identifying the label and either a string or numeric
value. For numeric labels, by convention the key represents the measurement unit
of the numeric value. So for the previous example, the samples would have labels
{“bytes”, 2097152} and {“bytes”, 4194304}.
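
The effect of labels on merging can be sketched by including them in the merge
key, so that only samples with the same locations and the same labels are
combined. Samples are modeled as `(location_ids, labels, values)` tuples with
labels as `(key, value)` pairs, and `merge_labeled_samples` is a hypothetical
helper written for this illustration.

```python
def merge_labeled_samples(samples):
    """Merge samples that agree on both call stack and labels.

    samples: iterable of (location_ids, labels, values) tuples, where
    labels is a list of (key, value) pairs with string or integer values.
    Returns merged tuples in first-seen order.
    """
    merged = {}
    for location_ids, labels, values in samples:
        # Label order is irrelevant, so compare labels as a set.
        key = (tuple(location_ids), frozenset(labels))
        if key in merged:
            total = merged[key][2]
            for i, v in enumerate(values):
                total[i] += v
        else:
            merged[key] = (list(location_ids), list(labels), list(values))
    return list(merged.values())
```

With the malloc example above, the 2MB and 4MB samples stay separate because
their {“bytes”, ...} labels differ, while two 2MB samples on the same stack are
still merged.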

## Keep and drop expressions

Some profile sources may have knowledge of locations that are uninteresting or
irrelevant. However, if symbolization is needed in order to identify these
locations, the profile source may not be able to remove them when the profile is
generated. The profile format provides a mechanism to identify these frames by
name, through regular expressions.

These expressions must match the function name in its entirety. Frames that
match Profile.drop\_frames will be dropped from the profile, along with any
frames below them. Frames that match Profile.keep\_frames will be kept, even if
they match drop\_frames.
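
The matching rules can be sketched as follows. The function `prune_stack` is
hypothetical and operates on a list of function names rather than on real
Location messages. The sketch assumes that the stack is listed leaf first and
that "below" means toward the leaf, that is, the callees of the matched frame.
Both are assumptions of this example, and an actual tool may apply further
rules.

```python
import re


def prune_stack(function_names, drop_frames, keep_frames=""):
    """Apply drop and keep expressions to one call stack.

    function_names: function names, leaf first (assumed ordering).
    drop_frames, keep_frames: regular expressions that must match a
    function name in its entirety. An empty expression matches nothing.
    Returns the names of the frames that survive.
    """
    if not drop_frames:
        return list(function_names)
    drop = re.compile(drop_frames)
    keep = re.compile(keep_frames) if keep_frames else None

    # Scan from the root toward the leaf and cut at the first frame that
    # matches drop_frames without also matching keep_frames.
    for i in range(len(function_names) - 1, -1, -1):
        name = function_names[i]
        if drop.fullmatch(name) and not (keep and keep.fullmatch(name)):
            # Drop the matched frame and everything toward the leaf.
            return list(function_names[i + 1:])
    return list(function_names)
```

For a stack `["memcpy", "malloc", "operator new", "Foo::Bar", "main"]` and the
drop expression `malloc|operator new`, only `Foo::Bar` and `main` remain. Adding
`operator new` as the keep expression preserves that frame as well.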