# Tast Codelab: Image-based UI Detection (go/tast-uidetection)

> This document assumes that you are familiar with writing Tast tests
> (go/tast-writing), and have already gone through [Codelab #1], [Codelab #2]
> and [Codelab #3].

This codelab walks through the creation of a Tast test that uses the image-based
UI detection library. It covers the general setup, how to use the library, and
some common issues.

[Codelab #1]: codelab_1.md
[Codelab #2]: codelab_2.md
[Codelab #3]: codelab_3.md

[TOC]

## Background

The [uiauto] library uses the Chrome accessibility tree to view and control the
current state of the UI. The accessibility tree has access to:

*   The Chrome Browser
*   The ChromeOS Desktop UI
*   ChromeOS packaged apps
*   Web Apps/PWAs

However, it does not have access to UI elements inside containers or VMs (such
as VDI and Crostini). To close the automation gap for these apps, we introduce
another UI automation library, [uidetection], which does not rely on the
accessibility tree. Instead, [uidetection] uses computer vision techniques to
detect UI elements directly from the screen.

Note that [uidetection] also works on UI where the accessibility tree is
available, but in that case [uiauto] is **preferred** because the accessibility
tree is more efficient and more stable.

Currently, we support detecting three types of UI elements:

*   Custom icon detection. The developer provides a PNG icon image, and the
    library finds a match on the screen.
*   Word detection. It detects a specific word on the screen.
*   Text block detection. It detects a text block (i.e., a sentence or several
    lines of sentences) that contains specific words.

In Tast, [uidetection] can be imported like so:

```go
import "go.chromium.org/tast-tests/cros/local/uidetection"
```

[uiauto]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto
[uidetection]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/uidetection

## Simple Starter Test

Here is some sample code using the UI detection library:

```go
func init() {
    testing.AddTest(&testing.Test{
        Func: ExampleDetection,
        Desc: "Example of using image-based UI detection API",
        Contacts: []string{
            "my-group@chromium.org",
            "my-ldap@chromium.org",
        },
        Attr:         []string{"group:mainline", "informational"},
        SoftwareDeps: []string{"chrome"},
        Timeout:      3 * time.Minute,
        Data:         []string{"logo_chrome.png"}, // Icon file for detection.
        Fixture:      "chromeLoggedIn",
    })
}

func ExampleDetection(ctx context.Context, s *testing.State) {
    // Shorten deadline to leave time for cleanup.
    cleanupCtx := ctx
    ctx, cancel := ctxutil.Shorten(ctx, 5*time.Second)
    defer cancel()

    cr := s.FixtValue().(*chrome.Chrome)
    tconn, err := cr.TestAPIConn(ctx)
    if err != nil {
        s.Fatal("Failed to create Test API connection: ", err)
    }
    // Dump the UI tree and a screenshot if the test fails.
    defer faillog.DumpUITreeWithScreenshotOnError(cleanupCtx, s.OutDir(), s.HasError, cr, "uidetection")

    ud := uidetection.NewDefault(tconn)

    // Put UI interaction using ud here.
}
```

## UI interaction on the screen

In this sample test, starting from a newly logged-in device, we perform the
following operations, which cover the three basic types of detection:

1.  Click the Chrome icon to open a Chrome browser (custom icon detection).
2.  Click the button that contains "Customize Chrome" (text block detection).
3.  Click the "Cancel" button (word detection).

### Click the Chrome icon

We first define a finder for the icon element:

```go
icon := uidetection.CustomIcon(s.DataPath("logo_chrome.png"))
```

"logo_chrome.png" is the icon file declared in the test registration; see
[Data files in Tast] for details on using data files in Tast.

Then we can left-click the icon:

```go
if err := ud.LeftClick(icon)(ctx); err != nil {
    s.Fatal("Failed to click Chrome icon: ", err)
}
```

### Click the "Customize Chrome" textblock

The button, whose label contains multiple words, is represented by a textblock
finder:

```go
textblock := uidetection.TextBlock([]string{"Customize", "Chrome"})
```

The left-click operation is done by:

```go
if err := ud.LeftClick(textblock)(ctx); err != nil {
    s.Fatal("Failed to click Customize Chrome button: ", err)
}
```

By default, the uidetection library takes a stable screenshot: it waits until
the screen stays unchanged across two consecutive captures taken 300 ms apart.
When the screen is not expected to be static, the stable screenshot strategy can
cause a test failure. In this scenario, for example, the text cursor in the
browser address bar blinks continuously. The solution is to explicitly ask the
API to take an immediate screenshot using
`WithScreenshotStrategy(uidetection.ImmediateScreenshot)`:

```go
if err := ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(textblock)(ctx); err != nil {
    s.Fatal("Failed to click Customize Chrome button: ", err)
}
```

### Click the "Cancel" word

Similar to the textblock detection, this operation can be defined as:

```go
word := uidetection.Word("Cancel")
if err := ud.LeftClick(word)(ctx); err != nil {
    s.Fatal("Failed to click Cancel button: ", err)
}
```
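
The difference between the word finder and the textblock finder can be illustrated with a toy model. The sketch assumes that detection yields text blocks, each a list of recognized words. A word query matches one recognized word exactly, and a textblock query matches a block that contains all requested words. The `block` type, the helper names, and the exact, case-sensitive matching rule are assumptions for illustration, not the library's matching logic.

```go
package main

import "fmt"

// block is a hypothetical OCR result: one text block made of recognized words.
type block []string

// findWord returns the index of the first block containing word as a single
// recognized word, or -1 (toy model of a word finder).
func findWord(blocks []block, word string) int {
	for i, b := range blocks {
		for _, w := range b {
			if w == word {
				return i
			}
		}
	}
	return -1
}

// findTextBlock returns the index of the first block that contains every
// requested word, or -1 (toy model of a textblock finder).
func findTextBlock(blocks []block, words []string) int {
	for i, b := range blocks {
		present := make(map[string]bool, len(b))
		for _, w := range b {
			present[w] = true
		}
		all := true
		for _, w := range words {
			if !present[w] {
				all = false
				break
			}
		}
		if all {
			return i
		}
	}
	return -1
}

func main() {
	screen := []block{
		{"Search", "Google", "or", "type", "a", "URL"},
		{"Customize", "Chrome"},
		{"Cancel"},
	}
	// Prints 1: block 1 contains both "Customize" and "Chrome".
	fmt.Println(findTextBlock(screen, []string{"Customize", "Chrome"}))
	// Prints 2: "Cancel" is a single recognized word in block 2.
	fmt.Println(findWord(screen, "Cancel"))
	// Prints -1: a two-word phrase never equals a single recognized word.
	fmt.Println(findWord(screen, "Customize Chrome"))
}
```

This is why multi-word labels such as "Customize Chrome" go through `uidetection.TextBlock`, while a single word such as "Cancel" can use `uidetection.Word`.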

### Ensuring the customize panel is closed

Finally, after clicking the Cancel button, we should also check whether the test
succeeded. To do so, we have to decide what demonstrates that the "Customize
Chrome" panel was closed successfully. A simple solution is to check that the
Cancel button has disappeared from the screen:

```go
if err := ud.WaitUntilGone(uidetection.Word("Cancel"))(ctx); err != nil {
    s.Fatal("The Cancel button still exists: ", err)
}
```

### Combine these actions

Ideally, you would use [uiauto.Combine] to run these actions as a group:

```go
if err := uiauto.Combine("verify detections",
    ud.LeftClick(uidetection.CustomIcon(s.DataPath("logo_chrome.png"))),
    ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"})),
    ud.LeftClick(uidetection.Word("Cancel")),
    ud.WaitUntilGone(uidetection.Word("Cancel")),
)(ctx); err != nil {
    s.Fatal("Failed to perform image-based UI interactions: ", err)
}
```

[Data files in Tast]: https://chromium.googlesource.com/chromiumos/platform/tast/+/HEAD/docs/writing_tests.md#Data-files
[uiauto.Combine]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto#Combine

## Full Code

```go
// Copyright <copyright_year> The ChromiumOS Authors
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

package uidetection

import (
    "context"
    "time"

    "go.chromium.org/tast-tests/cros/local/chrome"
    "go.chromium.org/tast-tests/cros/local/chrome/uiauto"
    "go.chromium.org/tast-tests/cros/local/chrome/uiauto/faillog"
    "go.chromium.org/tast-tests/cros/local/uidetection"
    "go.chromium.org/tast/core/ctxutil"
    "go.chromium.org/tast/core/testing"
)

func init() {
    testing.AddTest(&testing.Test{
        Func: ExampleDetection,
        Desc: "Example of using image-based UI detection API",
        Contacts: []string{
            "my-group@chromium.org",
            "my-ldap@chromium.org",
        },
        Attr:         []string{"group:mainline", "informational"},
        SoftwareDeps: []string{"chrome"},
        Timeout:      3 * time.Minute,
        Data:         []string{"logo_chrome.png"}, // Icon file for detection.
        Fixture:      "chromeLoggedIn",
    })
}

func ExampleDetection(ctx context.Context, s *testing.State) {
    // Shorten deadline to leave time for cleanup.
    cleanupCtx := ctx
    ctx, cancel := ctxutil.Shorten(ctx, 5*time.Second)
    defer cancel()

    cr := s.FixtValue().(*chrome.Chrome)
    tconn, err := cr.TestAPIConn(ctx)
    if err != nil {
        s.Fatal("Failed to create Test API connection: ", err)
    }
    defer faillog.DumpUITreeWithScreenshotOnError(cleanupCtx, s.OutDir(), s.HasError, cr, "uidetection")

    ud := uidetection.NewDefault(tconn)
    if err := uiauto.Combine("verify detections",
        // Click Chrome logo icon to open a Chrome browser.
        ud.LeftClick(uidetection.CustomIcon(s.DataPath("logo_chrome.png"))),
        // Click the button that contains "Customize Chrome" textblock.
        ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"})),
        // Click the "cancel" button.
        ud.LeftClick(uidetection.Word("Cancel")),
        // Verify the "cancel" button is gone.
        ud.WaitUntilGone(uidetection.Word("Cancel")),
    )(ctx); err != nil {
        s.Fatal("Failed to perform image-based UI interactions: ", err)
    }
}
```

## Advanced UI Detections

### Region of interest (ROI) detections

Using ROI detection may be desirable to:

1.  Improve detection efficiency. A smaller image results in faster detection.
2.  Improve detection accuracy. The API may fail to detect small UI elements in
    the whole screenshot, and using an ROI instead of the whole screen can help
    increase detection accuracy.

Currently, the UI detection library supports ROIs
[defined](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/uidetection/finder.go)
by:

1.  Image-based UI elements.

    *   `Within(*uidetection.Finder)` finds a UI element within another UI
        element.
    *   `Above(*uidetection.Finder)` finds a UI element that is above another UI
        element.
    *   `Below(*uidetection.Finder)` finds a UI element that is below another UI
        element.
    *   `LeftOf(*uidetection.Finder)` finds a UI element that is to the left of
        another UI element.
    *   `RightOf(*uidetection.Finder)` finds a UI element that is to the right
        of another UI element.

    **Example**: find the word `next` that is above the textblock `some
    textblock`:

    ```go
    word := uidetection.Word("next").Above(uidetection.TextBlock([]string{"some", "textblock"}))
    ```

2.  Accessibility-tree-based UI elements (represented by [nodewith.Finder]).

    *   `WithinA11yNode(*nodewith.Finder)` finds a UI element that is within a
        UI node in the accessibility tree.
    *   `AboveA11yNode(*nodewith.Finder)` finds a UI element that is above a UI
        node in the accessibility tree.
    *   `BelowA11yNode(*nodewith.Finder)` finds a UI element that is below a UI
        node in the accessibility tree.
    *   `LeftOfA11yNode(*nodewith.Finder)` finds a UI element that is to the
        left of a UI node in the accessibility tree.
    *   `RightOfA11yNode(*nodewith.Finder)` finds a UI element that is to the
        right of a UI node in the accessibility tree.

    **Example**: find an icon `icon.png` in the VS Code app window:

    ```go
    // Finder of the VS Code app window in the accessibility tree.
    vsCodeWindow := nodewith.NameStartingWith("Get Started - Visual Studio Code").Role(role.Window).First()
    icon := uidetection.CustomIcon(s.DataPath("icon.png")).WithinA11yNode(vsCodeWindow)
    ```

3.  ROIs in pixels (px), **USE WITH CAUTION**.

    *   `WithinPx(coords.Rect)` finds a UI element in the bounding box specified
        in pixels.
    *   `AbovePx(int)` finds a UI element above the given y-coordinate, in
        pixels.
    *   `BelowPx(int)` finds a UI element below the given y-coordinate, in
        pixels.
    *   `LeftOfPx(int)` finds a UI element to the left of the given
        x-coordinate, in pixels.
    *   `RightOfPx(int)` finds a UI element to the right of the given
        x-coordinate, in pixels.

4.  ROIs in density-independent pixels (dp), **USE WITH CAUTION**.
    `WithinDp(coords.Rect)`, `AboveDp(int)`, `BelowDp(int)`, `LeftOfDp(int)`,
    and `RightOfDp(int)` are defined analogously to the ROIs in pixels, except
    that they are in [density-independent pixels].
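
As a rough mental model of the relative ROIs, the sketch below derives a search region from a reference element's bounding box and keeps only the candidate detections that lie fully inside it. It assumes that "above" means the full-width band from the top of the screen down to the reference's top edge (and analogously for the other directions); the `rect` type and the `roiAbove`, `roiBelow`, `roiLeftOf`, `roiRightOf`, and `inside` helpers are hypothetical and do not reproduce the library's exact ROI definitions. It also prints the fraction of the screen that still has to be searched, which is where the efficiency gain comes from.

```go
package main

import "fmt"

// rect is a hypothetical axis-aligned box in screen pixels.
type rect struct{ Left, Top, Width, Height int }

func (r rect) right() int  { return r.Left + r.Width }
func (r rect) bottom() int { return r.Top + r.Height }

// roiAbove returns the full-width band of the screen above ref.
func roiAbove(screen, ref rect) rect {
	return rect{screen.Left, screen.Top, screen.Width, ref.Top - screen.Top}
}

// roiBelow returns the full-width band of the screen below ref.
func roiBelow(screen, ref rect) rect {
	return rect{screen.Left, ref.bottom(), screen.Width, screen.bottom() - ref.bottom()}
}

// roiLeftOf returns the full-height band of the screen to the left of ref.
func roiLeftOf(screen, ref rect) rect {
	return rect{screen.Left, screen.Top, ref.Left - screen.Left, screen.Height}
}

// roiRightOf returns the full-height band of the screen to the right of ref.
func roiRightOf(screen, ref rect) rect {
	return rect{ref.right(), screen.Top, screen.right() - ref.right(), screen.Height}
}

// inside reports whether r lies entirely within roi.
func inside(r, roi rect) bool {
	return r.Left >= roi.Left && r.Top >= roi.Top &&
		r.right() <= roi.right() && r.bottom() <= roi.bottom()
}

func main() {
	screen := rect{0, 0, 1920, 1080}
	textBlock := rect{800, 600, 300, 40} // Reference element.
	// Two candidate matches for the same word, one above and one below.
	candidates := []rect{{900, 500, 60, 20}, {900, 700, 60, 20}}

	roi := roiAbove(screen, textBlock)
	for _, c := range candidates {
		fmt.Println(c, inside(c, roi))
	}
	// Fraction of the screen that still has to be searched.
	fmt.Printf("%.2f\n", float64(roi.Width*roi.Height)/float64(screen.Width*screen.Height))
}
```

Running it prints `{900 500 60 20} true`, `{900 700 60 20} false`, and `0.56`: the ROI both disambiguates between two identical words and cuts the searched area by almost half.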

Note: ROIs in pixels or in density-independent pixels are generally
discouraged, because the screen positions of UI elements vary from device to
device. Please first consider using the other two types of ROIs.
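
One reason pixel-based ROIs are fragile is that the same UI, laid out in dp, covers a different number of physical pixels depending on the device scale factor (px = dp × scale factor). The sketch below converts one dp rectangle for two scale factors; the `rect` type and the `dpToPx` helper are illustrative assumptions, not part of [uidetection].

```go
package main

import (
	"fmt"
	"math"
)

// rect is a hypothetical box with integer coordinates.
type rect struct{ Left, Top, Width, Height int }

// dpToPx scales a dp rectangle by the device scale factor, rounding each
// coordinate to the nearest pixel.
func dpToPx(r rect, scale float64) rect {
	s := func(v int) int { return int(math.Round(float64(v) * scale)) }
	return rect{s(r.Left), s(r.Top), s(r.Width), s(r.Height)}
}

func main() {
	// A button area expressed in dp.
	buttonDp := rect{100, 50, 120, 32}
	for _, scale := range []float64{1.0, 2.0} {
		fmt.Printf("scale %.1f: %v\n", scale, dpToPx(buttonDp, scale))
	}
	// Prints:
	// scale 1.0: {100 50 120 32}
	// scale 2.0: {200 100 240 64}
}
```

A px ROI tuned on a 1x device therefore covers only a quarter of the intended area on a 2x device. dp ROIs remove that scale-factor dependency but can still break when window sizes or layouts differ between devices.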

[nodewith.Finder]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto/nodewith/nodewith.go?q=nodewith.go#:~:text=type%20Finder%20struct%20%7B
[density-independent pixels]: https://en.wikipedia.org/wiki/Device-independent_pixel

## FAQs

### How can I test if the library can find a UI element in a screenshot before coding? (Googlers only)

Try the playground at [go/acuiti-playground]. Upload the screenshot you want to
test and change the detection type to 'text' or 'custom icon'. If the playground
can find a UI element, the [uidetection] library can find it too.

The playground also generates a Tast code snippet matching the query.

### I am getting errors when taking a stable screenshot.

If you encounter an error saying `screen has not stopped changing after XXXs,
perhaps increase timeout or use immediate-screenshot strategy`, the screen is
not static. Check the two consecutive screenshots, `uidetection_screenshot.png`
and `old_uidetection_screenshot.png`, to see where the screen keeps changing. If
this is expected, use the immediate screenshot strategy
`WithScreenshotStrategy(uidetection.ImmediateScreenshot)`.

**Example**: Left-click the "Customize Chrome" textblock using the immediate
screenshot strategy:

```go
ud := uidetection.NewDefault(tconn)
if err := ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"}))(ctx); err != nil {
    s.Fatal("Failed to click Customize Chrome button: ", err)
}
```

## Report Bugs

If you have any issues using the [uidetection] library, please file a bug in
Buganizer under component 1034649 and add it to hotlist 3788221.

[go/acuiti-playground]: http://go/acuiti-playground