github.com/nya3jp/tast@v0.0.0-20230601000426-85c8e4d83a9b/docs/using_uidetection.md (about) 1 # Tast Codelab: Image-based UI Detection (go/tast-uidetection) 2 3 > This document assumes that you are familiar with writng Tast tests 4 > (go/tast-writing), and have already gone through [Codelab #1], [Codelab #2] 5 > and [Codelab #3] 6 7 This codelab follows the creation of a Tast test that uses the image-based UI 8 detection library. It goes over the general setups, how to use it, and some 9 common issues. 10 11 [Codelab #1]: codelab_1.md 12 [Codelab #2]: codelab_2.md 13 [Codelab #3]: codelab_3.md 14 15 [TOC] 16 17 ## Background 18 19 The [uiauto] library uses the Chrome accessibility tree to view and control the 20 current state of the UI. The accessibility tree has access to: 21 22 * The Chrome Browser 23 * The ChromeOS Desktop UI 24 * ChromeOS packaged apps 25 * Web Apps/PWAs 26 27 That being said, it does not have access to UI elements in containers or VMs 28 (like VDI and Crostini). To close the automation gap for these apps, we 29 introduce another UI automation library [uidetection] that does not rely on the 30 accessibility tree. [uidetection] makes use of computer vision techniques and is 31 able to detect UI elements from the screen directly. 32 33 Note that [uidetection] also works on UI where the accessibility tree is 34 available, but it is **preferred** to use [uiauto] in this case due to the 35 efficiency and stability of using the accessibility tree. 36 37 Currently, we support the detections of three types of UI elements: 38 39 * Custom icon detection. It allows the developer to provide a png icon image 40 to find a match in the screen. 41 * Word detection. It allows the detection of a specific word in the screen. 42 * Text block detection. It allows the detection of a text block (i.e., a 43 sentence or lines of sentences) that contains specific words. 44 45 In Tast, [uidetection] can be imported like so: 46 47 ```go 48 import "go.chromium.org/tast-tests/cros/local/uidetection" 49 ``` 50 51 [uiauto]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto 52 [uidetection]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/uidetection 53 54 ## Simple Starter Test 55 56 Here is some sample code using the UI detection library: 57 58 ```go 59 func init() { 60 testing.AddTest(&testing.Test{ 61 Func: ExampleDetection, 62 Desc: "Example of using image-based UI detection API", 63 Contacts: []string{ 64 "my-group@chromium.org", 65 "my-ldap@chromium.org",}, 66 Attr: []string{"group:mainline", "informational"}, 67 SoftwareDeps: []string{"chrome"}, 68 Timeout: 3 * time.Minute, 69 Data: []string{"logo_chrome.png"}, // Icon file for detection. 70 Fixture: "chromeLoggedIn", 71 }) 72 } 73 74 func ExampleDetection(ctx context.Context, s *testing.State) { 75 // Cleanup context setup. 76 ... 77 78 cr := s.FixtValue().(*chrome.Chrome) 79 tconn, err := cr.TestAPIConn(ctx) 80 if err != nil { 81 s.Fatal("Failed to create Test API connection: ", err) 82 } 83 84 ud := uidetection.NewDefault(tconn) 85 86 // Put UI interaction using ud here. 87 } 88 ``` 89 90 ## UI interaction in the screen 91 92 In this sample test, we will perform the following operations that covers the 93 basic three types of detections from a newly logged-in device: 1. Click the 94 Chrome icon to open a Chrome browser (custom icon detection). 2. Click the 95 button that contains "Customize Chrome" (text block detection). 3. Click the 96 "Cancel" button (word detection). 97 98 ### Click the Chrome icon 99 100 We first define a finder for the icon element as: 101 102 ```go 103 icon := uidetection.CustomIcon(s.DataPath("logo_chrome.png")) 104 ``` 105 106 "logo_chrome.png" is the icon file declared in the test registration, see 107 [Data files in Tast] for details on using the data files in tast. 108 109 Then we can left-click the icon by: 110 111 ```go 112 if err := ud.LeftClick(icon)(ctx); err != nil { 113 s.Fatal("Failed to click Chrome icon", err) 114 } 115 ``` 116 117 ### Click the "Customize Chrome" textblock 118 119 The button that contains multiple words is represented by the textblock finder: 120 121 ```go 122 textblock := uidetection.TextBlock([]string{"Customize", "Chrome"}) 123 ``` 124 125 The left-click operation is done by: 126 127 ```go 128 if err := ud.LeftClick(textblock)(ctx); err != nil { 129 s.Fatal("Failed to click Customize Chrome button", err) 130 } 131 ``` 132 133 Since the uidetection library takes stable screenshot by default, it waits for 134 the screen to be consistent between two intervals (300 ms). When the screen is 135 not expected to be static, i.e., in this scenario, the text cursor in the 136 browser address bar is blinking all the time, using stable screenshot strategy 137 can result in a test failure. The solution is to explicitly ask the API to take 138 the immediate screenshot using 139 `WithScreenshotStrategy(uidetection.ImmediateScreenshot)`: 140 141 ```go 142 if err:= ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(textblock)(ctx); err != nil { 143 s.Fatal("Failed to click Customize Chrome button", err) 144 } 145 ``` 146 147 ### Click the "Cancel" word 148 149 Similarly to the textblock detection, this operation can be defined as: 150 151 ```go 152 word := uidetection.Word("Cancel") 153 if err := ud.LeftClick(word)(ctx); err != nil { 154 s.Fatal("Failed to click Cancel button", err) 155 } 156 ``` 157 158 ### Ensuring the customize panel is closed 159 160 Finally, after clicking the cancel button, we may also need to check whether the 161 test succeeded. In this case, we have to decide what demonstrates that the 162 successful close of the "customize chrome" layout. A simple solution is to check 163 if the cancel button disappeared from the screen: 164 165 ```go 166 if err := ud.WaitUntilGone(uidetection.Word("Cancel"))(ctx); err != nil { 167 s.Fatal("The cancel button still exists", err) 168 } 169 ``` 170 171 ### Combine these actions 172 173 Ideally, you would use [uiauto.Combine] to deal with these actions as a group: 174 175 ```go 176 if err := uiauto.Combine("verify detections", 177 ud.LeftClick(uidetection.CustomIcon(s.DataPath("logo_chrome.png"))), 178 ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"})), 179 ud.LeftClick(uidetection.Word("Cancel")), 180 ud.WaitUntilGone(uidetection.Word("Cancel")), 181 )(ctx); err != nil { 182 s.Fatal("Failed to perform image-based UI interactions: ", err) 183 } 184 ``` 185 186 [Data files in Tast]: https://chromium.googlesource.com/chromiumos/platform/tast/+/HEAD/docs/writing_tests.md#Data-files 187 [uiauto.Combine]: https://pkg.go.dev/chromium.googlesource.com/chromiumos/platform/tast-tests.git/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto#Combine 188 189 ## Full Code 190 191 ```go 192 // Copyright <copyright_year> The ChromiumOS Authors 193 // Use of this source code is governed by a BSD-style license that can be 194 // found in the LICENSE file. 195 196 package uidetection 197 198 import ( 199 "context" 200 "time" 201 202 "go.chromium.org/tast/core/ctxutil" 203 "go.chromium.org/tast-tests/cros/local/chrome" 204 "go.chromium.org/tast-tests/cros/local/chrome/uiauto" 205 "go.chromium.org/tast-tests/cros/local/chrome/uiauto/faillog" 206 "go.chromium.org/tast-tests/cros/local/uidetection" 207 "go.chromium.org/tast/core/testing" 208 ) 209 210 func init() { 211 testing.AddTest(&testing.Test{ 212 Func: ExampleDetection, 213 Desc: "Example of using image-based UI detection API", 214 Contacts: []string{ 215 "my-group@chromium.org", 216 "my-ldap@chromium.org"}, 217 Attr: []string{"group:mainline", "informational"}, 218 SoftwareDeps: []string{"chrome"}, 219 Timeout: 3 * time.Minute, 220 Data: []string{"logo_chrome.png"}, // Icon file for detection. 221 Fixture: "chromeLoggedIn", 222 }) 223 } 224 225 func ExampleDetection(ctx context.Context, s *testing.State) { 226 // Shorten deadline to leave time for cleanup. 227 cleanupCtx := ctx 228 ctx, cancel := ctxutil.Shorten(ctx, 5*time.Second) 229 defer cancel() 230 231 cr := s.FixtValue().(*chrome.Chrome) 232 tconn, err := cr.TestAPIConn(ctx) 233 if err != nil { 234 s.Fatal("Failed to create Test API connection: ", err) 235 } 236 defer faillog.DumpUITreeWithScreenshotOnError(cleanupCtx, s.OutDir(), s.HasError, cr, "uidetection") 237 238 ud := uidetection.NewDefault(tconn) 239 if err := uiauto.Combine("verify detections", 240 // Click Chrome logo icon to open a Chrome browser. 241 ud.LeftClick(uidetection.CustomIcon(s.DataPath("logo_chrome.png"))), 242 // Click the button that contains "Customize Chrome" textblock. 243 ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"})), 244 // Click the "cancel" button. 245 ud.LeftClick(uidetection.Word("Cancel")), 246 // Verify the "cancel" button is gone. 247 ud.WaitUntilGone(uidetection.Word("Cancel")), 248 )(ctx); err != nil { 249 s.Fatal("Failed to perform image-based UI interactions: ", err) 250 } 251 } 252 ``` 253 254 ## Advanced UI Detections 255 256 ### Region of interest (ROI) detections 257 258 Using ROI detection may be desired to: 1. Improve the detection efficiency. 259 Smaller image results in faster detection. 2. Improve the detection accuracy. 260 The API may fail to detection small UI elements in the whole screenshot, and 261 using ROI instead of the whole screen can help to increase the detection 262 accuracy. 263 264 Currently, the UI detection library support ROI 265 [defined](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/uidetection/finder.go) 266 by: 267 268 1. Image-based UI elements. 269 270 * `Within(*uidetection.Finder)` finds a UI element within another UI 271 element. 272 * `Above(*uidetection.Finder)` finds a UI element that is above another UI 273 element. 274 * `Below(*uidetection.Finder)` finds a UI element that is below another UI 275 element. 276 * `LeftOf(*uidetection.Finder)` finds a UI element that is in the left of 277 another UI element. 278 * `RightOf(*uidetection.Finder)` finds a UI element that is in the right 279 of another UI element. 280 281 **Example**: find the word `next` that is above the textblock `some 282 textblock`: 283 284 ```go 285 word := uidetection.Word("next").Above(uidetection.Textblock([]string{"some", "textblock"}))) 286 ``` 287 288 2. Assessbility-tree-based UI elements (reprensented by [nodewith.Finder]). 289 290 * `WithinA11yNode(*nodewith.Finder)` finds a UI element that is within a 291 UI node in the accessibility tree. 292 * `AboveA11yNode(*nodewith.Finder)` finds a UI element that is above a UI 293 node in the accessibility tree. 294 * `BelowA11yNode(*nodewith.Finder)` finds a UI element that is below a UI 295 node in the accessibility tree. 296 * `LeftOfA11yNode(*nodewith.Finder)` finds a UI element that is in the 297 left of a UI node in the accessibility tree. 298 * `RightOfA11yNode(*nodewith.Finder)` finds a UI element that is in the 299 right of a UI node in the accessibility tree. 300 301 **Example**: find an icon `icon.png` in the VS Code app: 302 303 ```go 304 vs_app_windown := nodewith.NameStartingWith("Get Started - Visual Studio Code").Role(role.Window).First() // Finder of the icon in the VS Code app. 305 icon := uidetection.CustomIcon(s.DataPath("icon.png")).WithinA11yNode(vs_app_windown) 306 ``` 307 308 3. ROIs in pixel (px), **USE WITH CAUTION**. 309 310 * `WithinPx(coords.Rect)` finds a UI element in the bounding box specified 311 in pixels. 312 * `AbovePx(int)` finds a UI element above a pixel. 313 * `BelowPx(int)` finds a UI element below a pixel. 314 * `LeftOfPx(int)` finds a UI element in the left of a pixel. 315 * `RightOfPx(int)` finds a UI element in the right of a pixel. 316 317 4. ROIs in density-independent pixels (dp) **USE WITH CAUTION**. 318 `WithinDp(coords.Rect)`, `AboveDp(int)`, `BelowDp(int)`, `LeftOfDp(int)`, 319 `RighttOfDp(int)` are defined analogously as the ROIs in pixels, except that 320 they are in [density-independent pixels]. 321 322 Note: usage of ROIs in pixels or in density-independent pixels is generally 323 discouraged, as they vary from devices to devices. Please first consider using 324 other two types of ROIs. 325 326 [nodewith.Finder]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/chrome/uiauto/nodewith/nodewith.go?q=nodewith.go#:~:text=type%20Finder%20struct%20%7B 327 [density-independent pixels]: https://en.wikipedia.org/wiki/Device-independent_pixel 328 329 ## FAQs 330 331 ### How can I test if the library can find a UI element in a screenshot before coding? (Googlers only) 332 333 Try the playground at [go/acuiti-playground]. Upload a screenshot you want to 334 test and change the detection type to 'text' or 'custom icon'. If you can 335 find a UI element with playground, the [uidetection] library can find it too. 336 337 The playground also generates a Tast code snippet matching the query. 338 339 ### I am getting errors in taking stable screenshot. 340 341 If you encounter error saying `screen has not stopped changing after XXXs, 342 perhaps increase timeout or use immediate-screenshot strategy`, this happens 343 because the screen is not static. You can check the two consecutive screenshots 344 `uidetection_screenshot.png` and `old_uidetection_screenshot.png` to see in 345 which location the screen keeps changing. If this is expected, try using the 346 immediate screenshot strategy 347 `WithScreenshotStrategy(uidetection.ImmediateScreenshot)` 348 349 **Example**: Left-click the "Customize Chrome" textblock using the immediate 350 screenshot strategy 351 352 ```go 353 ud := uidetection.NewDefault(tconn) 354 ud.WithScreenshotStrategy(uidetection.ImmediateScreenshot).LeftClick(uidetection.TextBlock([]string{"Customize", "Chrome"})), 355 ``` 356 357 ## Report Bugs 358 359 If you have any issues in using this [uidetection] library, please file a bug in 360 Buganizer to the component 1034649 and assign it to the hotlist 3788221. 361 362 [go/acuiti-playground]: http://go/acuiti-playground