Automation and Accessibility Integration (AAI) (Total Control 8+)
AAI identifies UI elements on screen as objects. The traditional (x, y) coordinate approach treats the screen as one giant object, so image/color seeking and OCR are required to identify an object on screen.
Accessibility is a feature that represents the UI elements on screen as underlying nodes. A node carries many properties: text/description, dimensions, boolean properties such as clickable, editable or scrollable, the underlying class name, etc. The text/description can be read directly (no OCR is needed), and the dimensions (together with clickable) ensure that a button can still be clicked even after the node has moved to another location.
A node can represent a single UI element (e.g. a button), a group of UI elements, or a layout of certain elements. A node (element, group or layout) is identified by a node ID (a hex string). We integrate Accessibility, the TC scripting framework and the UI Automator library to achieve the following goals:
- Coordinate independence makes scripts more portable across different resolutions, screen sizes and brands.
- The synchronous API waits until the screen is repainted, which makes scripts simpler: there is no need to guess how long to sleep.
- Strings can be retrieved from the app with ease, instead of relying on error-prone OCR.
The simplest cases of AAI:
- Click "OK" on the screen if it is found. This is far better than click(100, 100), which only works for a specific resolution: devices.clickSync("OK")
- Enter text into a text field. AAI can find all text fields on the current screen.
devices.inputTextSync([position], "text") // Enter text; position selects the field when there are multiple inputs
- Run or restart an application. Without a query, the call returns when the screen refreshes; with a query, it matches the query after the screen has refreshed.
devices.runAppSync(<package name>, [query])
devices.restartAppSync(<package name>, [query])
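The calls listed above can be chained into a short script. The following is a minimal sketch, assuming the `devices` object exposes the methods exactly as named above. The package name, the query string, the zero-based field positions and the return values are illustrative assumptions, and the recorder object only stands in for the real `devices` so that the snippet runs outside Total Control.

```javascript
// Sketch only: `devices` is whatever object the Total Control script provides.
// The package name, field positions and labels below are hypothetical.
function signIn(devices, user, password) {
  // Start the app and wait until a node with the text "Sign in" is on screen.
  devices.runAppSync("com.example.app", "T:Sign in");
  // Positions 0 and 1 are assumed to mean the first and second text field.
  devices.inputTextSync(0, user);
  devices.inputTextSync(1, password);
  // Click by label instead of by coordinates.
  return devices.clickSync("Sign in");
}

// Stand-in object that records every call instead of talking to devices.
function makeRecorder() {
  const calls = [];
  const record = (name) => (...args) => {
    calls.push([name, ...args]);
    return true;
  };
  return {
    calls,
    runAppSync: record("runAppSync"),
    inputTextSync: record("inputTextSync"),
    clickSync: record("clickSync"),
  };
}

const fake = makeRecorder();
signIn(fake, "alice", "secret");
console.log(fake.calls.map((call) => call[0]).join(" -> "));
// runAppSync -> inputTextSync -> inputTextSync -> clickSync
```

No sleep calls appear between the steps, because each synchronous call is expected to return only after the screen has refreshed.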
The entire screen is composed of many nodes. A node can be the smallest UI element or a container of many nodes, and some nodes are invisible. The entire screen forms a tree structure starting from a single root node. Depending on the complexity of the app, a screen can contain 50-300 nodes.
Since users are only interested in a small subset of nodes, the challenge is finding the correct nodes and then extracting information from them or performing actions on them.
To find nodes, we invented a query language. The FindNode program is installed on every device and executes the query to obtain the nodes that meet the criteria. The intent is to reduce the large number of nodes to one or a few intended nodes, from which users can obtain information or to which they can apply actions.
For example, UI Automator in Java provides "UiSelector" and "BySelector" for use with UiDevice.findObject() or findObjects() to locate nodes. These can become complex when multiple conditions are involved:
new UiSelector().className("android.widget.TextView").text("OK")
We created a simple query language that is shorter and portable, since the query is sent to many devices. The above code can be rewritten in our query language as:
"C: android.widget.TextView&&T:OK"
The AAI project includes the following:
- Query language: a simple one-line syntax for searching for the intended nodes. This is the core of AAI.
- FindNode: carries out the queries and actions on each device. All queries and certain actions are done in FindNode, which contains a few dozen commands. See the FindNode documentation for more information.
- Object mode in one-to-many synchronization: sends the node (or UI object) to all devices instead of coordinates, so clicking "OK" works on all devices with different resolutions, unlike click(100,100).
- UI Explorer: obtains node information and can visually test the query language. It is a learning and exploration tool.
- AAIS: a simple language for performing automation on multiple devices. Capture and replay generates this language; see the AAIS documentation for more information.
- REST and JS API: provide access to FindNode.
- UiElement class: built on top of FindNode to access nodes with ease.
Query
Each query has one or more "<key>:<value>" pairs; multiple pairs are joined with "&&" as the separator.
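Because a query is just key/value pairs joined with "&&", it can be assembled from data instead of being concatenated by hand. The helper below is a hypothetical convenience function, not part of Total Control. It takes an ordered list of pairs, so the same key may appear more than once and the order is preserved.

```javascript
// Hypothetical helper (not a Total Control API): joins [key, value] pairs
// into the one-line "<key>:<value>&&<key>:<value>" query form.
function buildQuery(pairs) {
  return pairs
    // A pair without a value is emitted as the bare key.
    .map(([key, value]) => (value === undefined ? key : `${key}:${value}`))
    .join("&&");
}

const query = buildQuery([
  ["C", "android.widget.TextView"],
  ["T", "OK"],
]);
console.log(query); // C:android.widget.TextView&&T:OK
```

This sketch assumes that values never contain "&&" themselves; it performs no escaping.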
Each node is identified by a node ID. A query can be divided into 3 phases:
- Template ("TP"). This phase generates the initial nodes. For instance, "TP:textInput" returns a list of editable text fields. This phase is mandatory; if it is not specified, the default template is used.
- Basic query (BQ). Each node contains information about itself: the class, the text/description, the properties, etc. BQ matches nodes one at a time and rejects the nodes that do not meet the criteria, so they are not passed into the next phase. If BQ is not specified, the nodes generated by TP are passed into EQ.
- Expanded query (EQ). A set of keys that usually work across multiple nodes. Multiple EQ keys execute from left to right, and the same key can be specified multiple times. An example of EQ: "OX:1" finds the element/node to the right of the current node.
After the query is performed, the one or more nodes found are listed in the "ML" (matching list). A list of actions can be applied to the ML; an action either retrieves information from the nodes in the ML or performs an operation on them.
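To make the three phases concrete, the sketch below splits a query string into its template, basic and expanded parts. The key names are taken from the key tables in this section; the grouping logic is only an illustration and is not FindNode's parser. It assumes that values never contain "&&", and it reports keys it does not know (including the "X"-prefixed form) as unknown.

```javascript
// Key names come from the BQ and EQ tables in this section.
const BQ_KEYS = new Set(["P", "C", "R", "D", "T", "IT", "CC", "ID", "BI", "BP", "TD"]);
const EQ_KEYS = new Set(["IX", "OX", "OY", "ON", "ST", "TX", "TY", "VG", "RN", "BQ"]);

function splitPhases(query) {
  const phases = { template: null, basic: [], expanded: [], unknown: [] };
  for (const part of query.split("&&")) {
    // Only the text before the first ":" is the key; the rest is the value.
    const sep = part.indexOf(":");
    const key = (sep === -1 ? part : part.slice(0, sep)).trim();
    if (key === "TP") phases.template = part;
    else if (BQ_KEYS.has(key)) phases.basic.push(part);
    else if (EQ_KEYS.has(key)) phases.expanded.push(part);
    else phases.unknown.push(part);
  }
  return phases;
}

console.log(splitPhases("TP:line,bottom,-1&&T:Chats&&OX:1"));
// { template: 'TP:line,bottom,-1', basic: [ 'T:Chats' ], expanded: [ 'OX:1' ], unknown: [] }
```

A query without a "TP" key leaves `template` as null, which corresponds to the case where the default template is used.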
Template:
Generates the initial nodes for BQ or EQ.
TP:all                             All nodes
TP:more                            All nodes except nodes ending with "Layout"
TP:basic                           All leaf nodes (child count is zero)
TP:reduced                         An optimized "TP:more" that returns the nodes important for the screen
TP:anyText[,<min>[,<max>]]         Nodes with content in "text", optionally within a length range
TP:anyDescription[,<min>[,<max>]]  Nodes with content in "description", optionally within a length range
TP:textInput                       All "editable" fields, sorted from top-left to bottom-right
TP:findText,<text>                 Nodes with the text given in the argument, which can contain "*" and "/…/"
TP:line,top|bottom,<number>        Top/bottom nodes that are outside of scrollable nodes
TP:scrollable,<position>           Nodes inside a scrollable container; position selects among multiple scrollable nodes
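The effect of a few templates can be pictured on a small in-memory tree. The snippet below is an illustration only: the nodes are made up, the field names (`className`, `editable`, `bounds`, `children`) are assumptions chosen for this sketch, and the filters approximate the descriptions in the table rather than reproduce FindNode's implementation. In particular, "ending with Layout" is interpreted here as the class name ending with "Layout".

```javascript
// A made-up screen: bounds are [left, top, right, bottom].
const root = {
  className: "android.widget.FrameLayout", bounds: [0, 0, 1080, 1920], children: [
    { className: "android.widget.LinearLayout", bounds: [0, 0, 1080, 600], children: [
      { className: "android.widget.EditText", editable: true, bounds: [540, 100, 1000, 200], children: [] },
      { className: "android.widget.EditText", editable: true, bounds: [40, 100, 500, 200], children: [] },
    ] },
    { className: "android.widget.Button", bounds: [40, 700, 300, 800], children: [] },
  ],
};

// Depth-first walk that flattens the tree into a list of nodes.
function allNodes(node) {
  return [node, ...node.children.flatMap(allNodes)];
}

const templates = {
  all: (nodes) => nodes,
  more: (nodes) => nodes.filter((n) => !n.className.endsWith("Layout")),
  basic: (nodes) => nodes.filter((n) => n.children.length === 0),
  // Editable nodes ordered by top edge first, then by left edge.
  textInput: (nodes) =>
    nodes
      .filter((n) => n.editable)
      .sort((a, b) => a.bounds[1] - b.bounds[1] || a.bounds[0] - b.bounds[0]),
};

const nodes = allNodes(root);
console.log(templates.all(nodes).length);   // 5
console.log(templates.more(nodes).length);  // 3
console.log(templates.basic(nodes).length); // 3
console.log(templates.textInput(nodes).map((n) => n.bounds[0])); // [ 40, 540 ]
```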
Basic Query (BQ):
Queries node-level information. Each node from TP must match the BQ keys (if provided) to proceed.
P:<package name>     Should not be used; defaults to the running app
C:<class name>       Class name (S)
R:<resource ID>      Resource ID (S)
D:<text>             Description (S)
T:<text>             Text (S)
IT:<number>          Text input type (I)
CC:<number>          Child count (I)
ID:<ID>              Node ID in hex (S)
BI:[x, y]            The nodes that contain (x, y)
BI:[x1, y1, x2, y2]  The nodes enclosed by the rectangle; if x or y is -1, it is ignored
BP:<prop name>       Boolean properties (S):
- checkable, checked, clickable, editable, enabled, focused, longClickable
- scrollable, visibleToUser.
TD:<text>            Match text or description (S)
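The two forms of the BI key can be read as a point test and a rectangle test on a node's bounds. The sketch below shows one possible reading, assuming bounds are given as [left, top, right, bottom] in pixels and that a value of -1 in the rectangle form disables the check on that edge. FindNode's actual rules (for example whether edges are inclusive) are not stated here, so treat these functions as hypothetical.

```javascript
// Point form, BI:[x, y]: does the node contain the point?
function containsPoint(bounds, [x, y]) {
  const [left, top, right, bottom] = bounds;
  return x >= left && x <= right && y >= top && y <= bottom;
}

// Rectangle form, BI:[x1, y1, x2, y2]: is the node enclosed by the rectangle?
// A coordinate of -1 is ignored, so that edge is not checked.
function enclosedBy(bounds, [x1, y1, x2, y2]) {
  const [left, top, right, bottom] = bounds;
  return (
    (x1 === -1 || left >= x1) &&
    (y1 === -1 || top >= y1) &&
    (x2 === -1 || right <= x2) &&
    (y2 === -1 || bottom <= y2)
  );
}

const button = [40, 700, 300, 800];
console.log(containsPoint(button, [100, 750]));      // true
console.log(enclosedBy(button, [0, 0, 1080, 1920])); // true
console.log(enclosedBy(button, [-1, 900, -1, -1]));  // false: the node starts above y = 900
```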
Expanded Query (EQ):
Queries here usually work across multiple nodes.
The order of expanded queries is important: all expanded queries execute from left to right. Commands with the same key are allowed.
IX:<number>        Obtain one node from the list of matching nodes based on position
OX:<number>        Offset to neighbor nodes horizontally (positive: right, negative: left)
OY:<number>        Offset to neighbor nodes vertically (positive: down, negative: up)
ON:<type>          Different ways to pick one node out of the list of matching nodes
ST:<sort type>     Return the nodes sorted by their position on screen
TX                 Return nodes that intersect with the reference node horizontally
TY                 Return nodes that intersect with the reference node vertically
VG:[level number]  Return the group of nodes in a view group, starting from the first node in ML
RN                 Return the optimized nodes from a list of matching nodes
BQ:<query>         Perform a basic query
X:<key in BQ>      Basic query key prefixed with "X"
For BQ, the query syntax can contain "!" for not, ">" and "<" for greater than and less than, "*" for wildcard matching and "/<regexp>/" for regular expressions. These can be used to match package name, class name, resource ID, text, description, child count and input type.
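The operators described above can be pictured with a small value matcher. This is a sketch of the idea only: FindNode's exact rules (case sensitivity, anchoring, escaping, how operators combine) may differ, and the function name `matches` is hypothetical.

```javascript
// Returns true when `value` satisfies the BQ-style `pattern`.
function matches(pattern, value) {
  // "!" negates whatever follows it.
  if (pattern.startsWith("!")) return !matches(pattern.slice(1), value);
  // ">" and "<" compare numerically (e.g. child count, input type).
  if (pattern.startsWith(">")) return Number(value) > Number(pattern.slice(1));
  if (pattern.startsWith("<")) return Number(value) < Number(pattern.slice(1));
  // "/.../" is treated as a regular expression.
  const regex = pattern.match(/^\/(.*)\/$/);
  if (regex) return new RegExp(regex[1]).test(String(value));
  if (pattern.includes("*")) {
    // Escape regex metacharacters, turn "*" into ".*", and anchor both ends.
    const escaped = pattern
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
      .replace(/\*/g, ".*");
    return new RegExp(`^${escaped}$`).test(String(value));
  }
  // Otherwise require an exact match.
  return String(value) === pattern;
}

console.log(matches("Model*", "Model name"));       // true
console.log(matches("!OK", "Cancel"));              // true
console.log(matches(">2", 3));                      // true
console.log(matches("/^Galaxy/", "Galaxy S10+"));   // true
```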
FindNode is installed on every device (as part of the Total Control app). It is the only program that recognizes the query syntax: it parses the query, locates nodes and performs actions on the nodes found. FindNode offloads JavaScript complexity and CPU utilization from Total Control, since all searching is conducted on the devices.
device.sendAai() and devices.sendAai() are the direct way to communicate with FindNode on one device or a list of devices. The JS object is translated to JSON before it is sent to the device, and the returned value is a JS object. If an error is encountered, the return value is null and lastError() contains the error message.
A simple query example: to obtain the value shown next to the "Model name" label, use an X offset of 1 (the next node to the right):
>> device.sendAai({query:"T:Model name&&OX:1", action:"getText"})
{retval: 'Galaxy S10+'}
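Since sendAai() signals failure by returning null, scripts that chain several calls may want to turn that convention into an exception. The wrapper below is a hypothetical sketch: `sendOrThrow` is not a Total Control function, `lastError` is passed in as a parameter so the sketch makes no assumption about where the real function lives, and the stand-in device and its error text are placeholders that let the snippet run outside Total Control.

```javascript
// Hypothetical wrapper around the "null plus lastError()" convention.
function sendOrThrow(device, request, lastError) {
  const result = device.sendAai(request);
  if (result === null) {
    throw new Error(`sendAai failed: ${lastError()}`);
  }
  return result.retval;
}

// Stand-in device: answers one known query and fails on everything else.
const fakeDevice = {
  sendAai: (request) =>
    request.query === "T:Model name&&OX:1" ? { retval: "Galaxy S10+" } : null,
};
// Placeholder message; real error texts come from Total Control.
const fakeLastError = () => "placeholder error message";

console.log(sendOrThrow(fakeDevice, { query: "T:Model name&&OX:1", action: "getText" }, fakeLastError));
// Galaxy S10+

try {
  sendOrThrow(fakeDevice, { query: "T:Missing", action: "getText" }, fakeLastError);
} catch (err) {
  console.log(err.message); // sendAai failed: placeholder error message
}
```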
FindNode can even detect the fixed icons at the top/bottom of the screen:
>> device.sendAai({query:"TP:line,bottom,-1", action:"getText"})
{retval: ['Chats','Calls','Contacts','Notifications']}
The following 3 commands do the same thing: they click the "Calls" text:
>> device.sendAai({query:"TP:line,bottom,-1&&T:Calls", action:"click"})
{retval: true}
>> device.sendAai({query:"TP:line,bottom,-1&&IX:1", action:"click"})
{retval: true}
>> device.sendAai({query:"TP:line,bottom,-1&&T:Chats&&OX:1", action:"click"})
{retval: true}
Click "Contacts" icon:
>> device.sendAai({query:"TP:line,bottom,-1&&T:Contacts&&OY:-1", action:"click"})
{retval: true}
>> device.sendAai({query:"TP:line,bottom,-1&&IX:2", action:"click"})
{retval: true}
// If multiple "Contacts" nodes are found on the screen, IX:-1 selects the
// last node found
>> device.sendAai({query:"T:Contacts&&IX:-1&&OY:-1", action:"click"})
{retval: true}
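The IX values used in the examples above suggest zero-based indexing, with negative values counting from the end of the matching list. The helper below only mirrors that behavior on a plain array to make the rule explicit. `pickByIndex` is a hypothetical name, and returning null for an out-of-range index is an assumption of this sketch.

```javascript
// Mirrors the IX behavior seen in the examples: zero-based from the front,
// negative values count back from the end of the matching list.
function pickByIndex(matchingList, ix) {
  const index = ix < 0 ? matchingList.length + ix : ix;
  return index >= 0 && index < matchingList.length ? matchingList[index] : null;
}

const bottomBar = ["Chats", "Calls", "Contacts", "Notifications"];
console.log(pickByIndex(bottomBar, 1));  // Calls
console.log(pickByIndex(bottomBar, 2));  // Contacts
console.log(pickByIndex(bottomBar, -1)); // Notifications
```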
Please read the FindNode User Guide for complete information.