# OCR-Based Element Identification

OCR-based element identification enables test automation on screens where elements are not accessible through standard widget or XPath-based strategies.

In some mobile applications, UI components are rendered in ways that make them invisible to conventional automation techniques. Examples include screens built with custom rendering, image-based components, and applications where development gaps leave elements that cannot be reliably targeted.

With this capability, QApilot uses OCR during the recording phase to detect and display visible text elements on screen. When a user selects an element, QApilot stores its text content and bounding box coordinates. During execution, OCR runs again on the live device screen to locate the same text, recalculates the current bounding coordinates, and performs the interaction.
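The record-then-relocate flow described above can be sketched roughly as follows. This is an illustrative sketch only, not QApilot's implementation: the `Box`, `OcrWord`, and `RecordedElement` types and the `locate` function are hypothetical names, the OCR output is hard-coded rather than produced by a real engine, and both the case-insensitive text match and the "closest to the recorded position" tie-break are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in screen pixels (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        # Point at which a tap would be performed.
        return (self.left + self.width // 2, self.top + self.height // 2)


@dataclass(frozen=True)
class OcrWord:
    """One text region reported by an OCR pass over a screenshot."""
    text: str
    box: Box


@dataclass(frozen=True)
class RecordedElement:
    """What is stored at recording time: the text and where it was seen."""
    text: str
    box: Box


def _normalize(text: str) -> str:
    # Assumption: matching ignores case and repeated whitespace.
    return " ".join(text.split()).casefold()


def _distance_sq(a: Box, b: Box) -> int:
    ax, ay = a.center()
    bx, by = b.center()
    return (ax - bx) ** 2 + (ay - by) ** 2


def locate(recorded: RecordedElement, live: list[OcrWord]) -> Optional[Box]:
    """Find the recorded text in the live OCR results.

    Returns the current bounding box, or None if the text is not on screen.
    If the text appears more than once, the match closest to the recorded
    position is chosen (an assumed tie-break, not documented behavior).
    """
    wanted = _normalize(recorded.text)
    matches = [w.box for w in live if _normalize(w.text) == wanted]
    if not matches:
        return None
    return min(matches, key=lambda box: _distance_sq(box, recorded.box))


# Recording phase: the user picked the "Sign In" text.
recorded = RecordedElement("Sign In", Box(100, 800, 200, 60))

# Execution phase: stand-in for OCR results from the live device screen.
live_screen = [
    OcrWord("Welcome", Box(90, 120, 300, 80)),
    OcrWord("sign in", Box(100, 40, 120, 40)),
    OcrWord("Sign In", Box(110, 860, 200, 60)),
]

target = locate(recorded, live_screen)
tap_point = target.center() if target else None
```

Here the element has shifted slightly since recording, so the tap point is derived from the freshly detected box rather than from the stored coordinates.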

OCR-based element capture can now be controlled at the recording level:

1. In Auto Step Mode, OCR is automatically disabled.
2. When Auto Step Mode is off, a manual toggle is available in the recording UI, giving teams precise control over when OCR is used.

<figure><img src="/files/znva03JuDmuRKKeKgDFv" alt=""><figcaption></figcaption></figure>

