Page Discovery
The Pages step is where you discover and manage all pages on your website.
Overview
Page Discovery crawls your website to find all accessible URLs. This is the first step in the QAID workflow and must be completed before extracting elements.
Starting a Discovery
Basic Discovery
- Click "Discover Pages" in the Pages step
- The crawler will:
  - Start from your base URL
  - Follow all internal links
  - Detect navigation menus
  - Handle JavaScript-rendered content
Configuring Discovery Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Max Pages | 50 | 1-200 | Maximum number of pages to discover |
| Max Depth | 3 | 1-10 | How many links deep to follow |
Choosing Max Pages:
- Start with 20-50 for initial exploration
- Increase for comprehensive coverage
- Lower limits = faster discovery
Choosing Max Depth:
- Depth 1: Only pages linked directly from the homepage
- Depth 2-3: Standard websites
- Depth 4+: Deep site structures
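To illustrate how the two limits interact, here is a minimal sketch of a breadth-first crawl over an in-memory link graph. The `discover` function, the `links` mapping, the choice to count the base URL toward Max Pages, and the convention that the base URL sits at depth 0 are all assumptions made for illustration. This is not QAID's actual crawler.

```python
from collections import deque


def discover(links, base_url, max_pages=50, max_depth=3):
    """Breadth-first discovery over a link graph (hypothetical sketch).

    links maps each page path to the internal paths it links to.
    Assumes the base URL is depth 0, so depth 1 means pages linked from it.
    """
    found = [base_url]
    seen = {base_url}
    queue = deque([(base_url, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target in seen:
                continue
            if len(found) >= max_pages:
                return found  # page budget exhausted
            seen.add(target)
            found.append(target)
            queue.append((target, depth + 1))
    return found


# Hypothetical site structure used only for this example.
site = {
    "/": ["/about", "/products"],
    "/products": ["/products/a", "/products/b"],
    "/products/a": ["/products/a/specs"],
}

print(discover(site, "/", max_depth=1))  # ['/', '/about', '/products']
print(discover(site, "/", max_pages=4))  # ['/', '/about', '/products', '/products/a']
```

In this sketch a low Max Depth cuts off deep branches, while a low Max Pages stops the crawl as soon as the budget is spent, whichever limit is hit first.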
Understanding Page Statuses
Pages have one of four statuses:
| Status | Meaning | Color |
|---|---|---|
| discovered | Found but not yet analyzed | Blue |
| crawled | Successfully extracted elements | Green |
| removed | Not found in latest crawl | Gray |
| blacklisted | Manually excluded by user | Red |
Status Lifecycle
New URL → discovered → crawled
│
├─► removed (if not in next crawl)
│ └─► auto-restores if found again
│
└─► blacklisted (user action)
Managing Pages
Viewing Discovered Pages
The main page list shows:
- Page URL (relative path)
- Status badge
- Element count (after extraction)
- Actions menu
Page Sections
- Newly Discovered: Pages found in the latest crawl that weren't in previous runs.
- All Pages: Complete list of active pages.
- Lost Pages: Pages that were previously discovered but not found in the latest crawl. These have the removed status.
- Excluded Pages: Pages you've blacklisted. They won't be processed in any workflow step.
Adding Pages Manually
- Click "Add Page"
- Enter the page path (e.g., /about-us)
- Click "Add"
Use this for pages the crawler might miss (JavaScript-only routes, etc.).
Blacklisting Pages
To exclude a page from all processing:
- Find the page in the list
- Click the "..." menu
- Select "Blacklist"
- Confirm in the dialog
Common pages to blacklist:
- Admin pages (/admin, /dashboard)
- Logout URLs (/logout, /signout)
- Utility pages (/print, /export)
- User-specific pages (/profile, /settings)
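When a discovery run returns many pages, it can help to shortlist blacklist candidates before working through the list by hand. The sketch below assumes you have the discovered paths as a plain list. The `is_blacklist_candidate` helper and the prefix-matching rule are hypothetical and not part of QAID.

```python
# Path prefixes taken from the list above; adjust for your own site.
COMMON_EXCLUSIONS = (
    "/admin", "/dashboard", "/logout", "/signout",
    "/print", "/export", "/profile", "/settings",
)


def is_blacklist_candidate(path, prefixes=COMMON_EXCLUSIONS):
    """Return True if path equals a prefix or sits beneath it."""
    path = path.rstrip("/") or "/"
    return any(path == p or path.startswith(p + "/") for p in prefixes)


# Hypothetical discovery results.
discovered = ["/", "/about-us", "/admin/users", "/logout", "/printers"]
candidates = [p for p in discovered if is_blacklist_candidate(p)]
print(candidates)  # ['/admin/users', '/logout']
```

Matching on whole path segments keeps unrelated pages such as /printers from being flagged just because they share leading characters with /print.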
Restoring Blacklisted Pages
- Go to the Excluded Pages section
- Find the page
- Click "Restore"
The page returns to active status and will be processed again.
Deleting Pages
To permanently remove a page:
- Click the "..." menu
- Select "Delete"
- Confirm deletion
Warning: Deleting removes all associated elements, tests, and scenarios.
Page Criticality
Each page has a criticality score (1-5):
| Score | Level | Meaning |
|---|---|---|
| 5 | Critical | Core functionality, must always work |
| 4 | High | Important features, high priority |
| 3 | Medium | Standard functionality |
| 2 | Low | Secondary features |
| 1 | Minimal | Nice-to-have, lowest priority |
Setting Criticality
- Go to the Overview step (Dashboard)
- Find the Page Criticality section
- Drag pages between criticality levels
- Click "Save Rankings"
Criticality affects:
- Test prioritization
- Coverage reporting
- Scenario importance
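One plausible reading of test prioritization is that work on higher-scored pages is scheduled first. The sketch below shows that ordering only. The `Page` class and `prioritize` function are hypothetical, and QAID may weigh criticality differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    path: str
    criticality: int  # 1 (Minimal) to 5 (Critical)


def prioritize(pages):
    """Order pages from most to least critical; ties keep input order."""
    return sorted(pages, key=lambda p: p.criticality, reverse=True)


# Hypothetical pages and scores.
pages = [
    Page("/about", 2),
    Page("/checkout", 5),
    Page("/search", 4),
    Page("/cart", 5),
]
print([p.path for p in prioritize(pages)])
# ['/checkout', '/cart', '/search', '/about']
```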
Crawl History
View previous discovery runs:
- Click "History" in the Pages step
- See each run with:
  - Timestamp
  - Pages discovered count
  - Duration
  - Status
Re-running Discovery
You can run discovery again anytime:
- Click "Discover Pages"
- The crawler compares against existing pages:
  - New pages are marked discovered
  - Missing pages are marked removed
  - Existing pages retain their status
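The comparison described above can be sketched as a small reconciliation function. This is an illustration under stated assumptions, not QAID's implementation. The `reconcile` name is hypothetical, a removed page that reappears is assumed to return to discovered, and blacklisted pages are assumed to stay blacklisted whatever the crawl finds.

```python
def reconcile(existing, crawled_paths):
    """Return updated {path: status} after a new discovery run (sketch).

    existing: {path: status} from earlier runs.
    crawled_paths: set of paths found in the latest crawl.
    """
    updated = {}
    for path, status in existing.items():
        if status == "blacklisted":
            # Assumption: user exclusions are never changed by a crawl.
            updated[path] = status
        elif path not in crawled_paths:
            updated[path] = "removed"
        elif status == "removed":
            # Assumption: a page that reappears goes back to "discovered".
            updated[path] = "discovered"
        else:
            updated[path] = status  # existing pages retain their status
    for path in sorted(crawled_paths):
        updated.setdefault(path, "discovered")  # pages not seen before
    return updated


# Hypothetical state before and after a second run.
before = {
    "/": "crawled",
    "/old": "discovered",
    "/back": "removed",
    "/admin": "blacklisted",
}
after = reconcile(before, {"/", "/back", "/new"})
print(after)
```

Here /old becomes removed, /back is restored, /new is added as discovered, and the other two pages keep their status.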
Best Practices
Initial Discovery
- Start with lower limits (Max Pages: 30, Depth: 2)
- Review results and adjust parameters
- Blacklist irrelevant pages early
Authenticated Sites
- Configure credentials before discovery
- Verify crawler can access protected areas
- Check that login pages are handled correctly
Large Sites
- Use higher Max Pages limits
- Consider running discovery in batches
- Focus on critical sections first
Maintaining Page List
- Re-run discovery periodically to catch new pages
- Review "Lost Pages" to understand what changed
- Keep blacklist updated as site evolves
Troubleshooting
Pages Not Being Discovered
Possible causes:
- JavaScript-only navigation (increase JS wait time)
- Authentication required (configure credentials)
- Robots.txt blocking (check site permissions)
- Max Pages limit reached (increase limit)
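To check the robots.txt cause outside the tool, you can test paths against the site's rules with Python's standard `urllib.robotparser`. This is a minimal sketch: the `blocked_paths` helper and the sample rules are hypothetical, and the `*` user agent is an assumption because the crawler's actual user agent is not stated here.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt content for illustration.
example = """\
User-agent: *
Disallow: /private/
"""
print(blocked_paths(example, ["/about", "/private/report"]))
# ['/private/report']
```

Paste the contents of your own site's robots.txt in place of `example` and pass the paths you expected discovery to find.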
Discovering Too Many Pages
Solutions:
- Lower Max Depth
- Blacklist utility/admin areas
- Reduce Max Pages limit
Pages Showing as "Removed"
This can mean any of the following:
- The page URL is no longer accessible
- The page moved to a different URL
- Authentication state changed
Action: Manually check whether the page still exists. If it does, re-run discovery.
Related Topics
- Element Extraction - Next step after discovery
- Settings - Configure authentication
- Dashboard - View page metrics