QAID Docs

Page Discovery

The Pages step is where you discover and manage all pages on your website.

Overview

Page Discovery crawls your website to find all accessible URLs. This is the first step in the QAID workflow and must be completed before extracting elements.

Starting a Discovery

Basic Discovery

  1. Click "Discover Pages" in the Pages step
  2. The crawler will:
    • Start from your base URL
    • Follow all internal links
    • Detect navigation menus
    • Handle JavaScript-rendered content
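
To make "follow all internal links" concrete, here is a minimal sketch of a same-host filter. It assumes that "internal" means "resolves to the same host as the base URL"; the rule QAID actually applies is not documented here, and `is_internal` is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse


def is_internal(base_url: str, href: str) -> bool:
    """Return True if href resolves to the same host as base_url (assumed rule)."""
    resolved = urlparse(urljoin(base_url, href))
    if resolved.scheme not in ("http", "https"):
        return False  # skip mailto:, tel:, javascript: and similar links
    return resolved.netloc == urlparse(base_url).netloc


links = [
    "/about-us",
    "https://example.com/pricing",
    "https://other.example.org/",
    "mailto:hi@example.com",
]
internal = [href for href in links if is_internal("https://example.com/", href)]
```

With these sample links, only `/about-us` and `https://example.com/pricing` pass the filter.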

Configuring Discovery Parameters

Parameter   Default   Range   Description
Max Pages   50        1-200   Maximum number of pages to discover
Max Depth   3         1-10    How many links deep to follow
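
If you generate or script discovery settings, the documented defaults and ranges can be checked up front. The sketch below is illustrative only: `DiscoveryParams` is a hypothetical class, not part of QAID, and it encodes nothing beyond the two parameters listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryParams:
    """Hypothetical container for the two documented discovery parameters."""

    max_pages: int = 50  # documented default; allowed range 1-200
    max_depth: int = 3   # documented default; allowed range 1-10

    def __post_init__(self) -> None:
        if not 1 <= self.max_pages <= 200:
            raise ValueError(f"max_pages must be between 1 and 200, got {self.max_pages}")
        if not 1 <= self.max_depth <= 10:
            raise ValueError(f"max_depth must be between 1 and 10, got {self.max_depth}")


defaults = DiscoveryParams()
initial_run = DiscoveryParams(max_pages=30, max_depth=2)
```

Out-of-range values such as `DiscoveryParams(max_pages=201)` raise a `ValueError` instead of being passed along silently.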

Choosing Max Pages:

  • Start with 20-50 for initial exploration
  • Increase for comprehensive coverage
  • Lower limits make discovery faster

Choosing Max Depth:

  • Depth 1: Only pages linked from homepage
  • Depth 2-3: Standard websites
  • Depth 4+: Deep site structures
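
The interaction between Max Pages and Max Depth can be pictured as a bounded breadth-first walk. This is a sketch over an in-memory link graph, not QAID's crawler: the `discover` function and the sample `site` are hypothetical, and it assumes the start page sits at depth 0 and counts toward Max Pages.

```python
from collections import deque


def discover(links: dict[str, list[str]], start: str,
             max_pages: int = 50, max_depth: int = 3) -> list[str]:
    """Breadth-first walk over a link graph, bounded by page count and link depth."""
    found = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(found) < max_pages:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target in seen:
                continue
            seen.add(target)
            found.append(target)
            queue.append((target, depth + 1))
            if len(found) >= max_pages:
                break  # page budget exhausted
    return found


site = {
    "/": ["/products", "/about-us"],
    "/products": ["/products/widget"],
    "/products/widget": ["/products/widget/specs"],
}
depth_one = discover(site, "/", max_depth=1)
depth_two = discover(site, "/", max_depth=2)
```

In this sample, `depth_one` contains the homepage and the two pages linked from it, while `depth_two` also reaches `/products/widget`.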

Understanding Page Statuses

Pages have one of four statuses:

Status        Meaning                           Color
discovered    Found but not yet analyzed        Blue
crawled       Successfully extracted elements   Green
removed       Not found in latest crawl         Gray
blacklisted   Manually excluded by user         Red

Status Lifecycle

New URL → discovered → crawled
                          │
                          ├─► removed (if not in next crawl)
                          │      └─► auto-restores if found again
                          │
                          └─► blacklisted (user action)
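
The lifecycle can also be read as a small transition table. The sketch below is one interpretation, not QAID's internal model. It assumes that both active statuses can move to removed or blacklisted, and that a restored page returns to discovered; the documentation does not state the status a restored page returns to.

```python
# Allowed status changes, keyed by current status (interpretation of the diagram).
ALLOWED_TRANSITIONS = {
    "discovered": {"crawled", "removed", "blacklisted"},
    "crawled": {"removed", "blacklisted"},
    "removed": {"discovered"},      # auto-restore; target status is an assumption
    "blacklisted": {"discovered"},  # manual restore; target status is an assumption
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from current to new is allowed by the table above."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

For example, `can_transition("discovered", "crawled")` is allowed, while `can_transition("removed", "crawled")` is not, because a removed page has to be found again first.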

Managing Pages

Viewing Discovered Pages

The main page list shows:

  • Page URL (relative path)
  • Status badge
  • Element count (after extraction)
  • Actions menu

Page Sections

Newly Discovered: Pages found in the latest crawl that weren't in previous runs.

All Pages: Complete list of active pages.

Lost Pages: Pages that were previously discovered but not found in the latest crawl. These have the removed status.

Excluded Pages: Pages you've blacklisted. They won't be processed in any workflow step.
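
The mapping from status to section can be sketched as follows. This assumes "active" means a status of discovered or crawled, which the documentation implies but does not state; `group_pages` and its arguments are hypothetical names.

```python
def group_pages(pages: dict[str, str], new_in_latest_crawl: set[str]) -> dict[str, list[str]]:
    """Sort page paths into the four list sections based on their status."""
    sections: dict[str, list[str]] = {
        "Newly Discovered": [],
        "All Pages": [],
        "Lost Pages": [],
        "Excluded Pages": [],
    }
    for path, status in pages.items():
        if status == "removed":
            sections["Lost Pages"].append(path)
        elif status == "blacklisted":
            sections["Excluded Pages"].append(path)
        else:
            # discovered or crawled: treated as active (assumption)
            sections["All Pages"].append(path)
            if path in new_in_latest_crawl:
                sections["Newly Discovered"].append(path)
    return sections


sample = group_pages(
    {"/": "crawled", "/pricing": "discovered", "/old-promo": "removed", "/admin": "blacklisted"},
    new_in_latest_crawl={"/pricing"},
)
```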

Adding Pages Manually

  1. Click "Add Page"
  2. Enter the page path (e.g., /about-us)
  3. Click "Add"

Use this for pages the crawler might miss (JavaScript-only routes, etc.).

Blacklisting Pages

To exclude a page from all processing:

  1. Find the page in the list
  2. Click the "..." menu
  3. Select "Blacklist"
  4. Confirm in the dialog

Common pages to blacklist:

  • Admin pages (/admin, /dashboard)
  • Logout URLs (/logout, /signout)
  • Utility pages (/print, /export)
  • User-specific pages (/profile, /settings)
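
If you keep a list of discovered paths outside QAID, a simple prefix check can flag likely blacklist candidates for review. The prefixes below are the examples listed above; `suggest_blacklist` is a hypothetical helper, and the result is a suggestion to check by hand, not an automatic exclusion.

```python
COMMON_EXCLUDED_PREFIXES = (
    "/admin", "/dashboard",   # admin pages
    "/logout", "/signout",    # logout URLs
    "/print", "/export",      # utility pages
    "/profile", "/settings",  # user-specific pages
)


def suggest_blacklist(paths: list[str]) -> list[str]:
    """Return paths that equal a common prefix or sit underneath one."""
    return [
        path for path in paths
        if any(path == prefix or path.startswith(prefix + "/")
               for prefix in COMMON_EXCLUDED_PREFIXES)
    ]


candidates = suggest_blacklist(["/about-us", "/admin/users", "/logout", "/printing"])
```

Matching on whole path segments keeps `/printing` out of the results even though it begins with `/print`.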

Restoring Blacklisted Pages

  1. Go to the Excluded Pages section
  2. Find the page
  3. Click "Restore"

The page returns to active status and will be processed again.

Deleting Pages

To permanently remove a page:

  1. Click the "..." menu
  2. Select "Delete"
  3. Confirm deletion

Warning: Deleting removes all associated elements, tests, and scenarios.

Page Criticality

Each page has a criticality score (1-5):

Score   Level      Meaning
5       Critical   Core functionality, must always work
4       High       Important features, high priority
3       Medium     Standard functionality
2       Low        Secondary features
1       Minimal    Nice-to-have, lowest priority

Setting Criticality

  1. Go to the Overview step (Dashboard)
  2. Find the Page Criticality section
  3. Drag pages between criticality levels
  4. Click "Save Rankings"

Criticality affects:

  • Test prioritization
  • Coverage reporting
  • Scenario importance
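
As an illustration of how a criticality score could drive test prioritization, the sketch below orders pages from score 5 down to score 1. How QAID weighs criticality internally is not described here; `prioritize` and the sample scores are hypothetical.

```python
CRITICALITY_LEVELS = {5: "Critical", 4: "High", 3: "Medium", 2: "Low", 1: "Minimal"}


def prioritize(scores: dict[str, int]) -> list[str]:
    """Order page paths from highest to lowest score; ties are sorted by path."""
    return sorted(scores, key=lambda path: (-scores[path], path))


order = prioritize({"/blog": 2, "/checkout": 5, "/about-us": 3, "/cart": 5})
labels = [CRITICALITY_LEVELS[score] for score in (5, 3, 2)]
```

Here `/cart` and `/checkout` (both Critical) come first, followed by `/about-us` and then `/blog`.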

Crawl History

View previous discovery runs:

  1. Click "History" in the Pages step
  2. See each run with:
    • Timestamp
    • Pages discovered count
    • Duration
    • Status

Re-running Discovery

You can run discovery again anytime:

  1. Click "Discover Pages"
  2. The crawler compares against existing pages:
    • New pages are marked discovered
    • Missing pages are marked removed
    • Existing pages retain their status
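
The comparison rules above can be sketched as a reconcile step. This is an illustration under stated assumptions, not QAID's implementation: blacklisted pages are assumed to stay blacklisted whether or not the crawl finds them, and a removed page that reappears is assumed to return to discovered.

```python
def reconcile(existing: dict[str, str], found_now: set[str]) -> dict[str, str]:
    """Merge the stored page statuses with the set of paths found by a new crawl."""
    updated: dict[str, str] = {}
    for path, status in existing.items():
        if status == "blacklisted":
            updated[path] = status        # user exclusions are left alone (assumption)
        elif path not in found_now:
            updated[path] = "removed"     # missing pages are marked removed
        elif status == "removed":
            updated[path] = "discovered"  # auto-restore; target status is an assumption
        else:
            updated[path] = status        # existing pages retain their status
    for path in sorted(found_now - set(existing)):
        updated[path] = "discovered"      # new pages are marked discovered
    return updated


result = reconcile(
    {"/": "crawled", "/old": "crawled", "/back": "removed", "/admin": "blacklisted"},
    found_now={"/", "/back", "/new"},
)
```

In this sample, `/old` becomes removed, `/back` is restored, `/new` is added as discovered, and `/` and `/admin` keep their statuses.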

Best Practices

Initial Discovery

  • Start with lower limits (Max Pages: 30, Depth: 2)
  • Review results and adjust parameters
  • Blacklist irrelevant pages early

Authenticated Sites

  • Configure credentials before discovery
  • Verify crawler can access protected areas
  • Check that login pages are handled correctly

Large Sites

  • Use higher Max Pages limits
  • Consider running discovery in batches
  • Focus on critical sections first

Maintaining Page List

  • Re-run discovery periodically to catch new pages
  • Review "Lost Pages" to understand what changed
  • Keep blacklist updated as site evolves

Troubleshooting

Pages Not Being Discovered

Possible causes:

  • JavaScript-only navigation (increase JS wait time)
  • Authentication required (configure credentials)
  • Robots.txt blocking (check site permissions)
  • Max Pages limit reached (increase limit)
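
To check whether robots.txt rules would block specific paths, you can test them locally with Python's standard `urllib.robotparser`. The sketch assumes you have copied the site's robots.txt content into a string; the user agent name `ExampleBot` is a placeholder, since the user agent QAID's crawler sends is not documented here.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt: str, paths: list[str],
                  user_agent: str = "ExampleBot") -> dict[str, bool]:
    """Map each path to True if the given robots.txt rules allow fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


rules = """\
User-agent: *
Disallow: /admin
"""
report = allowed_paths(rules, ["/admin/users", "/about-us"])
```

With these rules, `/admin/users` is reported as blocked and `/about-us` as allowed.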

Discovering Too Many Pages

Solutions:

  • Lower Max Depth
  • Blacklist utility/admin areas
  • Reduce Max Pages limit

Pages Showing as "Removed"

This can mean any of the following:

  • The page URL is no longer accessible
  • The page moved to a different URL
  • Authentication state changed

Action: Manually check whether the page still exists. If it does, re-run discovery; a removed page is restored automatically once the crawler finds it again.
