Flakiness

A flaky test is one that passes and fails without the code changing — the classic source of “just re-run it” CI noise. Nijam’s whole reason for collecting history is to surface these automatically.

Because every run is stored with its commit, Nijam can compare outcomes for the same test on the same code. Two signals feed the score:

  • Within a run — a test that fails and then passes on a Playwright retry is flaky by definition.
  • Across runs — a test whose result flips between pass and fail over its recent history while the code stays the same (the same commit, or commits that leave it unchanged).

Tests are then ranked so the noisiest ones rise to the top — you fix what’s costing you the most re-runs first, instead of guessing.
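
As a rough illustration of how the two signals could be combined into a ranking, here is a minimal TypeScript sketch. Everything in it is an assumption made for the example: the `RunRecord` shape, the helper names, and especially the scoring formula (retried-then-passed runs plus flipping commits, divided by total runs). The source does not describe Nijam's actual data model or score, and for simplicity the sketch only compares runs on the same commit.

```typescript
// Hypothetical record shape; not Nijam's real schema.
type Attempt = "pass" | "fail";

interface RunRecord {
  testId: string;
  commit: string;
  // Attempt outcomes in order; more than one entry means the test was retried.
  // Assumed to be non-empty.
  attempts: Attempt[];
}

function finalOutcome(run: RunRecord): Attempt {
  return run.attempts[run.attempts.length - 1];
}

// Within-run signal: at least one attempt failed, but the last one passed.
function flakyWithinRun(run: RunRecord): boolean {
  return finalOutcome(run) === "pass" && run.attempts.indexOf("fail") !== -1;
}

// Across-run signal: number of commits that produced both a passing
// and a failing final outcome for this test.
function flippingCommits(history: RunRecord[]): number {
  const seen: Record<string, Attempt[]> = {};
  for (const run of history) {
    const outcomes = seen[run.commit] || (seen[run.commit] = []);
    if (outcomes.indexOf(finalOutcome(run)) === -1) {
      outcomes.push(finalOutcome(run));
    }
  }
  return Object.keys(seen).filter((commit) => seen[commit].length > 1).length;
}

// Assumed score: (retried-then-passed runs + flipping commits) / total runs.
// Returns tests sorted with the noisiest first.
function rankByFlakiness(runs: RunRecord[]): { testId: string; score: number }[] {
  const byTest: Record<string, RunRecord[]> = {};
  for (const run of runs) {
    (byTest[run.testId] || (byTest[run.testId] = [])).push(run);
  }
  return Object.keys(byTest)
    .map((testId) => {
      const history = byTest[testId];
      const retried = history.filter(flakyWithinRun).length;
      return { testId, score: (retried + flippingCommits(history)) / history.length };
    })
    .sort((a, b) => b.score - a.score);
}
```

Dividing by the number of runs keeps a test with a long history from outranking a shorter-lived but noisier one. A real score might also weight recent runs more heavily.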

Nijam distinguishes two failure shapes, because they need different responses:

  • Flaky — inconsistent on unchanged code. Usually a timing/race/environment issue in the test.
  • Regressed — consistently started failing at a specific commit. Usually a real bug.

The test detail page shows the history timeline so you can tell which one you’re looking at.
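
The distinction can be sketched as a small classifier over a test's timeline. This is an illustrative approximation, not Nijam's implementation: the `HistoryEntry` shape, the `classify` function, and the extra `passing` and `unclear` labels are hypothetical, and the history is assumed to be ordered oldest to newest.

```typescript
type Outcome = "pass" | "fail";

// Hypothetical timeline entry, ordered oldest to newest.
interface HistoryEntry {
  commit: string;
  outcome: Outcome;
}

interface Verdict {
  shape: "passing" | "flaky" | "regressed" | "unclear";
  since?: string; // first failing commit, set only for "regressed"
}

function classify(history: HistoryEntry[]): Verdict {
  // Flaky: the same commit has produced both outcomes.
  const seen: Record<string, Outcome> = {};
  for (const entry of history) {
    const previous = seen[entry.commit];
    if (previous !== undefined && previous !== entry.outcome) {
      return { shape: "flaky" };
    }
    seen[entry.commit] = entry.outcome;
  }

  // Find the first failure in the timeline, if any.
  let firstFail = -1;
  for (let i = 0; i < history.length; i++) {
    if (history[i].outcome === "fail") {
      firstFail = i;
      break;
    }
  }
  if (firstFail === -1) return { shape: "passing" };

  // Regressed: every run from the first failure onward also failed.
  for (let i = firstFail; i < history.length; i++) {
    if (history[i].outcome === "pass") return { shape: "unclear" };
  }
  return { shape: "regressed", since: history[firstFail].commit };
}
```

A history that fails and later passes on a different commit is labeled `unclear` here, because without knowing what changed between the commits it could be either a fix or a flake.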

Flakiness detection improves with history, so:

  • Run on CI, not just locally — consistent environment and metadata make the comparison meaningful.
  • Keep retries on — Playwright’s retries give Nijam the within-run flip signal.
  • Tag environments — an environment tag lets you distinguish “flaky everywhere” from “flaky only on webkit/staging”.
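
For the retries recommendation, a minimal `playwright.config.ts` might look like the fragment below. It uses only Playwright's own `retries` option. The retry count of 2 and the CI-only condition are arbitrary choices for this example. How Nijam's reporter or environment tags are configured is not covered by this page, so they are left out rather than guessed.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Retry failed tests on CI so that a fail-then-pass sequence is recorded.
  // The count (2) is an example value, not a Nijam requirement.
  retries: process.env.CI ? 2 : 0,
});
```

Setting `retries` to a non-zero value locally as well would give the same within-run signal outside CI, at the cost of slower local failures.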