Automated website QA falls apart when your pull requests are huge. A 2,000-line PR that touches the nav, the checkout, and a font swap is impossible to review and terrifying to revert: when a regression ships, you cannot tell which of the ten changes caused it, and rolling back drags nine innocent changes with it. The fix is two habits that reinforce each other. First, one concern per PR with a small diff. Second, run lint, the test suite, and a Lighthouse/accessibility check on every PR and block merge on failure. Small diffs make the automated checks fast and meaningful, and a failing check on a small diff points at exactly one suspect. I have shipped both ways on client sites, and the small-batch flow is the only one where main stays deployable all week.
Why do large PRs hide regressions?
A large PR is a black box. The reviewer skims it because reading 40 changed files properly takes an afternoon nobody has, so real review degrades into a rubber stamp. Worse is what happens after merge. Say the Largest Contentful Paint on your landing page regresses from 1.9s to 4.2s the week after a big merge. Which change did it? The lazy-loading tweak, the hero image swap, the third-party script someone added, or the CSS refactor? They all rode in on the same commit. Now bisecting means picking apart a merge that touched everything.
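A rough sketch of why bisecting works on small batches and stalls on a squashed merge. It is a plain binary search over an ordered history, the same idea `git bisect` relies on; the commit names and the `is_bad` check are hypothetical stand-ins for "does the LCP budget fail at this commit".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the culprit is at mid or earlier
        else:
            lo = mid + 1    # the culprit is after mid
    return commits[lo]


# Hypothetical history: four single-concern merges, LCP regresses at the third.
small_prs = ["lazy-loading tweak", "hero image swap", "third-party script", "CSS refactor"]
regressed_from = small_prs.index("third-party script")
culprit = first_bad(small_prs, lambda c: small_prs.index(c) >= regressed_from)

# The same four changes squashed into one merge: the search can only name the merge.
squashed = ["big merge (all four changes)"]
suspect = first_bad(squashed, lambda c: True)

print(culprit)
print(suspect)
```

With single-concern merges the search ends on one named change. With the squashed merge it ends on the whole bundle, and the real investigation only starts there.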
Reverting is the other trap. git revert on a small, single-concern PR is a clean, safe operation: one button in the GitHub UI, one new commit that undoes exactly that change, main is healthy again in under a minute. Revert a giant squashed merge and you are throwing away a week of unrelated work along with the bug, or you are hand-editing the revert, which is how you introduce a second bug while fixing the first. Small batches are not a style preference. They are what keeps your rollback path to one click.
If reverting a change means untangling it from nine others first, you do not have a revert button. You have a salvage operation.
What should run on every pull request?
Four gates, ordered roughly from fastest to slowest, so the cheap checks fail fast before you spend CI minutes on the slow ones. Each one runs automatically when a PR opens or updates, reports back as a status check, and is wired into branch protection so a red check blocks the merge button.
- Lint and format: ESLint plus Prettier (or Stylelint for CSS). Catches the trivial stuff in seconds so humans never review whitespace.
- The test suite: unit and integration tests. On a focused PR these run fast because the change surface is small.
- Build: the production build must succeed. A PR that does not build never reaches main.
- Lighthouse and accessibility: a Lighthouse CI run against a built preview, with budgets for performance and an axe-core pass for accessibility. This is the check that catches the LCP regression the moment it lands, on the one PR that caused it.
The accessibility and performance gate is the one people skip, and it is the one that pays off most for a public website. Tie it to budgets so it fails on regression, not on an arbitrary absolute score. Core Web Vitals are the metrics worth gating on; I cover how to read and improve them in my Core Web Vitals optimization guide, and the same thresholds become your CI assertions here.
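The branch-protection half of these gates can be sketched as the request body for GitHub's "update branch protection" REST endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). This only builds the payload; sending it needs an authenticated client and admin rights on the repo. The check names `checks` and `lighthouse` are assumptions: use whatever names your status checks actually report under. The `strict` and `enforce_admins` settings are my choices for illustration, not something the endpoint forces on you.

```python
import json


def branch_protection_payload(required_checks: list[str], approvals: int = 1) -> dict:
    """Build a request body that makes the given status checks block merging."""
    return {
        "required_status_checks": {
            "strict": True,               # assumption: branch must be up to date with main
            "contexts": required_checks,  # names of the checks that must be green
        },
        "enforce_admins": True,           # assumption: no admin bypass "just this once"
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
        },
        "restrictions": None,             # no per-user push restrictions
    }


# Hypothetical check names; replace with the ones shown on your PRs.
payload = branch_protection_payload(["checks", "lighthouse"])
print(json.dumps(payload, indent=2))
```

Keeping this payload in a script or in infrastructure code means the "red check blocks merge" rule is reviewable like everything else, instead of living only in a settings page.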
How do I wire QA into GitHub Actions?
One workflow file, triggered on pull_request, with the fast checks in one job and Lighthouse in a job that depends on a successful build. The key detail people get wrong: the workflow only enforces anything once you add the jobs as required checks in branch protection. A green pipeline that does not block merge is decoration.
```yaml
name: PR QA

on:
  pull_request:
    branches: [main]

jobs:
  # Fast gates: lint, tests, production build
  checks:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --run
      - run: npm run build

  # Slow gate: only runs once the fast checks pass
  lighthouse:
    needs: checks
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build
      - name: Run Lighthouse CI
        run: |
          npm install -g @lhci/cli@0.14.x
          lhci autorun
```

Lighthouse CI reads its own config so the budgets live in version control next to the code they police. The assertions fail the job when a metric crosses the line, which is what turns the check red and blocks the merge.
```json
{
  "ci": {
    "collect": {
      "staticDistDir": "./dist",
      "numberOfRuns": 3
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "categories:accessibility": ["error", { "minScore": 1 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }]
      }
    },
    "upload": { "target": "temporary-public-storage" }
  }
}
```

The numberOfRuns of 3 matters: Lighthouse results jitter, so a single run produces flaky failures that erode trust in the gate. With three runs, Lighthouse CI aggregates the results before asserting, so one slow outlier does not fail the check on its own. The default aggregation is optimistic (the run most likely to pass); set aggregationMethod to median in an assertion's options if you would rather assert against the median. Once a check flakes, people start clicking merge anyway, and your gate is dead. The general pipeline shape, secrets handling, and deploy step are covered in my CI/CD with GitHub Actions write-up if you want the production-deploy half.
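To make the jitter argument concrete, here is a toy model of reducing several runs to one number before comparing it with a budget. It is not Lighthouse CI's implementation, and the LCP values are invented for illustration; only the 2500 ms budget comes from the config above.

```python
from statistics import median


def gate(lcp_runs_ms: list[float], budget_ms: float = 2500) -> dict[str, bool]:
    """Report pass/fail for several ways of collapsing runs into one value."""
    candidates = {
        "first run only": lcp_runs_ms[0],
        "median of runs": median(lcp_runs_ms),
        "best run": min(lcp_runs_ms),
        "worst run": max(lcp_runs_ms),
    }
    # A value passes when it is at or under the budget.
    return {name: value <= budget_ms for name, value in candidates.items()}


# Hypothetical measurements: one noisy outlier, two runs inside the budget.
runs = [3900.0, 2300.0, 2450.0]
result = gate(runs)
for name, passed in result.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
```

The page did not change between runs, yet a single-run gate fails on the outlier while the aggregated values pass. That gap is the flakiness that teaches people to ignore a red check.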
Why add a preview deploy to every PR?
Checks tell you the code is green. A preview deploy lets you actually click through the change before it touches production. Every PR gets its own URL, built from that branch, so you (or a client signing off on a design tweak) open the real page and look at it. Static hosts like Netlify, Vercel, and Cloudflare Pages do this out of the box; for a self-hosted setup you can spin up a per-PR container or an S3 prefix keyed to the PR number. The preview link posts back as a comment or a deployment status on the PR.
This is where small batches pay off again. A preview of a single-concern PR shows one thing changed, so the reviewer knows exactly what to look at. A preview of a giant PR is just the whole site again, and nobody clicks through all of it. Ideally the preview is also where Lighthouse runs, against the deployed artifact rather than a local build, so your numbers reflect the real served bytes. The config above serves the local build through staticDistDir; once previews exist, swap that for Lighthouse CI's url option in the collect section and point it at the preview URL.
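For the self-hosted route, a minimal sketch of keying a preview to the PR number. GitHub Actions writes the triggering event to a JSON file whose path is in the `GITHUB_EVENT_PATH` environment variable, and for `pull_request` events that payload carries a top-level `number`. The bucket name, host, and `pr-<number>/` layout are hypothetical.

```python
import json
import os
import tempfile


def pr_number_from_event(event_path: str) -> int:
    """Read the pull request number from a GitHub Actions event payload file."""
    with open(event_path, encoding="utf-8") as fh:
        event = json.load(fh)
    return int(event["number"])


def preview_location(pr_number: int, bucket: str, host: str) -> tuple[str, str]:
    """Return (upload target, public URL) for one PR's preview build."""
    prefix = f"pr-{pr_number}/"  # hypothetical layout: one prefix per PR
    return f"s3://{bucket}/{prefix}", f"https://{host}/{prefix}"


# Stand-in for the event file; in a workflow, read os.environ["GITHUB_EVENT_PATH"].
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"action": "opened", "number": 128}, tmp)

number = pr_number_from_event(tmp.name)
target, url = preview_location(number, "example-previews", "preview.example.com")
os.remove(tmp.name)

print(target)
print(url)
```

Because the prefix is derived from the PR number, every push to the same PR overwrites the same preview, and the URL posted on the PR stays stable for the reviewer.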
What does batching discipline look like in practice?
The rules are boring on purpose, because boring is what keeps main always-deployable:
- One concern per PR. "Fix mobile nav overflow" is a PR. "Fix mobile nav overflow and update the footer and bump dependencies" is three PRs.
- Small diff. If a PR is creeping past a few hundred changed lines, ask whether it is really one concern. Usually it is two or three concerns bundled together.
- Clear, specific titles. The title is what shows up in your git history and in the revert dialog. "misc fixes" tells future-you nothing.
- Merge only on green. Required checks plus at least one review. No override-and-merge for "just this once", because just-this-once is how main breaks on a Friday.
- Keep main deployable at all times. Every merge has already passed lint, tests, build, and Lighthouse on a small diff, so main is never in a state you would be afraid to ship.
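Two of these rules are mechanical enough to check in CI. The sketch below assumes you feed it the output of `git diff --numstat origin/main...HEAD` and the PR title; the 400-line limit and the list of vague titles are my own placeholders for "a few hundred changed lines" and "misc fixes", so tune both.

```python
VAGUE_TITLES = {"misc fixes", "fixes", "updates", "changes", "wip"}  # hypothetical list


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-" for both counts; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def batching_problems(title: str, numstat: str, max_lines: int = 400) -> list[str]:
    """Return human-readable warnings; an empty list means the PR looks focused."""
    problems = []
    lines = changed_lines(numstat)
    if lines > max_lines:
        problems.append(f"{lines} changed lines exceeds {max_lines}: is this really one concern?")
    if title.strip().lower() in VAGUE_TITLES:
        problems.append(f"title '{title}' says nothing about the change")
    return problems


# Hypothetical diff: a small CSS fix, a binary image, and a large checkout change.
sample = "12\t3\tsrc/nav.css\n-\t-\tpublic/hero.webp\n450\t20\tsrc/checkout.js\n"
for problem in batching_problems("misc fixes", sample):
    print(problem)
```

I would report these as warnings rather than hard failures. A line count cannot tell a generated lockfile from a sprawling refactor, so the number is a prompt to ask the question, not an answer to it.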
A branching model that keeps PRs short-lived makes all of this easier; I lay out the one I use for solo and small-team work in my git branching strategy post. Short-lived branches and small PRs are the same discipline viewed from two angles.
None of this is exotic tooling. It is GitHub Actions you already have, a Lighthouse config in version control, branch protection that actually blocks, and the discipline to keep each PR about one thing. The payoff is concrete: regressions get caught on the PR that caused them, sign-off happens against a real preview URL, and when something does slip through, the revert is one commit that undoes exactly one change. Start with the smallest version: lint and tests as required checks on every PR. Get the team used to a red check meaning stop, then layer in Lighthouse budgets and preview deploys. The first time a bad merge is a one-click revert instead of a war room, the small-batch habit pays for itself.

