The Test Launched Before It Was Ready. Here's What Happened Next.

On September 24th, a team member launched the campaign before the A/B testing protocol was in place. Three procedural errors invalidated the test before a single result could be measured. With 29 days of matching grant opportunity remaining, the question was how quickly a clean methodology could be rebuilt — and what the data would say when it was.

Crisis

What Went Wrong on September 24th

An SBN team member who was collaborating with me on the campaign launched the initial email on September 24th while I was focused on SBN’s website SEO and program enrollment work. The launch produced two errors that compounded each other.

Error 1: Unequal manual list split

The testing protocol specified a random 50/50 split of the full email list. Instead, the team member divided the list by hand, producing two unequal groups: Group A was uploaded with 1,431 contacts and Group B with 1,570. Network for Good excludes unsubscribes at the point of send (a platform behavior not yet understood at the time), so the delivered figures were 1,314 recipients for Group A and 1,424 for Group B. With a gap of 110 recipients between groups that were not randomly assigned, any performance difference cannot be attributed to the email version alone.

Error 2: Wrong deadline in Version A body copy

Version A stated that the campaign goal of $75,000 needed to be reached by October 31st, the matching grant deadline, rather than by the actual campaign deadline of December 31st. This conflated two separate dates: the match window (October 31st) and the overall campaign goal (December 31st). When the error was spotted, a correction email was sent to Group A 34 minutes after the original.

The Correction Email: A Compounding Mistake

The correction email was sent with good intentions. It made things measurably worse.

2:07 PM

Version (A) (Together for Nature) — sent to Group (A)

Subject: “Your Donation DOUBLES: Help 500 More People Experience Nature”

Recipients: 1,314 delivered (1,431 contacts uploaded)

2:34 PM

Version (B) — sent to Group (B)

Subject: “🔥 MATCH ALERT: Your Donation Doubled — Up to $21,000 Until October 31st”

Recipients: 1,424 delivered (1,570 contacts uploaded)

2:41 PM

Correction email — sent to Group (A) only

Subject: “Correction: Match Ends October 31 — Your Gift Still Doubled”

— 34 minutes after the original send

Every donor in Group A who opened both emails within that 34-minute window saw an organization send a correction to its own fundraising appeal before the day was out. The signal that sent — organizational confusion at the exact moment we were asking for trust and money — was more damaging than the original deadline error.

THE CORE LESSON

Correction emails fix the factual error and create a trust problem. In most cases, a silent fix — updating the landing page, correcting future emails — causes less damage than publicly acknowledging the mistake to the same audience within the same hour.

Impact

The Damage: By the Numbers

The performance data from the initial send made the scale of the problem clear immediately.

Metric	Group A — Version A + correction	Group B — Version B, no correction
Recipients	1,314	1,424
Open Rate	21.9%	35.0%
Click Rate	0.8%	0.9%
Raised	$0	$255
Conversion Rate	0%	0.12%

Group A’s open rate of 21.9% wasn’t catastrophic on its own; it’s within a normal range for nonprofit fundraising. But the 0% conversion rate from 1,314 recipients tells a different story: the correction email arrived before most people had finished reading the first one, and the combination destroyed donor confidence entirely.

Group B received the email cleanly, and its 35.0% open rate and $255 raised provided two directional signals even with a compromised test. First, the urgency-led subject line with a specific dollar amount outperformed the alternative on open rate. Second, Version B (Goal-First) drove 14 clicks at 0.9% compared to Version A’s 11 clicks at 0.8%, a small click-rate advantage that held even accounting for the unequal groups and the correction email’s impact on Group A. Neither signal was conclusive on its own. But together they were enough to make an informed decision: Email #2 would use the Goal-First structure and urgency-led subject line formula as the baseline for both groups going forward.

WHAT THE DATA TOLD ME

The test was invalid, but not entirely useless. It produced two directional signals worth acting on. Group B’s 35.0% open rate suggested that urgency-led subject lines with specific dollar amounts and a hard deadline were working. And Version B’s 14 clicks at 0.9% beat both Group A sends on September 24th (the original Version A and its correction), a signal that the Goal-First structure was driving more engagement, even from a compromised test with unequal groups. Neither finding was conclusive. Both informed what came next.

Recovery

Rebuilding the Test From Scratch

With the initial test invalidated and 29 days of matching grant opportunity remaining, the priority was to restore clean testing conditions as quickly as possible — without sending another email to a list that had just received two in the same afternoon.

The recovery had three components:

Re-segment the entire list using random sorting

Every contact was exported from Network for Good to Excel. Excel’s =RAND() function was applied to assign each contact a random value. The full list was sorted by that value and split down the middle — producing two groups of equal size with no selection bias. Both groups were re-uploaded to Network for Good as clean, separate segments.

When the A/B test emails went out, the groups were still slightly unequal in recipient count. At the time, I didn’t fully understand why — I had uploaded equally split groups and expected equal sends. I proceeded with the tests anyway, treating the results as directionally valid given the groups were close in size.

Only later did I discover the cause: Network for Good automatically excludes unsubscribed contacts. Because unsubscribes are distributed unevenly across the list, two equally sized uploaded groups will almost always produce unequal recipient counts. In future campaigns, I refined the process — rebalancing the groups after re-uploading to account for NFG’s unsubscribe removal before sending, producing genuinely equal recipient counts at the moment of delivery.
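The re-segmentation and rebalancing steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the process used in the campaign, which was done by hand in Excel with =RAND(). The `Contact` record, its `unsubscribed` flag, and the `split_sendable` function are hypothetical names, and the sketch assumes unsubscribe status is known before the groups are uploaded.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # Hypothetical record; a real export will have different fields.
    email: str
    unsubscribed: bool = False


def split_sendable(contacts, seed=None):
    """Randomly split contacts into two groups with equal deliverable counts.

    Unsubscribed contacts are dropped before the split, so exclusion at
    send time cannot unbalance the groups afterwards.
    """
    sendable = [c for c in contacts if not c.unsubscribed]
    rng = random.Random(seed)   # a seed only makes the shuffle repeatable
    rng.shuffle(sendable)       # same idea as sorting by a =RAND() column
    half = len(sendable) // 2
    group_a = sendable[:half]
    group_b = sendable[half:]   # receives the extra contact if the count is odd
    return group_a, group_b


if __name__ == "__main__":
    # Synthetic list for illustration only: every 11th contact is unsubscribed.
    contacts = [Contact(f"person{i}@example.org", unsubscribed=(i % 11 == 0))
                for i in range(3001)]
    a, b = split_sendable(contacts, seed=42)
    print(len(a), len(b))
```

Filtering before shuffling is the design choice that matters here: splitting first and filtering afterwards reproduces the unequal recipient counts described above.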

Applying the directional signal from email #1 to email #2

The initial test was invalid — but not entirely useless. Even with compromised groups, the click rate data gave a directional signal worth acting on. Version B (Goal-First) drove 14 clicks at 0.9%, while Version A’s original send drove 11 clicks at 0.8% and the correction drove 10 clicks at 0.7%. The Goal-First approach consistently drove more clicks than Story-First across all three sends — even accounting for the fact that the correction email had damaged Group A’s engagement. That signal was enough to choose the Goal-First structure as the baseline for Email #2, applied to both groups as a fresh start.

Design three clean A/B tests for the remaining October sends

Rather than trying to recover the original structure test, three new tests were designed, each testing a single variable with all other elements held constant: subject line messaging (Email #4), email length (Email #5), and send time (Email #6). Each test would produce actionable data regardless of what the original test had failed to measure.

What This Campaign Taught Me About A/B Testing on NFG

This was my first time running A/B tests. Additionally, I was still learning the Network for Good platform. The platform has no native A/B testing tool — every step of the process had to be built manually in Excel and managed through segmented contact lists. The unequal recipient counts I encountered after re-uploading equal groups exposed a platform behaviour I hadn’t anticipated. Discovering why it happened — and building a rebalancing step into the process for future campaigns — is the kind of learning that only comes from running a test in the real world and paying close attention to what the data is actually telling you.

Execution

The October Email Sequence

Six emails were sent between October 3rd and October 29th. The sequence was designed around a single strategic principle: as the October 31st matching grant deadline approached, urgency should increase and email length should decrease. A donor who needs five emails to give is not going to be won over by a longer sixth.

Email #2

Oct 3

Both Groups

Fresh start after September 24th. Goal-First structure and urgency subject line formula applied to both groups simultaneously — informed by Version B’s stronger click rate and open rate signals from the compromised initial test.

Email #3

Oct 12

Both Groups

Halfway point momentum update. Updated donor count and campaign progress sent to both groups to maintain engagement ahead of the A/B tests.

Email #3

Resend

Oct 16

Non-openers

Same email manually resent to non-openers from Email #3 to maximise reach before the A/B tests began. At the time, I wasn’t aware that Network for Good has a built-in button to resend to non-openers automatically 24 hours after the original send — a feature I discovered and used for future emails.

Email #4

A/B Test #1

Oct 21

A/B Test 1 — Subject line messaging. Urgency/scarcity (Group A) vs. social proof (Group B). Non-donors only.

Email #5

A/B Test #2

Oct 27

Email #5 results (A/B Test 2 — Email Length) — $358 raised from 3 donors on October 27th–28th. The short format saw a higher open rate than the long format (25.9% vs. 23.8%), but click rates were identical at 0.7% — meaning the open-rate gap is more likely attributable to a minor subject line wording difference between versions than to length itself. The length question stayed open going into December.

Email #6

A/B Test #3

Oct 29

A/B Test 3 — Send time. Morning 9:30am (Group A) vs. evening 6:00pm (Group B). Winning variables from previous tests applied. Non-donors only.

SEGMENTATION LOGIC

Donors were excluded from all subsequent fundraising emails immediately after giving, regardless of gift size. Emails #3 through #6 excluded all previous donors and progressively narrowed the audience to non-donors only — protecting relationships with people who had already committed while maximising pressure on those who hadn’t.

A/B Test #1

A/B Test 1: Subject Line Messaging

Email #4, October 21. With 10 days remaining on the matching grant, this test asked a straightforward question: when a deadline is imminent, do donors respond more to urgency and scarcity, or to social proof and community?

Variable tested: Subject line only. Body copy, CTA, send time, and email length were identical between both groups.

✓ Winner — Group A

“10 Days Left: Don’t Let the $21,000 Match Expire Unused”

Recipients: 1,410
Open Rate: 25.8%
Click Rate: 0.8%
Approach: Urgency / scarcity

Group B

“Join 43 Donors Who’ve Already Doubled Their Impact”

Recipients: 1,406
Open Rate: 23.9%
Click Rate: 0.7%
Approach: Social proof

Finding — Applied to All Future Sends

Urgency-driven subject lines outperformed social proof on open rate by 1.9 percentage points (25.8% vs. 23.9%, a relative lift of 7.9%) during a deadline-pressure campaign. The “43 Donors” subject line isn’t weak, and social proof is a powerful motivator, but when a hard deadline is imminent, scarcity wins. Applied to Email #5 and #6 subject line strategy.
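The 7.9% figure is a relative lift, not a gap in percentage points. A short Python sketch of the arithmetic, using the two open rates reported for Email #4 (the function name is illustrative):

```python
def open_rate_lift(rate_a, rate_b):
    """Return (difference in percentage points, relative lift in percent)."""
    absolute_points = rate_a - rate_b
    relative_pct = (rate_a - rate_b) / rate_b * 100
    return absolute_points, relative_pct


# Open rates reported for Email #4, in percent.
urgency, social_proof = 25.8, 23.9
points, relative = open_rate_lift(urgency, social_proof)
print(f"{points:.1f} percentage points, {relative:.1f}% relative lift")
# prints: 1.9 percentage points, 7.9% relative lift
```

Reporting both numbers avoids the common misreading that one subject line was opened by 7.9% more of the list.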

A/B Test #2

A/B Test 2: Email Length

Email #5, October 27. Five days from the matching grant deadline. The primary variable tested was email length — short format vs. long format. Both versions led with “5 DAYS LEFT” urgency messaging, with a minor wording variation between subject lines rather than a different strategic approach.

Primary variable tested: Email length. Both subject lines used identical urgency framing. Open rates reflect the minor subject line wording difference; click rates — both 0.7% — reflect body copy and length performance.

✓ Winner — Group A

“5 DAYS LEFT: $21,000 Match Expires October 31st” — Short format

Recipients: 1,407
Open Rate: 25.9%
Click Rate: 0.7%
Format: Short (150 words)

Group B

“5 DAYS LEFT: Don’t Let $21,000 in Matching Funds Expire” — Long format

Recipients: 1,403
Open Rate: 23.8%
Click Rate: 0.7%
Format: Long (300 words)

Finding — Inconclusive

Both versions produced identical click rates (0.7%), meaning this test found no measurable difference in conversion intent between formats. The open rate difference (25.9% vs. 23.8%) is more likely attributable to the minor subject line wording variation between versions than to length itself — which means this single test, with that confound present, can’t confirm or rule out a length effect either way. Email length remained genuinely untested heading into December; it’s a question that needs a cleaner, single-variable test in a future campaign rather than a rule applied on the strength of this result.

A/B Test #3

A/B Test 3: Send Time

Email #6, October 29. 48 hours before the matching grant deadline. The winning short format from Test 2 was applied to both groups. Both used the same subject line — “48 HOURS LEFT: $21,000 Match Expires October 31st” — making this the cleanest single-variable test of the three. The only difference was delivery time.

Variable tested: Send time only. Subject line, content, and format were identical between both groups.

Group A — 9:30 AM PST

“48 HOURS LEFT: $21,000 Match Expires October 31st”

Recipients: 1,403
Open Rate: 25.9%
Click Rate: 0.5%
Strength: Awareness/Reach

Group B — 6:00 PM PST

“48 HOURS LEFT: $21,000 Match Expires October 31st”

Recipients: 1,402
Open Rate: 23.4%
Click Rate: 0.7%
Strength: Action/Conversion

Finding: A Strategic Trade-Off

This test didn’t produce a clean winner. Morning sends drove more opens — more people saw the email. Evening sends drove more clicks — more people who opened it acted on it. The right answer depends on campaign objective: if the priority is reach, send in the morning; if the priority is conversion from an already-aware audience, send in the evening. Applied to December sends based on each email’s specific objective.
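One way to put a number on this trade-off is click-to-open rate: clicks as a share of opens. The Python sketch below derives it from the reported open and click rates for Email #6. It assumes both rates were calculated over the same base, and because the inputs are rounded the result is approximate. The function name is illustrative.

```python
def click_to_open_rate(open_rate_pct, click_rate_pct):
    """Clicks as a percentage of opens, given both rates over the same base."""
    if open_rate_pct <= 0:
        raise ValueError("open rate must be positive")
    return click_rate_pct / open_rate_pct * 100


# Reported rates for Email #6, in percent.
morning = click_to_open_rate(open_rate_pct=25.9, click_rate_pct=0.5)
evening = click_to_open_rate(open_rate_pct=23.4, click_rate_pct=0.7)
print(f"morning: {morning:.1f}%  evening: {evening:.1f}%")
# prints: morning: 1.9%  evening: 3.0%
```

On these figures, roughly 1.9% of morning openers clicked compared with roughly 3.0% of evening openers, which is the pattern the finding describes.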

Results

October Campaign Results

Six emails. Twenty-six days. Three clean A/B tests. Here’s what the October campaign delivered.

$12,001

Raised from email campaign

66 transactions | $182 avg. gift | 65 donors

2.2%

Conversion rate from 2,738 delivered recipients

Revenue by Email

Email	Date	Raised	Donors	Avg. Gift
Email #2 — Fresh start, urgency formula	Oct 3	$1,445	16	$85
Email #3 — Initial send + resend to non-openers	Oct 12	$7,888	21	$376
Email #4 — A/B Test 1 (Subject line)	Oct 21	$785	10	$79
Email #5 — A/B Test 2 (Email length)	Oct 27	$358	3	$119
Email #6 — A/B Test 3 (Send time)	Oct 29	$1,295	10	$130

Email #3’s combined total includes both the initial send (Oct 12–15: $925 from 10 donors) and the resend to non-openers (Oct 16–20: $6,963 from 11 donors). The resend figure includes an individual gift of $6,300 from a single donor, received October 16th and attributed separately as a major donor conversion. Organic resend performance excluding that gift: approximately $663 from 10 donors, comparable to the initial send.

Note on Totals

Per-email figures do not sum precisely to the $12,001 campaign total. The difference is attributable to mail-in donations — an option offered throughout the campaign — which were entered into Network for Good manually by the Executive Director and are not tied to a specific email send date.
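The size of that gap can be checked with a few lines of Python. This sketch uses only the five rows of the Revenue by Email table and the stated campaign total. Treating the remainder as the amount not tied to a specific send is an assumption of the sketch, and the variable names are illustrative.

```python
# (email label, amount raised in dollars) from the Revenue by Email table.
revenue_by_email = [
    ("Email #2", 1445),
    ("Email #3", 7888),
    ("Email #4", 785),
    ("Email #5", 358),
    ("Email #6", 1295),
]
campaign_total = 12001

attributed = sum(amount for _, amount in revenue_by_email)
unattributed = campaign_total - attributed  # not tied to a row in the table
print(f"attributed: ${attributed:,}  unattributed: ${unattributed:,}")
# prints: attributed: $11,771  unattributed: $230
```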

October Campaign Summary

Metric	Result
Emails sent	6 (September 24 – October 29)
Total recipients	2,738 delivered (Group A: 1,314 + Group B: 1,424)
Amount raised — email campaign	$12,001
Total transactions	66
Average gift	$182
Conversion rate	2.2%
Total donors	65
Total organizational impact	$33,001 ($21,000 SVCF gift + $12,001 new donations)

What the 2.2% Conversion Rate Actually Meant

October’s 2.2% conversion rate was achieved using the most powerful psychological lever available — a matching grant with a hard deadline. 98% of the list had now seen that lever and chosen not to give. With the matching narrative spent and most of the list already solicited six times, the question for November wasn’t how to run another email campaign. It was whether email could reach $75,000 at all — and if not, what would.
