Your prompts suck
I wrote 12,000 crappy prompts so you can write one good one.

Day 268/100
Hey—It's Tim. (Heads up, this one is long, like 2400 words long…)
Most people treat AI prompts like they're writing instructions for a human.
They're not.
After writing over 12,000 prompts for production software, I've learned that the biggest problem isn't that your prompts are too simple—it's that they're full of invisible contradictions that make your AI inconsistent, unpredictable, and ultimately unusable.
Today I'm going to show you the 12-section framework we use at Penfriend to build prompts that ship in production, and more importantly, which sections actually matter for the type of prompting you're doing.
I tweeted about this yesterday. Come follow me on X.
Today we do the deep dive into each section. If you have any questions, reply to that thread and I’ll answer them in a newsletter next week.
The Real Problem With Your Prompts
Let me show you a "normal" prompt that most people would write:
You are a customer service agent. Be thorough and helpful.
Rules:
- Always gather complete context before responding
- Never make assumptions
- Respond immediately to urgent requests
- Research extensively before acting
- Keep responses under 100 words
Spot the contradictions?
"Gather complete context" vs "Respond immediately"
"Research extensively" vs "Keep under 100 words"
"Never make assumptions" vs "Respond immediately"
The AI has to choose which instruction to follow. And it will choose differently every time, giving you inconsistent results.
OpenAI's GPT-5 prompting guide says it explicitly: "Contradictory instructions severely degrade performance. The model wastes reasoning trying to reconcile conflicts rather than solving the problem."
This is why your AI feels random. It's not the model. It's your prompt.
The 12-Section Framework (And What Actually Matters)
Here's the full structure:
<SYSTEM_ROLE>
<OBJECTIVE>
<CONTEXT>
<TOOLS>
<AGENTIC_PREAMBLE>
<AGENTIC_EAGERNESS>
<REASONING_DIALS>
<PLANNING>
<CONTEXT_UNDERSTANDING>
<OUTPUT_FORMAT>
<VALIDATION>
<ASSUMPTIONS_LOG>
But here's the thing: You don't need all 12 sections for every prompt.
The sections you use depend on what you're building. Let me break down what matters for different use cases.
For Casual Prompters: The Core 4
If you're just writing prompts for daily use in ChatGPT or Claude, you only need 4 sections:
1. SYSTEM_ROLE (Required)
What it does: Establishes priority hierarchy when conflicts arise.
Why it matters: Without explicit priorities, the AI makes random choices between competing values.
How to write it well:
Use the ">" operator to show clear priority:
<SYSTEM_ROLE>
You are a content strategist for B2B SaaS companies.
Priorities: Accuracy > Clarity > Speed > Brevity
</SYSTEM_ROLE>
Now when the AI has to choose between being accurate and being brief, it knows accuracy wins.
The 201 insight: Most people write role descriptions like job postings. Don't. The AI doesn't need motivation—it needs decision-making criteria. Focus on the priority stack, not personality traits.
2. OBJECTIVE (Required)
What it does: Defines what "done" looks like.
Why it matters: Without clear deliverables, AI either over-delivers (wastes time/money) or under-delivers (incomplete work).
How to write it well:
<OBJECTIVE>
Create a 500-word blog outline on [topic].
Deliverables:
- 5-7 H2 sections with descriptive titles
- 2-3 bullet points under each H2
- Target keyword integrated naturally
- Estimated reading time
Success criteria:
- Outline covers the complete topic scope
- Each section is distinct (no overlap)
- Flow is logical for readers unfamiliar with the topic
</OBJECTIVE>
The 201 insight: Make success criteria measurable. "High quality" is useless. "No overlapping sections" is actionable. The AI can check that.
3. CONTEXT (Conditional - use when needed)
What it does: Provides the facts, constraints, and edge cases the AI needs.
Why it matters: This is where you prevent hallucination and reduce unnecessary tool calls.
When to use it:
You have specific constraints (brand guidelines, word limits, formats)
You're referencing specific files or prior work
There are known edge cases or exceptions
You have user preferences or requirements
How to write it well:
<CONTEXT>
Brand voice: Direct, conversational, avoid corporate jargon
Target audience: Marketing managers at 50-200 person SaaS companies
Avoid: Buzzwords like "synergy," "leverage," "game-changing"
Previous approach that didn't work: Generic listicles without specific examples
</CONTEXT>
The 201 insight: Only include relevant information. Don't info-dump. Every piece of context should directly impact the output. If you're not sure it matters, leave it out.
4. OUTPUT_FORMAT (Required)
What it does: Specifies exactly how you want the response formatted.
Why it matters: Inconsistent formatting is the #1 complaint about AI outputs. This section fixes it.
How to write it well:
<OUTPUT_FORMAT>
Structure:
- Use H2 headers for main sections
- Use bullet points for lists of 3+ items
- Use numbered lists only for sequential steps
- Keep paragraphs to 2-3 sentences max
Tone:
- Conversational, like explaining to a colleague
- No corporate jargon or buzzwords
- Use "you" not "one" or "the reader"
Length:
- Aim for 500-700 words
- Each section should be roughly equal length
</OUTPUT_FORMAT>
The 201 insight: Be specific about what you DON'T want. "No bullet points in the introduction" is often more valuable than "use bullet points for lists."
For Production Prompts: Add These 5
If you're building AI features that ship to users, you need to add these sections:
5. TOOLS (Critical for production)
What it does: Defines when to use tools, when not to, and any budgets.
Why it matters: Without clear guidance, AI either over-uses tools (expensive, slow) or under-uses them (incomplete information).
How to write it well:
<TOOLS>
Available tools:
check_subscription_status:
- Use for: Current plan, billing date, payment method
- Don't use for: Historical data, usage stats
- Max calls per response: 1
search_help_docs:
- Use when: Question involves product features or setup
- Don't use when: Question is about billing or account issues
- Budget: Up to 3 searches per response
refund_processor:
- Requires explicit user confirmation before calling
- Never call preemptively
</TOOLS>
The 201 insight: Set tool budgets. Without them, AI can make 10+ tool calls for a simple question. With them, it's forced to be strategic.
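You can also enforce those budgets on the application side, so a runaway agent can't blow past them even if it ignores the prompt. A minimal sketch (the tool names mirror the example above; the class itself is hypothetical):

```python
# Sketch: hard-enforce per-response tool budgets in the dispatch layer,
# mirroring the budgets declared in the <TOOLS> section.
class ToolBudget:
    def __init__(self, budgets: dict[str, int]):
        # budgets: max calls allowed per tool for this one response
        self.remaining = dict(budgets)

    def allow(self, tool: str) -> bool:
        """Permit the call only while the tool's budget is unspent."""
        if self.remaining.get(tool, 0) <= 0:
            return False  # budget exhausted (or tool unknown): refuse
        self.remaining[tool] -= 1
        return True

budget = ToolBudget({"check_subscription_status": 1, "search_help_docs": 3})
```

Create a fresh `ToolBudget` per response, and check `allow()` before every dispatch.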
6. AGENTIC_PREAMBLE (Important for UX)
What it does: Controls how AI communicates its plan and progress.
Why it matters: Users need to follow along. Without preambles, AI executes silently and users don't know what's happening.
How to write it well:
<AGENTIC_PREAMBLE>
Before using any tools:
- Restate the user's question in clear terms
- Briefly state what you're about to check (one sentence)
After each tool call:
- Don't narrate ("I'm now checking...")
- Just proceed to the next step
At the end:
- Summarize findings in 2-3 sentences
- Provide clear next steps
</AGENTIC_PREAMBLE>
The 201 insight: Match verbosity to task length. Quick tasks need minimal narration. Complex 5+ step workflows need more.
7. AGENTIC_EAGERNESS (Critical for safety)
What it does: Controls how autonomous the AI is—when to keep going vs ask permission.
Why it matters: This is your safety valve. Too eager = AI makes changes you didn't want. Not eager enough = AI constantly asks for permission.
How to write it well:
<AGENTIC_EAGERNESS>
Persistence policy:
- Continue working until user's question is fully answered
- Don't stop after partial answers
Safe actions (proceed without asking):
- Reading data
- Checking status
- Searching documentation
Unsafe actions (always confirm first):
- Processing refunds
- Changing subscription plans
- Deleting data
Stop conditions:
- You've made 5+ tool calls without resolving the issue
- User asks you to stop or wait
- You need information only the user can provide
</AGENTIC_EAGERNESS>
The 201 insight: Categorizing actions as safe/unsafe is the key. Most production prompts fail because they don't define this clearly.
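The safe/unsafe split also maps naturally to a gate in your own dispatch code, as a second line of defense behind the prompt. A sketch, with illustrative action names (not a real API):

```python
# Sketch: gate tool actions before dispatching, mirroring the
# safe/unsafe categories from the <AGENTIC_EAGERNESS> section.
SAFE_ACTIONS = {"read_data", "check_status", "search_docs"}
UNSAFE_ACTIONS = {"process_refund", "change_plan", "delete_data"}

def gate(action: str, user_confirmed: bool = False) -> str:
    """Return 'proceed', 'confirm' (ask the user first), or 'reject'."""
    if action in SAFE_ACTIONS:
        return "proceed"
    if action in UNSAFE_ACTIONS:
        return "proceed" if user_confirmed else "confirm"
    return "reject"  # unknown actions default to the safest outcome
```

Note the last line: anything not explicitly categorized gets rejected, which is the fail-safe default you want in production.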
8. PLANNING (Important for complex tasks)
What it does: Forces AI to decompose tasks before executing.
Why it matters: Complex tasks fail when AI jumps straight to execution without planning.
How to write it well:
<PLANNING>
Before acting:
1. Identify what information you need
2. Determine which tools will get that information
3. Plan the order (parallel vs sequential)
4. State your plan in one sentence
After each step:
- Verify the result before moving to next step
- If something unexpected happens, revise your plan
</PLANNING>
The 201 insight: Only use this for multi-step workflows. Single-action prompts don't need planning overhead.
9. VALIDATION (Required for production)
What it does: Defines tests AI must pass before finishing.
Why it matters: This is your quality gate. Without validation, you're shipping untested outputs.
How to write it well:
<VALIDATION>
Before responding, verify:
Check 1: Have you answered the user's actual question?
- If no: Revise response to directly address their question
Check 2: Are all dollar amounts and dates accurate?
- If uncertain: State what you're uncertain about
Check 3: Does the response match our brand voice?
- Conversational? ✓
- No jargon? ✓
- Helpful tone? ✓
If any check fails: Revise before responding.
Don't validate more than once (avoid loops).
</VALIDATION>
The 201 insight: Make checks actionable and specific. "Is this high quality?" is useless. "Does this answer the user's question?" is checkable.
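In code, this pattern is just a list of named checks plus a single revision pass, which is what makes the "don't validate more than once" rule enforceable. A sketch where the checks and the revise step are stand-ins for model calls:

```python
# Sketch of the VALIDATION pattern: run named checks, revise at most
# once on failure, then return whatever we have (no infinite loops).
def validate_once(response, checks, revise):
    """checks: list of (name, predicate); revise: fn(response, failed_names)."""
    failed = [name for name, check in checks if not check(response)]
    if not failed:
        return response
    return revise(response, failed)  # one revision pass only, by design

# Illustrative checks for a customer-service reply
checks = [
    ("answers_question", lambda r: "refund" in r.lower()),
    ("no_jargon", lambda r: "synergy" not in r.lower()),
]
```

In a real system, `revise` would be a second model call that receives the failed check names; here it can be anything callable.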
For Advanced Users: The Final 3
These sections are for specific use cases:
10. REASONING_DIALS (For cost optimization)
When to use: You're paying per token and want to optimize costs.
<REASONING_DIALS>
reasoning_effort = low
(This is a simple lookup task that doesn't require deep reasoning)
verbosity = low
(User wants quick answers, not explanations)
</REASONING_DIALS>
The 201 insight: High reasoning on simple tasks wastes money. Low reasoning on complex tasks gives bad results. Match reasoning to complexity.
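If you're setting dials programmatically, a lookup table keeps them consistent across your codebase. A sketch; the task-type names are mine, and whether your API exposes these as request parameters (e.g. a reasoning-effort setting on reasoning models) depends on the provider and model:

```python
# Sketch: choose dial settings from task complexity before building
# the request, instead of hardcoding them per call site.
DIALS = {
    "lookup":   {"reasoning_effort": "low",    "verbosity": "low"},
    "analysis": {"reasoning_effort": "medium", "verbosity": "medium"},
    "planning": {"reasoning_effort": "high",   "verbosity": "low"},
}

def dials_for(task_type: str) -> dict:
    """Unknown task types fall back to the middle setting."""
    return DIALS.get(task_type, DIALS["analysis"])
```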
11. CONTEXT_UNDERSTANDING (For search-heavy prompts)
When to use: Your AI has access to search tools or knowledge bases.
<CONTEXT_UNDERSTANDING>
Search when:
- Question involves recent changes to our product
- You're uncertain about a policy
- User asks for specific documentation
Don't search when:
- You're confident in the answer
- Question is about general product knowledge
- You've already searched twice without finding relevant info
Stop searching after: 3 searches OR you've found the answer
</CONTEXT_UNDERSTANDING>
The 201 insight: Set search limits. Without them, AI can search endlessly. With them, it's forced to work with what it finds.
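The search budget is easy to enforce in the harness too. A sketch, with `search` as a stand-in for whatever search tool you're actually calling:

```python
# Sketch: stop after max_searches attempts OR as soon as an answer
# is found, mirroring the stop rule in <CONTEXT_UNDERSTANDING>.
def bounded_search(queries, search, max_searches: int = 3):
    """Try queries in order; return the first hit, or None within budget."""
    for query in queries[:max_searches]:
        result = search(query)
        if result is not None:
            return result  # found it: stop searching immediately
    return None  # budget exhausted: answer with what you have
```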
12. ASSUMPTIONS_LOG (For transparency)
When to use: High-stakes decisions where users need to know what the AI assumed.
<ASSUMPTIONS_LOG>
If you had to make assumptions to complete this task, list them at the end:
Assumptions made:
- [List each assumption clearly]
- [Why you made this assumption]
</ASSUMPTIONS_LOG>
The 201 insight: This is your escape hatch for uncertainty. It lets AI proceed without perfect information while maintaining transparency.
The Secret to Great Prompts
Here's what most people get wrong:
You can't write a great prompt if you don't understand the task.
I see people trying to get AI to do things they couldn't do themselves. That never works.
Before you write a single line of your prompt:
Do the task manually at least once
Document every decision you made
Note where you had to make judgment calls
Identify which parts require creativity vs which parts are mechanical
Those decision points? Those become your prompt sections.
The judgment calls? Those become your validation checks.
The mechanical parts? Those become your planning steps.
How to Test Your Prompts
Never test in ChatGPT or Claude's chat interface.
Why? Because the chat interface has hidden system prompts and safety rails that won't be there when you use the API.
Test in the console:
For OpenAI: platform.openai.com/playground
For Anthropic: console.anthropic.com/workbench
Put your full prompt in the "System" message field (OpenAI) or "System Prompt" (Anthropic).
Put your test inputs in the "User" message.
Run it 5-10 times with different inputs.
If you get inconsistent results, you have contradictions in your prompt.
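You can automate the run-it-5-to-10-times check. A sketch that works with any model-calling function; wiring up the actual OpenAI or Anthropic SDK call is left to you, so `call_model` here is just a callable taking (system_prompt, user_input):

```python
# Sketch: run the same prompt N times and measure how often the
# most common output appears. Low agreement = likely contradictions.
from collections import Counter

def consistency(call_model, system_prompt, user_input, runs=5):
    """Return (most_common_output, fraction of runs that agreed with it)."""
    outputs = [call_model(system_prompt, user_input) for _ in range(runs)]
    top, count = Counter(outputs).most_common(1)[0]
    return top, count / runs
```

With a real API you'd compare normalized outputs (e.g. stripped/lowercased, or just the structured fields you care about) rather than raw strings, since harmless wording variation would otherwise count as disagreement.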
The Meta-Prompt That Fixes Everything
When your prompt isn't working, use this:
When asked to optimize prompts, give answers from your own perspective - explain what specific phrases could be added to, or deleted from, this prompt to more consistently elicit the desired behavior or prevent the undesired behavior.
Here's a prompt: [YOUR PROMPT]
The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR].
While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?
Guidelines:
Run it in the chat version of the model you're using
Run it in incognito mode (to avoid context contamination)
Run it on the exact model you'll use in production
Advanced version:
Given your understanding of the problem we're trying to solve, and if you had the ability to completely rewrite the prompt to solve it—optimizing for speed of answer, accuracy, and repeatability—what would you do? Why? Where does the current prompt trip up in trying to achieve these goals?
This second version is more aggressive. Use it when the first one doesn't give you enough improvements.
Which Sections Do You Actually Need?
Here's a quick reference:
For daily prompting (ChatGPT/Claude chat):
SYSTEM_ROLE ✓
OBJECTIVE ✓
CONTEXT (if you have specific constraints) ✓
OUTPUT_FORMAT ✓
For production features (API/applications):
Everything above, plus:
TOOLS ✓
AGENTIC_PREAMBLE ✓
AGENTIC_EAGERNESS ✓
PLANNING (for multi-step tasks) ✓
VALIDATION ✓
For optimization (when you're paying per token):
Everything above, plus:
REASONING_DIALS ✓
CONTEXT_UNDERSTANDING (if you have search) ✓
ASSUMPTIONS_LOG (for high-stakes decisions) ✓
The Bottom Line
Most people write prompts like they're writing instructions for a human. But AI doesn't work like humans.
AI needs:
Clear priority hierarchies (what wins when values conflict)
Explicit success criteria (what does "done" look like)
Specific constraints (not vague guidance)
Validation checks (how to know if it's right)
The 12-section framework eliminates contradictions and makes AI behavior predictable.
Start with the Core 4. Add sections as you need them. Test in the console, not in chat.
And remember: If you don't understand the task well enough to do it yourself, you can't write a prompt for it.
Master the task first. Write the prompt second.
I have a part two of this.
A full on “advanced class” to writing prompts.
You want it?

✌️ Tim "prompt daddy" Hanson
CMO @Penfriend.ai
Same brain, different platforms: X, Threads, LinkedIn.
P.S. This is the exact framework we use at Penfriend to build production prompts that analyze content, generate outlines, and provide editorial feedback. It's the difference between a demo that impresses and a product that ships.
If you want to see how we apply this to content creation specifically, reply to this email and I'll send you one of our actual production prompts with annotations.

Penfriend.ai
Made by content marketers. Used by better ones.
What to do next
Share This Update: Know someone who’d benefit? Forward this newsletter to your content team.
Get Your First 3 Articles FREE: We just dropped the biggest update we've ever made to Penfriend a few weeks ago.
Let Us Do It For You: We have a DFY service where we build out your next 150 articles. Let us handle your 2025 content strategy for you.