The Generation Process
Once relevant documents are retrieved, they’re used to augment the LLM’s generation process. This is where RAG’s power becomes evident - the model can now generate responses grounded in actual, retrieved information.
Context Augmentation
The retrieved documents are formatted into a prompt template that the LLM can understand and use effectively.
Prompt Template Structure
Context:
[Document 1 content]
[Document 2 content]
[Document 3 content]
Question: [User's original query]
Instructions: Answer the question based on the context provided above.
If the context doesn't contain enough information, say so.
Cite the specific documents you used in your answer.
Answer:
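In code, this assembly step can be as simple as string formatting. The sketch below is illustrative rather than tied to any framework: the Document dataclass and build_prompt helper are assumed names, and the template mirrors the structure shown above.

from dataclasses import dataclass

@dataclass
class Document:
    title: str    # e.g. "Python 3.12 Release Notes"
    content: str  # the retrieved chunk text

PROMPT_TEMPLATE = """Context:
{context}

Question: {question}

Instructions: Answer the question based on the context provided above.
If the context doesn't contain enough information, say so.
Cite the specific documents you used in your answer.

Answer:"""

def build_prompt(docs: list[Document], question: str) -> str:
    """Number each retrieved document and fill the template."""
    context = "\n\n".join(
        f"Document {i}: {doc.content}" for i, doc in enumerate(docs, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)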
Example Augmented Prompt
Query: “What are the new features in Python 3.12?”
Retrieved Documents:
- Doc 1: Python 3.12 Release Notes (excerpt)
- Doc 2: Performance improvements in Python 3.12
- Doc 3: Breaking changes in Python 3.12
Augmented Prompt:
Context:
Document 1: Python 3.12 was released in October 2023. Key features include:
- Improved error messages with more precise locations
- Per-interpreter GIL for better subinterpreter isolation
- More flexible f-string syntax: nested quotes and multiline expressions (PEP 701)
Document 2: Python 3.12 shows significant performance improvements:
- Up to 5% faster execution compared to Python 3.11
- Optimized frame stack implementation
- Better memory management
Document 3: Breaking changes in Python 3.12:
- Removed deprecated modules: asynchat, asyncore, smtpd
- Changed behavior of some built-in functions
- Updated type hints syntax
Question: What are the new features in Python 3.12?
Answer based on the context provided above:
Grounded Generation
The LLM generates a response while being “grounded” in the retrieved context. This fundamentally changes how the model behaves.
How Grounding Works
Grounded vs Ungrounded Generation
- Ungrounded: the LLM relies only on its training data. It may hallucinate or provide outdated information, and offers no source attribution.
- Grounded: the LLM uses the retrieved context. Facts are based on actual documents, specific sources can be cited, and the result is more accurate and trustworthy.
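A minimal sketch of the grounded path, assuming a generic chat interface: call_llm stands in for whichever client you actually use (it is not a real library function), and the system message is one common way to push the model toward the retrieved context.

GROUNDING_SYSTEM_MESSAGE = (
    "Answer ONLY from the context provided in the user message. "
    "If the context does not contain the answer, say you don't have enough information. "
    "Cite the documents you used."
)

def generate_grounded_answer(context: str, question: str, call_llm) -> str:
    """call_llm is a placeholder: any function that accepts a list of
    {role, content} messages and returns the model's text response."""
    messages = [
        {"role": "system", "content": GROUNDING_SYSTEM_MESSAGE},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    return call_llm(messages)

The ungrounded path is the same call without the context block and without the system constraint, which is exactly where hallucination and stale answers creep in.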
Benefits of Grounding
1. Factual Accuracy ✓
- Responses are based on actual retrieved content
- Facts come from verified sources
- Less likely to make things up
2. Reduced Hallucination ✓
- Model is constrained by provided context
- Can’t invent information not in documents
- Admits when context doesn’t have the answer
3. Source Attribution ✓
- Responses can reference specific documents
- Users can verify information
- Builds trust through transparency
4. Confidence Calibration ✓
- Model can indicate uncertainty
- Can say “based on the provided context…”
- Distinguishes between what it knows and what’s in the context
RAG vs. Standard LLM: Side-by-Side Comparison
Let’s see the dramatic difference RAG makes with a few concrete examples:
Example 1: Recent Information
Query: “What are the new features in Python 3.12?”
Standard LLM vs RAG Response
- Standard LLM: “I don't have information about Python 3.12 as my training data only goes up to early 2023. Python 3.11 introduced features like...” ❌ Outdated; can't help with the current version.
- RAG System: “Python 3.12, released in October 2023, introduces improved error messages, a per-interpreter GIL, new f-string syntax, and up to 5% faster execution. [Sources: Python 3.12 Release Notes]” ✅ Current, accurate, cited.
Example 2: Domain-Specific Knowledge
Query: “What’s our company’s remote work policy?”
Standard LLM Response:
I don't have access to your company's specific policies.
Generally, remote work policies might include...
[Generic information that doesn't help]
❌ Can’t access company-specific information
RAG-Enhanced Response:
According to your company's HR Policy Document (updated March 2024):
- Employees can work remotely up to 3 days per week
- Must be in office on Tuesdays for team meetings
- Requires manager approval for full-time remote work
- Equipment provided: laptop, monitor, ergonomic chair
[Source: HR Policy Document, Section 4.2]
✅ Specific, accurate, company policy with citation
Example 3: Technical Documentation
Query: “How do I configure SSL for our application?”
Standard LLM Response:
To configure SSL, you typically need to:
1. Obtain an SSL certificate
2. Configure your web server
3. Update application settings
[Generic steps that may not match your setup]
❌ Generic advice, not specific to your application
RAG-Enhanced Response:
Based on your application's deployment guide:
1. Place SSL certificates in /etc/ssl/certs/
2. Update config.yaml:
   ssl:
     enabled: true
     cert_path: /etc/ssl/certs/app.crt
     key_path: /etc/ssl/private/app.key
3. Restart the application: sudo systemctl restart app
[Source: Deployment Guide v2.3, Section 7]
✅ Specific to your application, step-by-step, cited
Generation Best Practices
1. Prompt Engineering
Good Prompt Template:
Context: {retrieved_docs}
Question: {user_query}
Instructions:
- Answer based only on the provided context
- If unsure, say "I don't have enough information"
- Cite specific documents used
- Be concise and accurate
Answer:
Why it works:
- Clear instructions for the LLM
- Encourages source citation
- Handles uncertainty gracefully
- Focuses on accuracy
2. Context Window Management
Challenge: LLMs have a limited context window (anywhere from a few thousand to over a hundred thousand tokens, depending on the model), and the retrieved documents must share it with the instructions, the question, and the answer.
Solutions:
- Retrieve fewer, more relevant documents
- Summarize long documents before including
- Prioritize most relevant sections
- Use sliding window for long documents
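As a concrete illustration of the first two solutions, here is a simple token-budgeting sketch. The word-count heuristic in estimate_tokens is a rough stand-in for a real tokenizer (e.g. tiktoken), and the 3,000-token budget is an arbitrary example.

def estimate_tokens(text: str) -> int:
    """Rough heuristic (~1.3 tokens per word); swap in a real tokenizer for accuracy."""
    return int(len(text.split()) * 1.3)

def select_docs_for_budget(ranked_docs: list[str], budget: int = 3000) -> list[str]:
    """Keep the highest-ranked documents that fit the budget, leaving room in the
    context window for instructions, the question, and the answer."""
    selected, used = [], 0
    for doc in ranked_docs:  # assumed sorted by relevance, best first
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        selected.append(doc)
        used += cost
    return selected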
3. Citation Formatting
Inline Citations:
Python 3.12 introduces improved error messages [1] and
performance improvements of up to 5% [2].
Sources:
[1] Python 3.12 Release Notes
[2] Python 3.12 Performance Benchmarks
Document References:
According to the Python 3.12 Release Notes, key features include...
[Source: Python 3.12 Release Notes, October 2023]
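One way to wire this up is to number the retrieved documents when building the context and emit a matching source list. In this sketch the title field is an assumed piece of document metadata, not a requirement of any particular vector store.

def format_context_with_citations(docs: list[dict]) -> tuple[str, str]:
    """docs: list of {"title": ..., "content": ...} dicts (assumed metadata shape).
    Returns (context block with [n] markers, numbered source list)."""
    context_lines, source_lines = [], []
    for i, doc in enumerate(docs, start=1):
        context_lines.append(f"[{i}] {doc['content']}")
        source_lines.append(f"[{i}] {doc['title']}")
    return "\n\n".join(context_lines), "\n".join(source_lines)

The source list can be appended to the model's answer verbatim, or the model can be instructed to emit the [n] markers itself, with the list used to resolve them.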
4. Handling Insufficient Context
When the context doesn’t have the answer:
Bad Response:
[Makes up an answer anyway]
Good Response:
Based on the provided documents, I don't have enough
information to answer this question. The available
context covers X and Y, but doesn't address Z.
Would you like me to search for additional information?
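Detecting that fallback programmatically can be as crude as scanning the answer for a refusal phrase and widening the search. This is a deliberately naive sketch; search_more is a hypothetical callback into your retriever.

INSUFFICIENT_MARKERS = (
    "don't have enough information",
    "not enough information",
    "doesn't address",
)

def answer_or_fallback(response: str, question: str, search_more) -> str:
    """If the model signals insufficient context, try a broader retrieval pass.
    search_more is a hypothetical callback (e.g. relax filters, widen top-k)."""
    if any(marker in response.lower() for marker in INSUFFICIENT_MARKERS):
        extra_docs = search_more(question)
        if extra_docs:
            return response + "\n\n(Additional documents found; consider re-running the answer with them.)"
    return response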
Response Quality Metrics
How do we measure if generation is working well?
Key Metrics
1. Faithfulness
- Is the response grounded in the retrieved context?
- Does it cite actual information from documents?
- Measured by comparing response to source documents
2. Answer Relevance
- Does the response actually answer the question?
- Is it on-topic and helpful?
- Measured by semantic similarity to query
3. Context Relevance
- Were the retrieved documents actually relevant?
- Did they contain information needed to answer?
- Measured by human evaluation or LLM-as-judge
4. Completeness
- Does the response cover all aspects of the question?
- Is any important information missing?
- Measured against ground truth answers
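Faithfulness and context relevance usually need human review or an LLM-as-judge, but answer relevance has a cheap first-pass proxy: embedding similarity between the query and the answer. In this sketch, embed is a placeholder for whatever embedding function you already use for retrieval, and the 0.7 threshold is an arbitrary illustration.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_relevance(query: str, answer: str, embed) -> float:
    """embed is a placeholder for your embedding function (the same model used
    for retrieval works fine); higher similarity suggests an on-topic answer."""
    return cosine_similarity(embed(query), embed(answer))

# Example usage (0.7 is an arbitrary threshold):
# score = answer_relevance("What's new in Python 3.12?", generated_answer, embed)
# if score < 0.7:
#     print("Answer may be off-topic; consider re-retrieving or re-prompting.")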
Key Takeaways
Before moving to the final page, remember:
- Context augmentation combines retrieved docs with the query
- Grounded generation constrains the LLM to use provided context
- RAG dramatically outperforms standard LLMs for current/specific information
- Source citation builds trust and enables verification
- Prompt engineering is crucial for quality responses
What’s Next?
In the final page, you’ll test your knowledge with an interactive quiz and learn about next steps for implementing RAG in your own projects!