Introduction
Generative AI represents a paradigm shift in artificial intelligence—systems that can create new content including text, images, code, and more. Foundation models like GPT-4, Claude, and Gemini are transforming how we build AI applications.
What is Generative AI?
Traditional AI (Discriminative):
Input → Model → Classification/Prediction
"Is this spam?" → [0.95 spam, 0.05 not spam]
Generative AI:
Input (Prompt) → Model → New Content
"Write a poem about AI" → [Generated poem]
Types of Generative AI:
| Type | Examples | What it Creates |
|------|----------|-----------------|
| Text | GPT-4, Claude, Gemini | Articles, code, conversations |
| Image | DALL-E, Midjourney, Stable Diffusion | Art, photos, designs |
| Audio | AudioLM, MusicLM | Music, speech, sound effects |
| Video | Sora, Runway | Video clips, animations |
| Code | GitHub Copilot, CodeWhisperer | Programs, functions |
| Multimodal | GPT-4V, Gemini | Combining multiple types |
Foundation Models
Large pre-trained models that serve as a foundation for many downstream tasks.
Characteristics:
```
┌────────────────────────────────────────────────────────┐
│                    FOUNDATION MODEL                    │
├────────────────────────────────────────────────────────┤
│ • Trained on massive, diverse datasets                 │
│ • Billions of parameters                               │
│ • General-purpose capabilities                         │
│ • Adaptable to many tasks                              │
│ • Few-shot and zero-shot learning                      │
└────────────────────────────────────────────────────────┘
                            │
          ┌─────────────────┼─────────────────┐
          ▼                 ▼                 ▼
     Text Tasks         Code Tasks       Vision Tasks
     • Chat             • Generation     • Analysis
     • Summary          • Completion     • Generation
     • Translation      • Debugging      • Editing
```
Major Foundation Models:
| Model | Provider | Parameters | Strengths |
|-------|----------|------------|-----------|
| GPT-4 | OpenAI | Not disclosed (reported ~1.7T) | Reasoning, coding, multimodal |
| Claude 3 | Anthropic | Not disclosed | Safety, long context, analysis |
| Gemini | Google | Not disclosed | Multimodal, integration |
| Llama 3 | Meta | 8B-405B | Open source, customizable |
| Mistral | Mistral AI | 7B-8x22B | Efficient, open source |
How LLMs Work
Transformer Architecture Recap:
```
Input tokens → Embedding → [Transformer Blocks × N] → Output tokens
                                     │
                                     ▼
                          ┌─────────────────────┐
                          │   Self-Attention    │
                          │  (see all tokens)   │
                          ├─────────────────────┤
                          │    Feed-Forward     │
                          │   (process each)    │
                          └─────────────────────┘
```
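The self-attention box above can be sketched in a few lines of NumPy (single head, toy sizes, and random weights for illustration; real transformers add multiple heads, causal masking, and layer normalization):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X (seq_len × d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token vs. every token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax: attention weights
    return w @ V                                     # weighted mix of value vectors

X = np.random.randn(4, 8)                            # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (np.random.randn(8, 8) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```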
Training Process:
1. Pre-training:
   - Predict the next token on a massive text corpus (see the toy sketch below)
   - Learn language patterns, facts, and reasoning
   - Requires enormous compute (millions of GPU hours)
2. Fine-tuning:
   - Train on task-specific data
   - Instruction tuning (teach the model to follow directions)
   - RLHF (align outputs with human preferences)
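The pre-training objective in step 1 is just next-token prediction: every position in the corpus becomes a training example whose label is the following token. A toy illustration:

```python
# Every position in a corpus yields one training example:
tokens = ["The", "capital", "of", "France", "is", "Paris"]
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(pairs[-1])  # (['The', 'capital', 'of', 'France', 'is'], 'Paris')
# The model is trained (via cross-entropy loss over the vocabulary)
# to assign high probability to each target token.
```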
Generation Process:
Prompt: "The capital of France is"
│
▼
[Model processes]
│
▼
Probability distribution:
"Paris": 0.85
"Lyon": 0.05
"London": 0.01
...
│
▼
Sample (with temperature)
│
▼
Output: "Paris"
Then: "The capital of France is Paris"
Continue generating...
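In code, generation is a loop: score the vocabulary, pick one token, append it, repeat. A minimal sketch with a toy scoring function standing in for the model (all names and numbers here are illustrative):

```python
import numpy as np

def model_logits(tokens):
    """Stand-in for a real LLM forward pass: one score per vocabulary entry."""
    vocab = ["Paris", "Lyon", "London", "."]
    if tokens[-1] == "is":
        scores = np.array([4.0, 1.2, -0.4, 0.5])    # after "is": "Paris" dominates
    else:
        scores = np.array([-1.0, -1.0, -1.0, 4.0])  # otherwise: end the sentence
    return vocab, scores

def generate(prompt_tokens, max_new=2, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        vocab, logits = model_logits(tokens)
        if temperature == 0:
            idx = int(np.argmax(logits))             # greedy: most likely token
        else:
            p = np.exp(logits / temperature - (logits / temperature).max())
            idx = int(np.random.choice(len(vocab), p=p / p.sum()))
        tokens.append(vocab[idx])                    # append and repeat
    return " ".join(tokens)

print(generate(["The", "capital", "of", "France", "is"], temperature=0))
# → "The capital of France is Paris ."
```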
Key Concepts
Context Window
The amount of text the model can "see" at once.
| Model | Context Window |
|-------|----------------|
| GPT-4 | 8K - 128K tokens |
| Claude 3 | 200K tokens |
| Gemini 1.5 | 1M+ tokens |
Why it matters:
- Longer = can process more documents
- Longer = can maintain longer conversations
- Trade-off with cost and latency
Tokens
Text is split into tokens (roughly words or word pieces).
"Tokenization is interesting" → ["Token", "ization", " is", " interesting"]
Roughly: 1 token ≈ 4 characters ≈ 0.75 words
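You can inspect tokenization directly with a tokenizer library such as tiktoken; note that splits vary by model, so the pieces may differ from the example above:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # encoding used by GPT-4-era models
ids = enc.encode("Tokenization is interesting")
print(len(ids))                                # token count
print([enc.decode([i]) for i in ids])          # the individual token strings
```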
Temperature
Controls randomness in generation.
- Temperature = 0: deterministic; always picks the most likely token
- Temperature = 1: standard sampling from the model's distribution
- Temperature = 2: more random, more creative

Low temperature → factual, consistent
High temperature → creative, varied
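Using toy logits, you can watch temperature reshape the probability distribution (a sketch with made-up numbers, not real model output):

```python
import numpy as np

vocab = ["Paris", "Lyon", "London", "Berlin"]
logits = np.array([4.0, 1.2, -0.4, -0.9])      # toy scores, not real model output

for t in (0.5, 1.0, 2.0):
    p = np.exp(logits / t - (logits / t).max())
    p /= p.sum()                                # temperature-scaled softmax
    print(f"T={t}:", dict(zip(vocab, np.round(p, 3))))
# Low T concentrates almost all probability on "Paris";
# high T flattens the distribution, so rarer tokens get sampled more often.
```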
Prompt Engineering
Getting the best results from foundation models.
Basic Techniques:
1. Zero-Shot:

   ```
   Classify this review as positive or negative:
   "The product arrived broken and customer service was unhelpful."
   ```

2. Few-Shot:

   ```
   Classify reviews:
   "Great product!" → Positive
   "Terrible experience" → Negative
   "Worked perfectly" → Positive

   Now classify:
   "The product arrived broken" →
   ```

3. Chain-of-Thought:

   ```
   Solve this step by step:
   Q: If I have 3 apples and buy 2 more, then eat 1, how many do I have?

   Let's think step by step:
   1. Start with 3 apples
   2. Buy 2 more: 3 + 2 = 5 apples
   3. Eat 1: 5 - 1 = 4 apples
   Answer: 4 apples
   ```
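As a sketch of how the few-shot example above looks in code, here it is sent through the OpenAI Python SDK (the model name is illustrative; any chat-capable provider follows the same pattern):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot_prompt = """Classify reviews:
"Great product!" → Positive
"Terrible experience" → Negative
"Worked perfectly" → Positive

Now classify:
"The product arrived broken" →"""

response = client.chat.completions.create(
    model="gpt-4o-mini",                        # illustrative model choice
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,                              # deterministic for classification
)
print(response.choices[0].message.content)      # expected: "Negative"
```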
Advanced Patterns:
| Pattern | Description | Use Case |
|---------|-------------|----------|
| Role prompting | "You are an expert..." | Specialized responses |
| System prompts | Set behavior rules | Chatbots, apps |
| Structured output | "Return JSON..." | API integration |
| RAG | Retrieve then generate | Knowledge-grounded responses |
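Structured output, for example, can be as simple as asking for JSON and parsing the reply (a minimal sketch with a stand-in reply; production systems rely on the provider's JSON or structured-output modes for reliability):

```python
import json

prompt = (
    'Extract fields from this review and return ONLY valid JSON '
    'with keys "sentiment" and "product":\n\n'
    '"The Acme blender arrived broken."'
)

# reply = call_llm(prompt)   # call_llm is a stand-in for any chat model API
reply = '{"sentiment": "negative", "product": "Acme blender"}'  # example reply

data = json.loads(reply)     # fails loudly if the model drifts from pure JSON
print(data["sentiment"])     # "negative"
```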
Retrieval-Augmented Generation (RAG)
Combine foundation models with external knowledge.
```
User Query
     │
     ▼
┌───────────────┐
│   Retriever   │ ← Vector Database
│  (find docs)  │   (your documents)
└───────┬───────┘
        │ relevant docs
        ▼
┌───────────────┐
│   Generator   │ ← Foundation Model
│  (LLM answer) │
└───────┬───────┘
        │
        ▼
Response with citations
```
Benefits:
- Ground responses in your data
- Reduce hallucinations
- Keep information current
- No model retraining needed
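A minimal end-to-end sketch of the retrieve-then-generate loop, using word overlap as a stand-in for vector similarity (real systems use neural embeddings and a vector database; everything here is illustrative):

```python
import re

docs = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via live chat.",
    "Shipping is free on orders over $50.",
]

def words(text: str) -> set[str]:
    """Toy relevance signal: bag of words. Real RAG uses vector embeddings."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = (
    "Answer using only the context below, and cite it.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)
# `prompt` is then sent to the foundation model (the Generator box above).
```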
Fine-Tuning vs RAG vs Prompting
| Approach | When to Use | Cost | Flexibility |
|----------|-------------|------|-------------|
| Prompting | Quick experiments, simple tasks | $ | High |
| RAG | Need current/private knowledge | $$ | High |
| Fine-tuning | Specific style/domain, high volume | $$$ | Lower |
Building GenAI Applications
Application Patterns:
1. Conversational AI (sketched in code below):

   ```
   User → Chat Interface → LLM → Response
                     ↑
            Conversation history
   ```

2. Content Generation:

   ```
   Template/Prompt → LLM → Generated Content → Review → Publish
   ```

3. Code Assistant:

   ```
   Code Context + Request → LLM → Generated Code → Tests → Integration
   ```

4. Knowledge Assistant (RAG):

   ```
   Query → Retrieve → Augment Prompt → Generate → Cite Sources
   ```
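Pattern 1 hinges on resending the conversation history with every turn, because the model itself is stateless. A minimal sketch (chat_model is a stand-in for any LLM API call):

```python
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat_model(messages: list[dict]) -> str:
    """Stand-in for a real LLM call (e.g., a chat completions endpoint)."""
    return f"(model reply to: {messages[-1]['content']})"

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = chat_model(history)        # the model sees the full history each turn
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What's the capital of France?"))
print(chat("And its population?"))     # follow-ups work only because history is resent
```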
Cloud GenAI Services
Azure:
- Azure OpenAI Service: GPT-4, DALL-E, embeddings
- Azure AI Studio: Build and deploy GenAI apps
- Semantic Kernel: Orchestration SDK
- Prompt Flow: Design and test prompts
AWS:
- Amazon Bedrock: Claude, Titan, Llama, Stable Diffusion
- Amazon Q: Enterprise AI assistant
- PartyRock: No-code GenAI apps
Google Cloud:
- Vertex AI: Gemini, PaLM, Imagen
- Generative AI Studio: Experiment with models
- Duet AI: Coding assistant
Challenges and Limitations
Common Issues:
| Challenge | Description | Mitigation |
|-----------|-------------|------------|
| Hallucinations | Making up facts | RAG, verification, grounding |
| Bias | Reflecting training data biases | Testing, guardrails |
| Inconsistency | Different answers each time | Lower temperature, caching |
| Context limits | Can't process everything | Chunking, summarization |
| Cost | Can be expensive at scale | Caching, smaller models |
| Latency | Slow for real-time | Streaming, edge models |
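For the context-limits row, chunking is often just a few lines. A character-based sketch with overlap (real pipelines usually chunk by tokens or by document structure):

```python
def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so each fits the model's context budget."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("a very long document... " * 500)
print(len(pieces), max(len(p) for p in pieces))
# Each piece is summarized or indexed for retrieval separately,
# then the partial results are combined.
```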
Safety Considerations:
- Content filtering
- Prompt injection attacks
- Data privacy
- Misuse prevention
Exam Tips
Common exam questions test:
- Choosing between prompting, RAG, and fine-tuning
- Understanding tokens and context windows
- When to use different foundation models
- RAG architecture and benefits
- Responsible use of generative AI
Watch for keywords:
- "Need current information" → RAG
- "Specific writing style" → Fine-tuning
- "Quick prototype" → Prompting
- "Reduce hallucinations" → RAG, grounding
- "Creative content" → Higher temperature
Key Takeaway
Generative AI and foundation models are transforming software development. Understanding how to effectively use these models—through prompting, RAG, or fine-tuning—is essential. Focus on the right approach for your use case, be aware of limitations, and always consider responsible AI practices. The field is evolving rapidly, so staying current with new capabilities and best practices is crucial.
