Introduction
Generative AI represents a paradigm shift in artificial intelligence—systems that can create new content including text, images, code, and more. Foundation models like GPT-4, Claude, and Gemini are transforming how we build AI applications.
What is Generative AI?
Traditional AI (Discriminative):
Input → Model → Classification/Prediction
"Is this spam?" → [0.95 spam, 0.05 not spam]
Generative AI:
Input (Prompt) → Model → New Content
"Write a poem about AI" → [Generated poem]
Types of Generative AI:
| Type | Examples | What it Creates |
|------|----------|-----------------|
| Text | GPT-4, Claude, Gemini | Articles, code, conversations |
| Image | DALL-E, Midjourney, Stable Diffusion | Art, photos, designs |
| Audio | AudioLM, MusicLM | Music, speech, sound effects |
| Video | Sora, Runway | Video clips, animations |
| Code | GitHub Copilot, CodeWhisperer | Programs, functions |
| Multimodal | GPT-4V, Gemini | Combining multiple types |
Foundation Models
Large pre-trained models that serve as a foundation for many downstream tasks.
Characteristics:
```
┌────────────────────────────────────────────────────────┐
│                    FOUNDATION MODEL                    │
├────────────────────────────────────────────────────────┤
│ • Trained on massive, diverse datasets                 │
│ • Billions of parameters                               │
│ • General-purpose capabilities                         │
│ • Adaptable to many tasks                              │
│ • Few-shot and zero-shot learning                      │
└────────────────────────────────────────────────────────┘
                            │
          ┌─────────────────┼─────────────────┐
          ▼                 ▼                 ▼
     Text Tasks         Code Tasks       Vision Tasks
     • Chat             • Generation     • Analysis
     • Summary          • Completion     • Generation
     • Translation      • Debugging      • Editing
```
Major Foundation Models:
| Model | Provider | Parameters | Strengths |
|-------|----------|------------|-----------|
| GPT-4 | OpenAI | Not disclosed (reported ~1.7T) | Reasoning, coding, multimodal |
| Claude 3 | Anthropic | Not disclosed | Safety, long context, analysis |
| Gemini | Google | Not disclosed | Multimodal, integration |
| Llama 3 | Meta | 8B-405B | Open source, customizable |
| Mistral | Mistral AI | 7B-8x22B | Efficient, open source |
How LLMs Work
Transformer Architecture Recap:
```
Input tokens → Embedding → [Transformer Blocks × N] → Output tokens
                                     │
                                     ▼
                          ┌─────────────────────┐
                          │   Self-Attention    │
                          │  (see all tokens)   │
                          ├─────────────────────┤
                          │    Feed-Forward     │
                          │   (process each)    │
                          └─────────────────────┘
```
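The self-attention box above can be sketched in a few lines of NumPy (single head, toy sizes, and random weights for illustration; real transformers add multiple heads, causal masking, and layer normalization):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X (seq_len × d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token vs. every token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax: attention weights
    return w @ V                                     # weighted mix of value vectors

X = np.random.randn(4, 8)                            # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (np.random.randn(8, 8) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```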
Training Process:
1. Pre-training:
   - Predict the next token on a massive text corpus (see the toy sketch below)
   - Learn language patterns, facts, and reasoning
   - Requires enormous compute (millions of GPU hours)
2. Fine-tuning:
   - Train on task-specific data
   - Instruction tuning (teach the model to follow directions)
   - RLHF (align outputs with human preferences)
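The pre-training objective in step 1 is just next-token prediction: every position in the corpus becomes a training example whose label is the following token. A toy illustration:

```python
# Every position in a corpus yields one training example:
tokens = ["The", "capital", "of", "France", "is", "Paris"]
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(pairs[-1])  # (['The', 'capital', 'of', 'France', 'is'], 'Paris')
# The model is trained (via cross-entropy loss over the vocabulary)
# to assign high probability to each target token.
```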
Generation Process:
Prompt: "The capital of France is"
│
▼
[Model processes]
│
▼
Probability distribution:
"Paris": 0.85
"Lyon": 0.05
"London": 0.01
...
│
▼
Sample (with temperature)
│
▼
Output: "Paris"
Then: "The capital of France is Paris"
Continue generating...
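In code, generation is a loop: score the vocabulary, pick one token, append it, repeat. A minimal sketch with a toy scoring function standing in for the model (all names and numbers here are illustrative):

```python
import numpy as np

def model_logits(tokens):
    """Stand-in for a real LLM forward pass: one score per vocabulary entry."""
    vocab = ["Paris", "Lyon", "London", "."]
    if tokens[-1] == "is":
        scores = np.array([4.0, 1.2, -0.4, 0.5])    # after "is": "Paris" dominates
    else:
        scores = np.array([-1.0, -1.0, -1.0, 4.0])  # otherwise: end the sentence
    return vocab, scores

def generate(prompt_tokens, max_new=2, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        vocab, logits = model_logits(tokens)
        if temperature == 0:
            idx = int(np.argmax(logits))             # greedy: most likely token
        else:
            p = np.exp(logits / temperature - (logits / temperature).max())
            idx = int(np.random.choice(len(vocab), p=p / p.sum()))
        tokens.append(vocab[idx])                    # append and repeat
    return " ".join(tokens)

print(generate(["The", "capital", "of", "France", "is"], temperature=0))
# → "The capital of France is Paris ."
```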
Key Concepts
Context Window
The amount of text the model can "see" at once.
| Model | Context Window |
|-------|----------------|
| GPT-4 | 8K - 128K tokens |
| Claude 3 | 200K tokens |
| Gemini 1.5 | 1M+ tokens |
Why it matters:
- Longer = can process more documents
- Longer = can maintain longer conversations
- Trade-off with cost and latency
Tokens
Text is split into tokens (roughly words or word pieces).
"Tokenization is interesting" → ["Token", "ization", " is", " interesting"]
Roughly: 1 token ≈ 4 characters ≈ 0.75 words
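You can inspect tokenization directly with a tokenizer library such as tiktoken; note that splits vary by model, so the pieces may differ from the example above:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # encoding used by GPT-4-era models
ids = enc.encode("Tokenization is interesting")
print(len(ids))                                # token count
print([enc.decode([i]) for i in ids])          # the individual token strings
```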
Temperature
Controls randomness in generation.
- Temperature = 0: deterministic; always picks the most likely token
- Temperature = 1: standard sampling from the model's distribution
- Temperature = 2: more random, more creative

Low temperature → factual, consistent
High temperature → creative, varied
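Using toy logits, you can watch temperature reshape the probability distribution (a sketch with made-up numbers, not real model output):

```python
import numpy as np

vocab = ["Paris", "Lyon", "London", "Berlin"]
logits = np.array([4.0, 1.2, -0.4, -0.9])      # toy scores, not real model output

for t in (0.5, 1.0, 2.0):
    p = np.exp(logits / t - (logits / t).max())
    p /= p.sum()                                # temperature-scaled softmax
    print(f"T={t}:", dict(zip(vocab, np.round(p, 3))))
# Low T concentrates almost all probability on "Paris";
# high T flattens the distribution, so rarer tokens get sampled more often.
```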
Prompt Engineering
Getting the best results from foundation models.
Basic Techniques:
1. Zero-Shot:

   ```
   Classify this review as positive or negative:
   "The product arrived broken and customer service was unhelpful."
   ```

2. Few-Shot:

   ```
   Classify reviews:
   "Great product!" → Positive
   "Terrible experience" → Negative
   "Worked perfectly" → Positive

   Now classify:
   "The product arrived broken" →
   ```

3. Chain-of-Thought:

   ```
   Solve this step by step:
   Q: If I have 3 apples and buy 2 more, then eat 1, how many do I have?

   Let's think step by step:
   1. Start with 3 apples
   2. Buy 2 more: 3 + 2 = 5 apples
   3. Eat 1: 5 - 1 = 4 apples
   Answer: 4 apples
   ```
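As a sketch of how the few-shot example above looks in code, here it is sent through the OpenAI Python SDK (the model name is illustrative; any chat-capable provider follows the same pattern):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot_prompt = """Classify reviews:
"Great product!" → Positive
"Terrible experience" → Negative
"Worked perfectly" → Positive

Now classify:
"The product arrived broken" →"""

response = client.chat.completions.create(
    model="gpt-4o-mini",                        # illustrative model choice
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,                              # deterministic for classification
)
print(response.choices[0].message.content)      # expected: "Negative"
```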
Advanced Patterns:
| Pattern | Description | Use Case |
|---------|-------------|----------|
| Role prompting | "You are an expert..." | Specialized responses |
| System prompts | Set behavior rules | Chatbots, apps |
| Structured output | "Return JSON..." | API integration |
| RAG | Retrieve then generate | Knowledge-grounded responses |
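Structured output, for example, can be as simple as asking for JSON and parsing the reply (a minimal sketch with a stand-in reply; production systems rely on the provider's JSON or structured-output modes for reliability):

```python
import json

prompt = (
    'Extract fields from this review and return ONLY valid JSON '
    'with keys "sentiment" and "product":\n\n'
    '"The Acme blender arrived broken."'
)

# reply = call_llm(prompt)   # call_llm is a stand-in for any chat model API
reply = '{"sentiment": "negative", "product": "Acme blender"}'  # example reply

data = json.loads(reply)     # fails loudly if the model drifts from pure JSON
print(data["sentiment"])     # "negative"
```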
Retrieval-Augmented Generation (RAG)
Combine foundation models with external knowledge.
```
User Query
     │
     ▼
┌───────────────┐
│   Retriever   │ ← Vector Database
│  (find docs)  │   (your documents)
└───────┬───────┘
        │ relevant docs
        ▼
┌───────────────┐
│   Generator   │ ← Foundation Model
│  (LLM answer) │
└───────┬───────┘
        │
        ▼
Response with citations
```
Benefits:
- Ground responses in your data
- Reduce hallucinations
- Keep information current
- No model retraining needed
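A minimal end-to-end sketch of the retrieve-then-generate loop, using word overlap as a stand-in for vector similarity (real systems use neural embeddings and a vector database; everything here is illustrative):

```python
import re

docs = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via live chat.",
    "Shipping is free on orders over $50.",
]

def words(text: str) -> set[str]:
    """Toy relevance signal: bag of words. Real RAG uses vector embeddings."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = (
    "Answer using only the context below, and cite it.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)
# `prompt` is then sent to the foundation model (the Generator box above).
```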
Fine-Tuning vs RAG vs Prompting
| Approach | When to Use | Cost | Flexibility |
|----------|-------------|------|-------------|
| Prompting | Quick experiments, simple tasks | $ | High |
| RAG | Need current/private knowledge | $$ | High |
| Fine-tuning | Specific style/domain, high volume | $$$ | Lower |
Building GenAI Applications
Application Patterns:
1. Conversational AI (sketched in code below):

   ```
   User → Chat Interface → LLM → Response
                     ↑
            Conversation history
   ```

2. Content Generation:

   ```
   Template/Prompt → LLM → Generated Content → Review → Publish
   ```

3. Code Assistant:

   ```
   Code Context + Request → LLM → Generated Code → Tests → Integration
   ```

4. Knowledge Assistant (RAG):

   ```
   Query → Retrieve → Augment Prompt → Generate → Cite Sources
   ```
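Pattern 1 hinges on resending the conversation history with every turn, because the model itself is stateless. A minimal sketch (chat_model is a stand-in for any LLM API call):

```python
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat_model(messages: list[dict]) -> str:
    """Stand-in for a real LLM call (e.g., a chat completions endpoint)."""
    return f"(model reply to: {messages[-1]['content']})"

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = chat_model(history)        # the model sees the full history each turn
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What's the capital of France?"))
print(chat("And its population?"))     # follow-ups work only because history is resent
```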
Cloud GenAI Services
Azure:
- Azure OpenAI Service: GPT-4, DALL-E, embeddings
- Azure AI Studio: Build and deploy GenAI apps
- Semantic Kernel: Orchestration SDK
- Prompt Flow: Design and test prompts
AWS:
- Amazon Bedrock: Claude, Titan, Llama, Stable Diffusion
- Amazon Q: Enterprise AI assistant
- PartyRock: No-code GenAI apps
Google Cloud:
- Vertex AI: Gemini, PaLM, Imagen
- Generative AI Studio: Experiment with models
- Duet AI: Coding assistant
Challenges and Limitations
Common Issues:
| Challenge | Description | Mitigation |
|-----------|-------------|------------|
| Hallucinations | Making up facts | RAG, verification, grounding |
| Bias | Reflecting training data biases | Testing, guardrails |
| Inconsistency | Different answers each time | Lower temperature, caching |
| Context limits | Can't process everything | Chunking, summarization |
| Cost | Can be expensive at scale | Caching, smaller models |
| Latency | Slow for real-time | Streaming, edge models |
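For the context-limits row, chunking is often just a few lines. A character-based sketch with overlap (real pipelines usually chunk by tokens or by document structure):

```python
def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so each fits the model's context budget."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("a very long document... " * 500)
print(len(pieces), max(len(p) for p in pieces))
# Each piece is summarized or indexed for retrieval separately,
# then the partial results are combined.
```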
Safety Considerations:
- Content filtering
- Prompt injection attacks
- Data privacy
- Misuse prevention
Exam Tips
Common exam questions test:
- Choosing between prompting, RAG, and fine-tuning
- Understanding tokens and context windows
- When to use different foundation models
- RAG architecture and benefits
- Responsible use of generative AI
Watch for keywords:
- "Need current information" → RAG
- "Specific writing style" → Fine-tuning
- "Quick prototype" → Prompting
- "Reduce hallucinations" → RAG, grounding
- "Creative content" → Higher temperature
Key Takeaway
Generative AI and foundation models are transforming software development. Understanding how to effectively use these models—through prompting, RAG, or fine-tuning—is essential. Focus on the right approach for your use case, be aware of limitations, and always consider responsible AI practices. The field is evolving rapidly, so staying current with new capabilities and best practices is crucial.
