Write real Python code to call Claude and OpenAI APIs — streaming, error handling, and practical patterns.
Before you can use Claude or OpenAI APIs, you need to set up your local environment. This is a one-time setup that takes 5 minutes.
Step 1: Install the SDKs
# Install Anthropic SDK (Claude)
pip install anthropic

# Install OpenAI SDK
pip install openai
Step 2: Get API Keys
Claude: go to console.anthropic.com, sign up, and create an API key.
OpenAI: go to platform.openai.com, sign up, and create an API key.
Step 3: Set Environment Variables
# On macOS/Linux: Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"

# On Windows: use System Variables, or in Command Prompt (no quotes around the value):
set ANTHROPIC_API_KEY=your-key-here
set OPENAI_API_KEY=your-key-here

# Then reload your shell (macOS/Linux)
source ~/.bashrc
Always use environment variables. If you hardcode a key and push to GitHub, attackers can use your key before you realize it's exposed. Use environment variables, .env files, or secrets managers.
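A small guard can make a missing key fail fast at startup with a clear message, instead of surfacing as a confusing auth error mid-request. A minimal sketch (the helper name is my own):

```python
import os

def require_api_key(name: str) -> str:
    """Return the named API key from the environment, failing fast if it's missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set. Export it (see Step 3) before running.")
    return key

# Check at startup, not deep inside a request:
# api_key = require_api_key("ANTHROPIC_API_KEY")
```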
The Anthropic SDK makes calling Claude APIs simple and intuitive. Let's cover the core methods and patterns you'll use every day.
Basic API Call
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.content[0].text)
Multi-Turn Conversation
import anthropic

client = anthropic.Anthropic()

conversation_history = []

def chat(user_message):
    """Add user message and get response."""
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system="You are a helpful code assistant.",
        messages=conversation_history
    )
    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message

# Conversation
print(chat("What's a closure in Python?"))
print(chat("Can you give me an example?"))
print(chat("How is that different from a class?"))
Key Parameters:
model: Which Claude model to use. Current best: "claude-sonnet-4-5-20250929"
max_tokens: Maximum tokens in the response. Higher = longer responses, higher cost
system: System prompt that defines the model's behavior
messages: List of user/assistant messages, the conversation history
temperature (0-1): Controls randomness. 0 = deterministic, 1 = creative.
top_p: Nucleus sampling cutoff (probability mass). top_k: Limits sampling to the k most likely tokens.
For most tasks: temperature=0 (deterministic). For creative tasks: temperature=0.7 (balanced).
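The parameters above all land in one `client.messages.create()` call. A sketch that encodes the temperature rule of thumb (the `build_request` helper name is my own):

```python
def build_request(prompt, creative=False):
    """Assemble keyword arguments for client.messages.create().

    temperature=0 for deterministic tasks, 0.7 for creative ones,
    following the rule of thumb above.
    """
    return {
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 1024,
        "temperature": 0.7 if creative else 0.0,
        "system": "You are a helpful assistant.",
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage:
# response = client.messages.create(**build_request("Summarize this file"))
```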
The OpenAI SDK has a structure very similar to Anthropic's; learn both and the patterns map almost directly onto each other. The main difference is where the system prompt goes.
OpenAI Basic Call
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Note that OpenAI has no separate system parameter: the system prompt goes into the messages list as the first message, with role "system".
Side-by-Side Comparison
Anthropic:

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5...",
    max_tokens=1024,
    system="...",
    messages=[...]
)
text = response.content[0].text

OpenAI:

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[{"role": "system", "content": "..."}, ...]
)
text = response.choices[0].message.content
Claude: Best for long-context tasks, reasoning, code analysis. GPT-4: Best for creative writing, multimodal (image) tasks, speed. Most teams use both — Claude for heavy thinking, GPT-4 for quick tasks.
Streaming is crucial for UX. Instead of waiting for the full response, you get tokens as they arrive. Users see text appearing in real-time.
Streaming with Anthropic
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a haiku"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # Print without newline
Streaming with OpenAI
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku"}
    ]
)

for chunk in stream:
    # Some chunks carry no text (e.g. the final one), so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Use streaming for: chat interfaces, real-time UIs, long responses (code generation). Don't stream for: batch processing, APIs where you need the full response at once.
APIs fail. Rate limits, timeouts, network errors. Production code must handle these gracefully.
Production-Grade Error Handling
import time
import anthropic

client = anthropic.Anthropic()

def call_claude_with_retry(prompt, max_retries=3):
    """Call Claude with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text
        except anthropic.RateLimitError:
            print(f"Rate limited on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
            else:
                raise
        except anthropic.APIStatusError as e:
            print(f"API error: {e.status_code}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise
        except Exception as e:
            print(f"Unexpected error: {e}")
            raise
    return None

# Usage
try:
    result = call_claude_with_retry("Hello Claude")
    print(result)
except Exception as e:
    print(f"Failed after retries: {e}")
Don't retry immediately. Use exponential backoff: wait 1s, then 2s, then 4s. This gives the API time to recover and prevents cascading failures.
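The schedule above is easy to sketch as a pure function (the helper name and the jitter option are my own additions; jitter spreads out retries so many clients don't hammer the API in lockstep):

```python
import random

def backoff_delays(max_retries=3, base=1.0, jitter=False):
    """Exponential backoff schedule: base * 2**attempt, i.e. 1s, 2s, 4s."""
    delays = []
    for attempt in range(max_retries):
        delay = base * (2 ** attempt)
        if jitter:
            # Full jitter: pick a random wait up to the scheduled delay
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays

print(backoff_delays())  # [1.0, 2.0, 4.0]
```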
LLM APIs charge per token, so track usage to manage costs. With Claude Sonnet, for example, input tokens cost $3 per million and output tokens $15 per million. Small optimizations add up.
Tracking Token Usage
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

# Token usage
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
total_tokens = input_tokens + output_tokens

# Cost estimation (Claude Sonnet pricing)
input_cost = (input_tokens / 1_000_000) * 3     # $3 per 1M input tokens
output_cost = (output_tokens / 1_000_000) * 15  # $15 per 1M output tokens
total_cost = input_cost + output_cost

print(f"Input: {input_tokens}, Output: {output_tokens}")
print(f"Cost: ${total_cost:.6f}")
Cost Optimization Strategies:
For tasks that don't need immediate results (overnight processing), Anthropic offers batch API at 50% discount. Great for data processing pipelines.
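Combining the Sonnet prices from the tracking example with the 50% batch discount gives a quick way to compare options (the helper name is my own; prices are the ones assumed above):

```python
def estimate_cost(input_tokens, output_tokens, batch=False):
    """Estimate Claude Sonnet cost in dollars: $3/M input, $15/M output, 50% off for batch."""
    cost = (input_tokens / 1_000_000) * 3 + (output_tokens / 1_000_000) * 15
    return cost * 0.5 if batch else cost

print(estimate_cost(1_000_000, 100_000))              # 4.5
print(estimate_cost(1_000_000, 100_000, batch=True))  # 2.25
```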
Let's build a complete terminal chat application with conversation history, streaming, and error handling. This is production-grade code you can use as a template.
import os
import json
import anthropic

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

HISTORY_FILE = "chat_history.json"

def load_history():
    """Load conversation history from file."""
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r") as f:
            return json.load(f)
    return []

def save_history(history):
    """Save conversation history to file."""
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f, indent=2)

def chat(user_input, history):
    """Send message and stream response."""
    history.append({"role": "user", "content": user_input})
    print("\nAssistant: ", end="", flush=True)
    full_response = ""
    try:
        with client.messages.stream(
            model="claude-sonnet-4-5-20250929",
            max_tokens=1024,
            system="You are a helpful AI assistant.",
            messages=history
        ) as stream:
            for text in stream.text_stream:
                print(text, end="", flush=True)
                full_response += text
    except anthropic.APIError as e:
        print(f"\nError: {e}")
        history.pop()  # Drop the unanswered user turn so the history stays valid
        return history
    print("\n")
    history.append({"role": "assistant", "content": full_response})
    return history

def main():
    """Main chat loop."""
    history = load_history()
    print("✨ Claude Terminal Chat (type 'quit' to exit, 'clear' to reset)")
    print("━" * 50)
    while True:
        user_input = input("\nYou: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "quit":
            print("Goodbye!")
            break
        if user_input.lower() == "clear":
            history = []
            save_history(history)  # Also clear the file on disk
            print("History cleared.")
            continue
        history = chat(user_input, history)
        save_history(history)

if __name__ == "__main__":
    main()
To Run:
# Set your API key
export ANTHROPIC_API_KEY="your-key"

# Run the app
python chat_app.py
This chat app has: conversation persistence (saves to disk), streaming for real-time UX, error handling for API failures, and a clean CLI interface. This is a foundation you can extend for production use.
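One natural extension: cap the history you send, since every past turn is billed again as input tokens on each call. A sketch (the helper name is my own), which also keeps the trimmed window starting on a user turn, as the Messages API expects:

```python
def trim_history(history, max_messages=20):
    """Keep only the most recent messages, ensuring the window starts on a user turn."""
    if len(history) <= max_messages:
        return history
    trimmed = history[-max_messages:]
    # The API expects the conversation to open with a user message
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

# In the chat loop, before calling the API:
# messages=trim_history(history)
```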
1. Where should you store API keys?
2. Why use streaming for chat interfaces?
3. What's the best strategy for handling API rate limits?
4. How can you reduce LLM API costs?
You've mastered the foundations of Prompt Engineering & AI Agents.
You've learned structure, techniques, advanced patterns, iteration, and hands-on coding.
You're ready to build production AI applications.
Topic 6 covered: Environment setup, Anthropic SDK patterns, OpenAI SDK for comparison, streaming for UX, production-grade error handling with exponential backoff, cost optimization via token tracking, and a complete mini-project (terminal chat app).
Phase 1 Completed: You've journeyed from "How do LLMs work?" through structure, techniques, advanced patterns, iteration & debugging, and real hands-on API coding.
Next Phase (Topics 7-20): Build AI Agents, RAG systems, multi-agent architectures, and domain-specific applications.
You've unlocked the ability to go beyond simple prompts into autonomous systems.