Azure AI Foundry Platform Guide for Developers

Azure AI Foundry consolidates every Azure AI service into a unified development platform, eliminating the context-switching that slows enterprise AI development. If you're building on existing Azure infrastructure, Foundry gives you a single control plane for GPT-4o models, vector search, agents, evaluation, and deployment—with built-in governance and compliance that standalone APIs can't match.

What You'll Learn

Prerequisites

Step 1: Create an Azure AI Foundry Hub with Governance Controls

Navigate to ai.azure.com and select 'Create new hub' from the Foundry portal. Choose your subscription and create a new resource group named 'rg-ai-foundry-prod'. Select a region that supports Azure OpenAI (East US, West Europe, or your compliance region). Enable managed identity, configure your virtual network if required for data sovereignty, and set up Azure Key Vault integration for secrets management. The hub provides centralized billing, security policies, and shared compute across all your AI projects—saving 40-60% on duplicate infrastructure costs compared to managing services individually.

💡 Tip: Use dedicated hubs for dev/test/prod environments with separate RBAC policies. This prevents accidental production deployments and simplifies cost tracking.
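The portal steps above can also be scripted for repeatable dev/test/prod provisioning. The sketch below builds the equivalent Azure CLI calls (a Foundry hub is provisioned as an Azure ML workspace of kind `hub` via the `ml` extension); the resource names are hypothetical, and the commands only execute when you flip `dry_run` off after `az login`.

```python
import shutil
import subprocess

# Hypothetical names: match these to your own conventions.
RG = "rg-ai-foundry-prod"
HUB = "aihub-prod"
LOCATION = "eastus"

COMMANDS = [
    ["az", "group", "create", "--name", RG, "--location", LOCATION],
    ["az", "ml", "workspace", "create", "--kind", "hub",
     "--name", HUB, "--resource-group", RG, "--location", LOCATION],
]

def provision(dry_run=True):
    """Run (or just print) the CLI calls. Requires `az login` and the ml extension."""
    for cmd in COMMANDS:
        if dry_run or shutil.which("az") is None:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

provision()  # dry run: prints the planned commands
```

Checking the printed commands into source control alongside your RBAC assignments keeps the per-environment hubs auditable.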

Step 2: Deploy Azure OpenAI Models to Your Project

Within your hub, create a project named 'enterprise-rag-app'. Navigate to the Deployments section and click 'Create deployment'. Select 'gpt-4o' for production workloads or 'gpt-4o-mini' for cost-sensitive scenarios (saves ~85% on token costs). Set Tokens Per Minute (TPM) quota to 30K for initial testing—you can scale this based on actual load. Deploy a text-embedding-3-large model for vector search embeddings. Each deployment gets a unique endpoint URL and API key managed through Azure Key Vault, eliminating hardcoded secrets in your codebase.

💡 Tip: Enable Content Safety filters at the deployment level to automatically block harmful prompts and responses, meeting enterprise compliance requirements without custom code.
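Once deployed, each model is reachable over the documented Azure OpenAI data-plane REST API. Here is a minimal standard-library sketch; the endpoint, deployment name, and key are placeholders you would replace with your own, and the API version shown is one GA version (newer ones exist).

```python
import json
import urllib.request

API_VERSION = "2024-02-01"  # a GA data-plane version; check for newer

def build_chat_request(endpoint, deployment, api_key, messages, temperature=0.2):
    """Assemble the chat-completions request for a named deployment."""
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={API_VERSION}")
    body = json.dumps({"messages": messages, "temperature": temperature}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )

def chat(endpoint, deployment, api_key, messages):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(endpoint, deployment, api_key, messages)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires a live deployment, so not executed here:
# chat("https://my-resource.openai.azure.com", "gpt-4o", key,
#      [{"role": "user", "content": "Hello"}])
```

In production you would pull the key from Key Vault (or use Azure AD tokens) rather than passing it around as a string.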

Step 3: Configure Azure AI Search for Vector and Hybrid Search

Create an Azure AI Search resource in the same region as your hub—colocation reduces latency by 40-80ms per query. Select the Basic tier for development or Standard for production (supports up to 15 million documents). In the Foundry portal, navigate to 'Connected resources' and link your Search service. Create a new index with vector fields enabled: configure dimensions to 3072 for text-embedding-3-large compatibility. Upload your company documents using the built-in data ingestion pipeline—it automatically chunks documents, generates embeddings via your OpenAI deployment, and populates the vector index. This RAG foundation lets your models answer questions using current company data instead of stale training data.

⚠ Watch out: Search queries cost $0.10 per 1K requests on Standard tier. Enable query result caching in production to reduce costs by 60-70% for repeated questions.
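A hybrid query combines the keyword and vector legs in a single request to the Search REST API's `docs/search` action. This standard-library sketch assumes typical index field names (`contentVector`, `title`, `chunk`); substitute the fields from your own index definition.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # stable Search data-plane version with vector queries

def build_hybrid_query(text, vector, k=5, vector_field="contentVector"):
    """Body for a hybrid query: BM25 keyword search plus k-NN vector search."""
    return {
        "search": text,                 # keyword leg
        "vectorQueries": [{
            "kind": "vector",
            "vector": vector,           # embedding of the query text (3072 dims
            "fields": vector_field,     # for text-embedding-3-large)
            "k": k,
        }],
        "select": "title,chunk",        # hypothetical index fields
        "top": k,
    }

def search(endpoint, index, api_key, body):
    """POST the query and return the result documents."""
    url = f"{endpoint}/indexes/{index}/docs/search?api-version={API_VERSION}"
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]
```

Caching `build_hybrid_query` results keyed on the normalized question text is one simple way to implement the query caching recommended above.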

Step 4: Build a RAG Application with Prompt Flow

Open Prompt Flow from the Foundry project menu and select 'Create new flow'. Choose the 'QnA with your data' template—it provides a pre-built chain with retrieval, prompt engineering, and response generation nodes. Connect the retrieval node to your Azure AI Search index from Step 3. Configure the LLM node to use your gpt-4o deployment with a system prompt that instructs the model to cite sources. Add an evaluation node that measures groundedness (hallucination detection) and relevance scores automatically. The visual canvas eliminates boilerplate orchestration code and lets you A/B test prompt variations in minutes instead of hours of refactoring.

💡 Tip: Use Prompt Flow's built-in evaluation runs to compare GPT-4o vs GPT-4o-mini performance on your specific data. Many teams save 80% on token costs by using mini for simple queries.
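Under the hood, the template's retrieval and prompt nodes do roughly the following: stitch retrieved chunks into a system prompt that forces citations. A plain-Python sketch of that assembly (the `id`/`content` field names are hypothetical):

```python
def build_grounded_prompt(question, chunks):
    """Build a chat message list that grounds the model in retrieved chunks.
    `chunks` is a list of {"id": ..., "content": ...} dicts returned by the
    search index (field names assumed, not prescribed by Prompt Flow)."""
    sources = "\n".join(f"[{c['id']}] {c['content']}" for c in chunks)
    system = (
        "Answer using ONLY the sources below. Cite the source id in brackets "
        "after each claim. If the sources do not contain the answer, say so.\n\n"
        "Sources:\n" + sources
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```

The groundedness evaluator then scores whether each claim in the answer is supported by those same chunks, which is why keeping chunk ids in the prompt pays off.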

Step 5: Implement Multi-Agent Workflows with Azure AI Agent Service

Navigate to the Agent Service section in your project and create a new agent named 'document-analyst'. Assign it the Document Intelligence skill to extract structured data from uploaded PDFs. Create a second agent named 'data-validator' with custom code that verifies extracted data against business rules. Use the orchestration designer to chain these agents: document-analyst processes the PDF, passes results to data-validator, which returns an approval/rejection decision. The managed service handles message queuing, retry logic, and state management—eliminating weeks of custom orchestration code. This pattern scales to 10+ specialized agents without infrastructure complexity.

💡 Tip: Azure AI Agent Service integrates with Semantic Kernel—reuse existing SK plugins as agent skills without rewriting them for a proprietary framework.
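The analyst-to-validator pattern reduces to a simple pipeline. The sketch below fakes both agents as local functions purely to make the data flow concrete; in a real deployment the first would call Document Intelligence and both would run as managed agents, with the service handling queuing and retries.

```python
def document_analyst(pdf_text):
    """Stand-in for the Document Intelligence skill (returns fake fields)."""
    # A real agent would extract these from the PDF; values are illustrative.
    return {"invoice_total": 1250.00, "vendor": "Contoso"}

def data_validator(extracted):
    """Verify extracted data against business rules; return a decision."""
    rules = [
        extracted.get("vendor") is not None,
        0 < extracted.get("invoice_total", -1) < 10_000,  # hypothetical cap
    ]
    return "approved" if all(rules) else "rejected"

def run_chain(pdf_text):
    """The orchestration the Agent Service manages for you: analyst, then validator."""
    return data_validator(document_analyst(pdf_text))
```

Keeping each agent's contract this small (a dict in, a decision out) is what lets the pattern grow to 10+ agents without the chain becoming brittle.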

Step 6: Add Semantic Kernel for Complex Agent Orchestration

Install the Semantic Kernel SDK in your development environment (NuGet for C# or pip for Python). Configure SK to use your Foundry project's Azure OpenAI endpoint via managed identity—no API keys in code. Create a planner plugin that dynamically selects agents based on user intent: route document questions to your RAG flow, data extraction requests to Document Intelligence, and multi-step analysis to your agent chain from Step 5. SK's function calling automatically maps natural language requests to the correct backend service. This architecture future-proofs your app—adding new AI capabilities means registering new plugins, not rewriting orchestration logic.

⚠ Watch out: SK planners can make 5-10 LLM calls per complex request. Set TPM quotas 3-5x higher than your expected direct API usage to avoid throttling during plan execution.
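SK's planner selects a route by asking the model via function calling; the toy keyword router below only exists to make the routing contract concrete (the route names are hypothetical, and a real planner would be far more robust than keyword matching):

```python
def route(user_request):
    """Toy intent router. Semantic Kernel's planner does this with LLM
    function calling; the keyword rules here just illustrate the idea."""
    text = user_request.lower()
    if any(w in text for w in ("extract", "invoice", "form")):
        return "document-intelligence"   # structured extraction path
    if any(w in text for w in ("analyze", "validate", "pipeline")):
        return "agent-chain"             # Step 5's analyst/validator chain
    return "rag-flow"                    # default: answer from indexed docs
```

The important property is the same either way: callers see one entry point, and adding a capability means registering a new route (or SK plugin), not rewriting the dispatcher.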

Step 7: Configure Content Safety and Responsible AI Policies

Enable Azure AI Content Safety in your project settings—it provides real-time filtering for hate speech, violence, self-harm, and sexual content across 100+ languages. Set severity thresholds (0-6 scale): use 2 for customer-facing apps, 4 for internal tools. Configure the Responsible AI dashboard to track prompt injection attempts, jailbreak patterns, and potential data leakage. Set up alerts when safety violations exceed baseline thresholds. These built-in controls satisfy most enterprise compliance requirements (GDPR, HIPAA, SOC2) without building custom safety layers—saving 200+ hours of security engineering.

💡 Tip: Content Safety annotates flagged content with specific violation categories. Log these to Application Insights for audit trails that compliance teams can review during certifications.
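The threshold policy is easy to express as a helper. Content Safety reports a severity per category (hate, violence, self-harm, sexual); this sketch blocks when any category meets the audience's threshold, with audience names and cutoffs mirroring the guidance above (they are policy choices, not API values).

```python
BLOCK_THRESHOLDS = {           # hypothetical policy: stricter for customers
    "customer-facing": 2,
    "internal": 4,
}

def is_blocked(category_severities, audience="customer-facing"):
    """Return True when any category's severity (0-6 scale) meets the
    audience's blocking threshold."""
    threshold = BLOCK_THRESHOLDS[audience]
    return any(sev >= threshold for sev in category_severities.values())
```

Logging the offending category alongside the block decision gives compliance teams the audit trail mentioned in the tip above.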

Step 8: Deploy Your AI App with Managed Endpoints

In Prompt Flow, select your tested flow and click 'Deploy'. Choose 'Managed online endpoint' for automatic scaling and load balancing. Configure instance type: 2 vCPU for development, 8+ vCPU for production workloads serving 1000+ requests/hour. Enable Application Insights integration—it automatically logs token usage, latency percentiles, and error rates. Set autoscaling rules: scale out when CPU exceeds 70% or queue depth hits 100 requests. The managed endpoint provides a REST API with Azure AD authentication and rate limiting built in—your frontend just calls HTTPS, and Foundry handles infrastructure, monitoring, and updates.

💡 Tip: Use deployment slots for blue-green deployments. Test new prompt versions in staging slots, then swap to production with zero downtime when validation passes.
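From the frontend's perspective, the managed endpoint is a single authenticated POST. A standard-library sketch follows; the scoring URL, token, and the `question` input name all depend on your flow's definition and are assumptions here.

```python
import json
import urllib.request

def build_endpoint_request(scoring_url, token, payload):
    """Managed online endpoints accept a Bearer token (Azure AD or endpoint key)."""
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

def ask(scoring_url, token, question):
    """Call the deployed flow. The payload schema must match your flow's inputs;
    a single "question" field is assumed."""
    req = build_endpoint_request(scoring_url, token, {"question": question})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because authentication, rate limiting, and logging live on the endpoint, this client stays identical when you swap staging and production slots.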

Step 9: Integrate with Existing Azure Infrastructure

Connect your AI app to existing Azure resources using managed identity—no connection strings or passwords. Grant your Foundry project's identity read access to Azure Storage for document processing, write access to Cosmos DB for conversation history, and query access to SQL Database for business data retrieval. Use Azure API Management as a gateway: it provides token-based auth, request throttling, and usage analytics for AI endpoints. Set up Azure DevOps pipelines to deploy Prompt Flow updates via Infrastructure as Code—your entire AI stack becomes version-controlled and auditable. This integration leverages your existing Azure investment and governance frameworks instead of introducing shadow IT.

⚠ Watch out: Managed identity requires Contributor or custom roles on target resources. Work with your Azure admins to grant minimum required permissions—avoid Owner role for security.
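The grant itself is one CLI call per resource. A hedged sketch (the identity and scope are placeholders, and the commands only run with `dry_run=False` after `az login`); note that Cosmos DB data-plane access uses its own `az cosmosdb sql role assignment` command rather than a standard role assignment.

```python
import shutil
import subprocess

# Placeholders: fill in from your environment.
PRINCIPAL_ID = "00000000-0000-0000-0000-000000000000"  # project managed identity
SCOPE = ("/subscriptions/<sub>/resourceGroups/rg-ai-foundry-prod"
         "/providers/Microsoft.Storage/storageAccounts/<account>")

def grant_blob_reader(principal_id=PRINCIPAL_ID, scope=SCOPE, dry_run=True):
    """Least-privilege data-plane grant (not Contributor or Owner)."""
    cmd = ["az", "role", "assignment", "create",
           "--assignee", principal_id,
           "--role", "Storage Blob Data Reader",
           "--scope", scope]
    if dry_run or shutil.which("az") is None:
        print("would run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)
    return cmd
```

Running grants like this from your DevOps pipeline keeps every permission in version control, which is exactly the auditability the step above is after.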

Step 10: Monitor Costs and Optimize Token Usage

Open Cost Management in your Azure portal and filter by your AI Foundry resource group. Azure OpenAI tokens typically represent 60-80% of AI costs—track this in the 'Azure OpenAI' meter. Enable token usage dashboards in Application Insights to identify high-cost queries: sort by tokens consumed per request. Optimize expensive queries by caching results for 1-24 hours using Azure Cache for Redis—saves 50-70% on repetitive questions. Switch low-complexity queries from GPT-4o to GPT-4o-mini (85% cost reduction). Set budget alerts at 80% and 100% of expected monthly spend. Most enterprise RAG apps cost $200-800/month for 10K-50K queries—far less than building and maintaining custom models.

💡 Tip: Use Azure AI Foundry's built-in A/B testing to measure if GPT-4o's quality justifies 6-8x higher cost vs mini for your specific use case. Many teams run 70% traffic on mini, 30% on full model.
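A back-of-envelope cost model makes the model-choice tradeoff concrete. The per-1K-token prices below are illustrative assumptions; always check the current Azure OpenAI price sheet before budgeting.

```python
# Illustrative per-1K-token prices (USD); verify against current Azure pricing.
PRICES = {
    "gpt-4o":      {"input": 0.0025,  "output": 0.0100},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def query_cost(model, input_tokens, output_tokens):
    """Cost of one request: prompt tokens in, completion tokens out."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

def monthly_cost(model, queries, in_tok=1500, out_tok=400):
    """Estimate monthly spend for a RAG app; a 1500-token prompt (question
    plus retrieved chunks) and 400-token answer are assumed averages."""
    return queries * query_cost(model, in_tok, out_tok)
```

Plugging in your actual token histograms from Application Insights turns this from a guess into a forecast, and makes the 70/30 mini-vs-full traffic split easy to price out.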

Summary

You now have a production-ready Azure AI Foundry environment with deployed language models, vector search for RAG, multi-agent workflows, and managed endpoints—all integrated with your existing Azure infrastructure and security policies. This architecture scales from prototype to enterprise production without rearchitecting, and, thanks to shared compute and centralized governance, costs 40-60% less than managing Azure AI services separately.

Next Steps

  1. Schedule a 30-minute Azure AI Foundry architecture review to optimize your specific workload and identify cost savings opportunities
  2. Enroll in AI-102: Designing and Implementing Azure AI Solutions to earn Microsoft certification and master advanced Foundry patterns
  3. Book a Semantic Kernel workshop to learn production agent orchestration patterns and reusable plugin development
  4. Request an AI readiness assessment ($2,500) to evaluate your team's skills gaps and create a 90-day Azure AI adoption roadmap

Need Custom AI Solutions for Your Business?

I build AI solutions that work for boring businesses—HVAC, dental, construction, professional services. Custom implementations in 90 days. You own the IP. We handle hosting, monitoring, updates, and 24/7 support.

Book a Free Consultation
Scott Hay
Microsoft Certified Trainer & AI Solutions Architect
Microsoft Certified Trainer (MCT) • Delivers 12 Microsoft Copilot courses (MS-4002 through MS-4023) plus Azure AI, Power BI
Azure AI Agents, Semantic Kernel, Power BI (PL-300), Power Platform certified
Former Microsoft and Amazon — 30+ years building production systems
Builds custom AI solutions for SMBs with 90-day delivery