FastAPI for MLOps: Python Project Structure and API Best Practices

In this lesson, you will learn how to structure a Machine Learning (ML) project like a real production system, complete with a src directory layout, layered configuration, environment management, logging, and a FastAPI service that exposes your model through clean Application Programming Interface (API) routes.

fastapi-for-mlops-python-project-structure-featured.png

This lesson is the 1st of a 2-part series on Software Engineering for Machine Learning Operations (MLOps):

FastAPI for MLOps: Python Project Structure and API Best Practices (this tutorial)
Lesson 2

To learn how to build reliable, scalable ML software the right way, just keep reading.

Looking for the source code to this post?

Introduction

Modern ML systems do not succeed because of models alone — they succeed because of the software engineering wrapped around them. Most real-world failures in MLOps come from poor structure, missing configuration, messy environments, unclear APIs, or nonexistent logging, not from bad ML.

This lesson gives you the engineering foundation you need to build ML systems that are stable, testable, and production-ready. You’ll learn how to structure your project, manage environments, load configurations, build APIs, and prepare your system for future modules like testing, deployment, and automation.

To learn how solid software engineering underpins every ML workflow, just keep reading.

What You Will Build and Learn

In this lesson, you’ll build the backbone of a real ML application: a clean repository layout, environment management with modern tooling, configuration loading via Pydantic, structured logging, a FastAPI interface, and a simple service layer to power prediction.

These concepts form the “foundation layer” every MLOps system relies on — regardless of the model you eventually plug in.

Why Software Engineering Comes First in MLOps Best Practices

ML projects fail not because the model is wrong, but because the plumbing around the model collapses. Scripts turn into spaghetti, notebooks become unmaintainable, configs get scattered, and environments drift until the system becomes impossible to debug.

Good software engineering fixes this by introducing structure, consistency, and predictable behavior. When your API, config, logs, and model code work together cleanly, everything built on top (e.g., testing, serving, scaling, monitoring) suddenly becomes reliable.

Where This Fits in the Overall Curriculum

This lesson is the foundation of the entire MLOps series. Everything that comes next — testing, model integration, deployment workflows, Continuous Integration/Continuous Delivery (CI/CD) automation, monitoring, and scaling — builds on the engineering habits you establish here.

Think of this as your “software engineering base layer.” Once you master this structure, adding real models, adding load testing, or plugging the system into cloud infrastructure becomes far easier.

Python Project Structure Best Practices for MLOps

A well-structured repository is the first sign of a healthy ML system. Before we write any API code or load a model, we need a layout that cleanly separates configuration, services, models, and utilities. This not only prevents chaos — it makes testing, scaling, and future modules dramatically easier.

How to Structure a Python Project with src/ Layout

ML projects quickly become messy if everything sits at the root level. The src/ layout prevents naming collisions, enforces imports that match production structure, and makes it clear where application code actually lives.

This is the same structure used in mature Python services deployed in production environments.

Python Project Structure Explained: Repository Walkthrough

Here’s the repository layout we’re working with in this module (the exact tree will be shown later when you provide it):

sw-eng-mlops/
│
├── src/
│   ├── core/
│   ├── models/
│   ├── services/
│   ├── api/
│   ├── utils/
│   └── config/
│
├── tests/
│   ├── unit/
│   ├── integration/
│   └── performance/
│
├── pyproject.toml
├── README.md
├── setup_env.sh
└── .env.example

This structure is intentionally clean: core/ contains primitives, models/ stores your ML logic, services/ contains business logic, and api/ exposes everything through FastAPI routes.

Python Project Structure Best Practices: Directory Breakdown

core/ — The Application Base Layer

This folder contains shared components such as logging setup, base classes, or utility abstractions. Everything here is meant to be reusable across the whole system.

models/ — ML or Dummy Model Code

Even if you’re starting with a dummy model, isolating model code here makes it easy to swap in real models later.

services/ — The Business Logic Layer

This is where you place the logic that actually powers /predict, not inside the API route. This separation keeps production-grade APIs maintainable.

api/ — FastAPI Endpoints

Routes live here. Each endpoint calls a service, which calls a model.

Tight, clean, and testable.

utils/ — Shared Helpers

Config loaders, yaml readers, or general-purpose helper functions sit here.

If it isn’t domain logic or a model, it goes here.

config/ — Configuration Files

YAML configs, BaseSettings classes, validation logic, and environment overrides.

Centralizing config makes behavior predictable and testable.

How This Structure Scales to Larger ML Systems

This layout scales easily as your ML workload grows:

Add a new model → create a folder inside models/.
Add a new prediction workflow → add a service in services/.
Add new API functionality → add a route in api/.
Add data pipelines or vector DB logic → expand core/ or services/.

This way, the project grows horizontally, not chaotically.

Would you like immediate access to 3,457 images curated and labeled with hand gestures to train, explore, and experiment with … for free? Head over to Roboflow and get a free account to grab these hand gesture images.

Managing Python Dependencies with Poetry for ML Projects

Modern MLOps projects rely on predictable, repeatable environments — and this section teaches you how to create exactly that. Before we build APIs or load models, we need a clean, isolated workspace where dependencies are installed, versions are pinned, and tools behave consistently across machines.

To learn how to manage dependencies, virtual environments, and setup scripts in real-world ML projects, just keep reading.

Python Poetry vs PDM vs UV: Choosing a Package Manager for MLOps

There are 3 modern Python toolchains worth knowing:

Poetry: full-featured dependency + environment + packaging manager.
PDM (Python Dependency Manager): simpler and faster than Poetry, with PEP-582 support.
UV: an extremely fast Rust-based package manager from Astral.

All 3 support pyproject.toml, the modern Python standard for dependencies and metadata.

Teams often standardize on a single tool, but your project supports all three, so students can use whichever they prefer.

Understanding pyproject.toml for Python Project Configuration

Your pyproject.toml defines:

project name, version, description
dependencies like fastapi, pydantic, pyyaml
dev tools like pytest (Lesson 2)
optional entrypoints (start-server = "src.main:main")

In other words, it is the single source of truth for installation and build metadata.

Any tool (Poetry, PDM, UV, pip) reads this file to install exactly what the project needs.

This is how professional ML systems avoid “works on my machine” issues.

Installing Dependencies (Poetry, PDM, UV)

Using Poetry (recommended)

poetry install
poetry shell
poetry run python src/main.py

Poetry creates an isolated virtual environment and resolves all versions deterministically.

Using UV (lightweight + blazing fast)

uv venv
source .venv/bin/activate
uv pip install -e .
python src/main.py

UV is perfect for fast installs and CI systems where speed matters.

Using PDM (simple + modern)

pdm install
pdm run python src/main.py

PDM feels like npm — no venv folder by default; lightweight and straightforward.

Managing Python Virtual Environments for Reproducible MLOps

Regardless of what tool you choose, the goal is the same: isolate project dependencies from the system Python installation.

Poetry creates its own environment automatically.
UV uses .venv/ inside your project.
PDM can create or avoid virtual environments depending on the configuration.

The important principle:

Never install ML dependencies globally.

Environments keep your project reproducible and safe.

Automating MLOps Setup with Python Environment Scripts

Your project includes a helper script:

./scripts/setup_env.sh

This script:

Detects whether Poetry, UV, or plain pip is available
Installs dependencies using the detected tool
Creates or activates the .env file
Shows the next steps to start the API

This is extremely helpful for teams because it removes all “setup guessing” and gives new developers a consistent starting point.

You now know how environments, dependency managers, and pyproject.toml work together to create a stable foundation for ML systems. With everything installed and configured, you’re ready to build and serve a real API.

Up next, we’ll create your first ML service with FastAPI and connect it to your project’s service layer.

Need Help Configuring Your Development Environment?

Having trouble configuring your development environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join PyImageSearch University — you will be up and running with this tutorial in a matter of minutes.

All that said, are you:

Short on time?
Learning on your employer’s administratively locked system?
Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
Ready to run the code immediately on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

Configuration Management in MLOps: YAML, .env, and Pydantic

How the entire ML system loads, merges, and applies configuration at runtime.

Configuration is one of the most important engineering foundations in any ML system. In Lesson 1, we want students to walk away understanding not only why configuration matters but exactly how this project loads and merges config values. That means stepping through the real code inside src/core/config.py, the .env.example, and configs/config.yaml.

We also want to show how the API, model, and services consume configuration. So when students replace the dummy model with a real one, the pattern already scales.

Let’s walk through it piece by piece.

Using Pydantic Settings for MLOps Configuration Management

Your configuration system starts with a Settings class:

class Settings(BaseSettings):
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    debug: bool = False
    environment: str = "development"
    log_level: str = "INFO"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

What This Means for MLOps Configuration and System Design

Pydantic’s BaseSettings automatically reads:
- environment variables
- .env file
- any overrides you pass at runtime
Defaults are provided in code so the system always works, even if .env is missing.
Type safety ensures that if someone writes API_PORT=hello, the app will fail fast.

This is the right pattern for ML systems where dozens of environment variables must be synchronized across dev, test, staging, and production.

Loading YAML and Merging Layers

Next comes one of the most important parts of your system:

def load_config() -> Settings:
    settings = Settings()

    config_path = "configs/config.yaml"
    if os.path.exists(config_path):
        yaml_config = load_yaml_config(config_path)

        for key, value in yaml_config.items():
            if hasattr(settings, key):
                setattr(settings, key, value)

    return settings

Why This Is Powerful

You now have layered configuration, which production ML systems use everywhere:

Layer 1: Code defaults

Ensures the app always runs.

Layer 2: YAML (configs/config.yaml)

Great for team-shared configs, model settings, cache sizes, service parameters.

Layer 3: .env file

Local overrides (ports, debug mode, secrets).

Layer 4: Runtime environment variables

Final source of truth in cloud deployments.

This layered system prevents the “hard-coded value” trap and keeps ML infra consistent across environments.

Designing YAML Configs for Scalable MLOps Pipelines

Your YAML file contains deeper structural config:

api_host: "0.0.0.0"
api_port: 8000
debug: true
environment: "development"

log_level: "INFO"

model:
  name: "dummy_classifier"
  version: "1.0.0"
  cache_size: 100

service:
  timeout: 30
  max_retries: 3

Even though Settings does not yet support nested objects for models or services, YAML allows you to introduce new structured configuration later. This is how real ML teams configure:

model version
tokenizer version
max batch size
timeouts
cache settings
experiment IDs

Using .env Files for Secure MLOps Configuration

You also provide .env.example:

API_PORT=8000
API_HOST=0.0.0.0
DEBUG=true
ENVIRONMENT=development
LOG_LEVEL=INFO

Why Configuration Management Matters in MLOps Systems

.env.example acts as documentation and a template.
You copy it to .env, fill values, and the system boots.
This is a best practice in every production ML repo.

How the App Uses Configuration (src/main.py)

Your FastAPI entrypoint reads config like this:

logger.info(f"Starting server on {settings.api_host}:{settings.api_port}")

uvicorn.run(
    "main:app",
    host=settings.api_host,
    port=settings.api_port,
    reload=settings.debug
)

Meaning:

Change .env to API_PORT=9000: Your app automatically runs on port 9000.
Change YAML to debug: false: Hot reload turns off.

This is the practical benefit of structured configuration: no hard-coded values are buried inside the code.

How FastAPI Uses Configuration in Production MLOps Systems

Today, your inference service is simple, but in real projects, you might use:

model name
version
batch size
latency budget
max retries
cache settings
rate limits

All of these come from settings, not hardcoded logic.

In this lesson, you teach the pattern, so when the dummy model is eventually replaced with an Open Neural Network Exchange (ONNX) model, a Hugging Face model, or a custom PyTorch model, the service already has the right structure.

Extending MLOps Configuration Safely in Python Projects

Suppose tomorrow you want:

MODEL_PATH=models/checkpoint.pt
ENABLE_CACHE=true
CACHE_TTL=300

You add:

model_path: str = "models/dummy.pt"
enable_cache: bool = False
cache_ttl: int = 120

Then update .env.example. Then, optionally override in YAML.

The app instantly supports new behavior — no rewrites, no refactoring, no confusion.

This is the level of software engineering maturity we want students to learn.

Logging Best Practices for MLOps and FastAPI Applications

Logging is one of the most underappreciated parts of an ML system. A model prediction might take milliseconds, but diagnosing a production issue without proper logs can take hours. Good logs reduce that time to minutes. In this section, we’ll look at how our lesson’s project initializes a logger, formats log messages, and uses logs consistently across the entire API.

Why Logging Is Critical for ML Systems

ML systems fail in ways traditional software does not.

A model might produce an unexpected prediction, a dependency might break silently, or the environment might load the wrong configuration. Logging gives you the breadcrumbs needed to understand:

What inputs reached the API
What model version was used
What the service did before failing
How often errors occur
Whether latency is increasing

Logs are your “black box recorder” when something goes wrong, and they’re equally important when everything seems to be working — because they tell you why things are working.

Logger Initialization

The project defines a single shared logger in src/core/logger.py:

import logging
import sys

logger = logging.getLogger("mlops-lesson1")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

if not logger.handlers:
    logger.addHandler(handler)

Here’s what this setup accomplishes:

A named logger (mlops-lesson1) groups logs for later aggregation (e.g., in Datadog, ELK (Elasticsearch, Logstash, Kibana), OpenTelemetry).
INFO as the default level ensures we capture meaningful operational details without spamming output.
A StreamHandler writes logs to stdout — the standard for containerized deployments (Docker, Kubernetes).
A simple timestamped formatter makes logs human-readable while remaining machine-parseable.
The if not logger.handlers: guard prevents duplicate logs if modules are reloaded.

This small file gives us a production-friendly logger with minimal overhead.

Log Formatting and Levels

The logger uses this format:

2025-01-01 12:34:56 - INFO - Prediction result: positive

Each part of the log line matters:

Timestamp: crucial for correlating logs with events or latency spikes.
Log level: signals severity: INFO, WARNING, ERROR.
Message: the human-readable explanation.

In MLOps systems, you’ll most commonly use:

INFO for model loading, API calls, predictions
WARNING for slow responses, unexpected patterns
ERROR when something fails

Because FastAPI reloads modules during development, you may see log duplication without safeguards — which is why we include the if not logger.handlers: check.

If you later want structured JSON logs (for cloud log ingestion), this same module is the place to upgrade.

Logging Across the App

The logger is used in multiple places, showing a consistent logging strategy.

Health endpoint (src/main.py)

@app.get("/health")
async def health_check():
    logger.info("Health check requested")
    return {"status": "ok"}

This gives visibility into uptime checks — important when a load balancer or Kubernetes performs probes.

Prediction endpoint (src/services/inference_service.py)

logger.info(f"Making prediction for input: {input_text[:50]}...")
prediction = model.predict(input_text)
logger.info(f"Prediction result: {prediction}")

Here we log:

The incoming input (truncated to avoid leaking full user data)
The model’s output
Any errors

If something goes wrong:

except Exception as e:
    logger.error(f"Error during prediction: {str(e)}")
    raise

This ensures errors appear in the logs before FastAPI converts them into HTTP exceptions.

Server startup (main.py)

logger.info(f"Starting server on {settings.api_host}:{settings.api_port}")

This is important for:

verifying the config loaded correctly
ensuring the correct port is used
debugging environments with conflicting overrides

Together, This Gives Us Structured, Traceable Behavior Across the App

If a user reports:

“The API feels slow today.”

You can immediately look at:

prediction request timestamps
whether model loading was triggered again
whether latency warnings appear
whether certain inputs correlate with errors

Without logs, you’re flying blind.

FastAPI for MLOps: Building a Production ML API

APIs are the interface between your ML system and the outside world. Whether the consumer is a mobile app, a batch job, another microservice, or a human developer testing in Postman, every interaction eventually flows through an API. In MLOps, your API becomes the stable contract that hides internal details (model type, version, preprocessing, logging) — allowing you to upgrade models without breaking clients.

Why FastAPI Is Ideal for MLOps API Development

FastAPI gives you a fast, typed, and production-ready way to expose ML predictions.

It handles validation, serialization, documentation, and error responses, so your ML logic stays clean and modular.

The goal is simple: your API should stay stable even when everything behind it changes — models, configs, logging, monitoring, infrastructure.

Creating a FastAPI Application for Machine Learning APIs

Your project defines the API inside src/main.py:

from fastapi import FastAPI
app = FastAPI(
    title="ML Service API",
    description="Code Foundations & API Engineering for MLOps",
    version="0.1.0"
)

This initializes a fully documented ML service with:

A title for the UI
A description that shows up in Swagger
A semantic version
Automatically generated schemas

FastAPI instantly gives you API docs and a clean, declarative way to add endpoints.

Implementing Health Check Endpoints in FastAPI (MLOps)

A health endpoint is the first thing any production system needs.

Kubernetes, AWS Application Load Balancer (ALB), Docker Compose, Jenkins, and uptime monitors all rely on it.

Your implementation:

@app.get("/health")
async def health_check():
    logger.info("Health check requested")
    return {"status": "ok"}

This performs 2 critical functions:

Confirms the API server is alive
Confirms logs are working

It also gives you a simple smoke test to verify the environment.

Building a FastAPI Prediction Endpoint for ML Models

The /predict endpoint is where real ML work happens.

@app.post("/predict")
async def predict_route(input: str):
    return {"prediction": predict_service(input)}

This endpoint:

Accepts a simple string input
Passes it into the inference service
Returns a structured JSON prediction

Because prediction logic is isolated in services/inference_service.py, the API stays lightweight and focused on HTTP behavior — not business logic.

Behind This Endpoint Is Your Prediction Engine

from models.dummy_model import DummyModel

model = DummyModel()

def predict(input_text: str) -> str:
    logger.info(f"Making prediction for input: {input_text[:50]}...")
    prediction = model.predict(input_text)
    logger.info(f"Prediction result: {prediction}")
    return prediction

Even though this is a dummy model, the structure mirrors real production design:

The service layer owns the prediction logic
The model is instantiated once
Logging wraps the input and output

When you upgrade to a real transformer or classifier, the API does not need to change.

Deploying FastAPI with Uvicorn for MLOps Applications

The server entrypoint lives at the bottom of main.py:

def main():
    logger.info(f"Starting server on {settings.api_host}:{settings.api_port}")
    uvicorn.run(
        "main:app",
        host=settings.api_host,
        port=settings.api_port,
        reload=settings.debug
    )

A few details matter:

reload=True reloads on code changes → perfect for development
host and port come from config → ideal for containers/cloud
logging is integrated → so you can trace server start behavior

You can run the server with:

poetry run start-server

uvicorn src.main:app --reload

Both give you a live API with hot reload.

Auto-Generated API Docs (Swagger, ReDoc)

FastAPI automatically exposes:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
OpenAPI schema: http://localhost:8000/openapi.json

These docs are invaluable in ML workflows because:

You can test predictions interactively
Product, QA, and frontend engineers can explore endpoints
Payload schemas are always up to date
No one needs to ask “What does this endpoint expect?”

FastAPI generates this from your Python type hints, which makes documentation essentially free.

MLOps Architecture: Service Layer Design Patterns

The service layer is where your application’s real business logic lives. In an ML system, this includes preprocessing, model selection, inference, error handling, postprocessing, and logging. By keeping this logic out of your API routes, you ensure that your codebase remains modular, testable, and ready for future model upgrades.

Why We Separate Services from Routes

FastAPI routes should only handle HTTP concerns: input validation, request parsing, and response formatting.

They should not know how your model works internally.

Separating logic into a services/ folder gives you:

Cleaner API routes: easier to read and maintain
Better testability: you can unit test the inference logic without starting a server
Loose coupling: upgrading models doesn’t require rewriting routes
Clear ownership: one layer handles HTTP, the other handles ML logic

This separation is one of the most critical software engineering patterns in MLOps — you want your system flexible enough that models can change, scale, or switch frameworks without touching your API.

Designing an ML Inference Service

Your inference logic lives in:

src/services/inference_service.py

Let’s look at how it’s structured:

from models.dummy_model import DummyModel
from core.logger import logger

# Initialize model
model = DummyModel()
logger.info(f"Loaded model: {model.model_name}")

This loads the model once at startup. In a real ML system, this is where:

You load a transformer model
You warm up a GPU
You hydrate a vector store
You initialize the tokenizer/preprocessor state

Then comes the prediction function:

def predict(input_text: str) -> str:
    logger.info(f"Making prediction for input: {input_text[:50]}...")
   
    try:
        prediction = model.predict(input_text)
        logger.info(f"Prediction result: {prediction}")
        return prediction
    except Exception as e:
        logger.error(f"Error during prediction: {str(e)}")
        raise

This function represents the business logic of your ML service:

It trims the input for logging
Calls the model’s predict()
Logs errors and output cleanly
Returns only the result — not HTTP details

This is exactly why we keep services separate: inference is not an HTTP concern, so it does not belong in a route.

Scaling MLOps Systems with Modular Service Architecture

A great design scales. Tomorrow, your system might need:

SentimentService: for NLP
RecommendationService: for personalization
VisionService: that loads YOLO or CLIP
BatchService: for async workflows
RetrievalService: for Retrieval-Augmented Generation (RAG) pipelines

You don’t modify main.py or existing endpoints.

You simply add more files under:

src/services/
├── inference_service.py  
├── recommendation_service.py  
├── vision_service.py  
└── retrieval_service.py

Each service becomes independent, testable, and reusable.

Later in Lesson 2, this design becomes even more powerful because:

Unit tests: target individual services
Integration tests: validate routes and services working together
Load tests: measure the throughput of the /predict pipeline

By the time you add real ML models, this service layer becomes the heart of your system.

Model Abstraction in MLOps: Decoupling ML from APIs

Models change constantly in MLOps. Today you may be serving a dummy classifier; tomorrow it might be a 7B LLM or a YOLOv12 object detector. A good software engineering foundation treats the model as a pluggable, versioned component that can be replaced with minimal friction.

Your current models/ directory demonstrates exactly how this abstraction works.

Designing a Python ML Model Class for MLOps

Your lesson uses a simple placeholder model located at:

src/models/dummy_model.py

The goal of this class isn’t to perform “real” ML — it’s to give you a clean structure that mimics how production model classes are written.

class DummyModel:
    def __init__(self) -> None:
        self.model_name = "dummy_classifier"
        self.version = "1.0.0"
   
    def predict(self, input_data: Any) -> str:
        text = str(input_data).lower()
        if "good" in text or "great" in text:
            return "positive"
        return "negative"

Even in this tiny model, you already see foundational patterns:

A constructor to load or initialize model state
A predict() method that defines the inference interface
model_name and version fields for introspection and tracking

This interface is intentionally minimal: it forces your service and API layers to depend on an abstraction, not on implementation details.

In real MLOps systems, this exact pattern makes it easy to introduce new models without breaking your API.

How to Replace Dummy Models with Production ML Models

Here’s where the abstraction shines.

If tomorrow you decide to replace the dummy model with:

A Hugging Face transformer
A PyTorch Lightning checkpoint
A TensorRT engine
An ONNX Runtime session
A vLLM text-generation server
A YOLO detection model

…all you need to do is drop a new file into:

src/models/

For example:

src/models/
├── dummy_model.py
├── sentiment_model.py
├── llm_generation_model.py
└── object_detector.py

And update your service:

from models.sentiment_model import SentimentModel
model = SentimentModel()

Nothing else changes.

Your FastAPI routes stay the same.

Your service interface stays the same.

Your tests stay the same (except for new model-specific tests).

This is model decoupling.

This is how ML systems avoid turning into tangled spaghetti when models evolve.

Versioning the Model Class

Model versioning is a real production concern, and your dummy model subtly teaches the pattern.

self.version = "1.0.0"

Model versioning matters because:

You may deploy multiple models at once
Clients might depend on specific behaviors
A/B testing needs separate versions
Rollbacks require deterministic reproducibility
Monitoring tools (e.g., Prometheus or Langfuse) track model changes

In production, versioning happens in several places:

version field in the class
model registry tag (MLflow, SageMaker, Hugging Face Hub)
Docker image tag
config.yaml entry
model card metadata

Your project follows the simplest, clearest entrypoint: a version attribute that propagates everywhere the model is used.

Later in Lesson 2, test cases and load tests will automatically pick up this version, mimicking real-world CI/CD systems that validate each model release.

Building Reusable Utilities in Python MLOps Projects

A well-designed ML system always contains a dedicated utilities layer — small, reusable functions that solve cross-cutting problems without polluting your core logic, service layer, or API routes.

In this project, the src/utils/ folder gives you a clean space to organize those helpers, starting with configuration loading, and is ready to grow as your system becomes more complex.

This layer keeps your codebase maintainable, testable, and extensible.

Loading YAML Configs

Your primary helper is load_yaml_config() found in:

src/utils/helpers.py

Here’s the implementation:

def load_yaml_config(path: str) -> Dict[str, Any]:
    config_path = Path(path)
   
    if not config_path.exists():
        return {}
   
    try:
        with open(config_path, 'r', encoding='utf-8') as file:
            config = yaml.safe_load(file)
            return config if config is not None else {}
    except yaml.YAMLError as e:
        print(f"Error loading YAML config from {path}: {e}")
        return {}
    except Exception as e:
        print(f"Unexpected error loading config from {path}: {e}")
        return {}

This function may look simple, but it embodies 3 production-level lessons:

Separation of concerns

Your application logic (FastAPI, inference services) should not know how a YAML file is parsed. They should only receive clean configuration objects.

Fault tolerance

In real deployments:

configs may be missing
YAML indentation may break
a misconfigured CI pipeline may pass an empty file

Returning {} instead of crashing gives you graceful degradation.

Extensibility

Tomorrow you may add:

JSON config support
remote config loading (S3, Google Cloud Storage (GCS), Azure Blob)
encrypted secrets
multiple config layers

This helper becomes the foundation.

Inside core/config.py, you saw how load_yaml_config() merges YAML values into your Pydantic settings. This is a real-world pattern used in production MLOps stacks like Airflow, FastAPI microservices, Ray Serve, and MLflow.

Adding New Helper Functions

The utilities layer is designed to grow organically as your system grows.

Common helpers you may introduce later include:

String helpers

text normalization
input cleaning
token counting

File helpers

safe file writes
temporary directory management
checksum calculation for model files

Model helpers

downloading artifacts from cloud storage
caching models on disk
validating model signatures

API helpers

request validation
standardized error responses
retry/backoff wrappers around external calls

Monitoring helpers

timing decorators
metrics emitters (Prometheus, StatsD, OpenTelemetry)
latency buckets

All of these belong in one place:

src/utils/

This prevents your service layer or route handlers from becoming cluttered and ensures that common functionality is implemented once and reused everywhere.

Running a FastAPI MLOps Application Locally

At this point, you have a fully structured ML application: configuration, logging, models, service layer, and a clean FastAPI interface. Now it’s time to actually run the system locally.

This section walks you through running the API with Poetry, UV, or PDM, depending on your setup. We’ll conclude with a quick validation test to ensure everything works end-to-end.

Running via Poetry

If you’re using Poetry (recommended for most workflows), your steps are:

# Install dependencies
poetry install

# Activate the environment
poetry shell

# Start the API server
poetry run python src/main.py

You should see log lines like:

INFO - Starting server on 0.0.0.0:8000
INFO - Loaded model: dummy_classifier

**Figure 1:** Running ML API using Poetry

Running via UV

If you prefer UV (super-fast installer by Astral), run:

# Create and activate a virtual environment
uv venv
source .venv/bin/activate

# Install project in editable mode
uv pip install -e .

# Start the API
python src/main.py

This path is great for users who want lightweight dependency management without Poetry’s abstraction.

Running Python MLOps Projects with PDM

If your workflow uses PDM, run:

# Install dependencies
pdm install

# Start the server
pdm run python src/main.py

PDM offers a cleaner pyproject-first workflow and works well for CI/CD pipelines that prefer explicit environment setup.

**Figure 2:** Terminal showing a successful server started via PDM dependency resolution.

Testing FastAPI Endpoints: Health Check and Prediction API

Once the server is running, validate the system with 2 quick API calls.

Health Check

Open:

http://localhost:8000/health

Expected response:

{"status": "ok"}

This confirms:

the API is reachable
config and logger initialized
FastAPI routes are registered

Prediction Test

Send a prediction request:

curl -X POST "http://localhost:8000/predict?input=This+is+good"

Expected response:

{"prediction": "positive"}

Under the hood:

the service layer logs the request
the dummy model classifies sentiment
the API returns structured JSON

**Figure 3:** Auto-generated documentation for the ML API.

**Figure 4:** Real terminal output from running the `/predict` endpoint, validating the end-to-end workflow of the ML API.

What's next? We recommend PyImageSearch University.

Course information:
86+ total classes • 115+ hours hours of on-demand code walkthrough videos • Last updated: April 2026
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

✓ 86+ courses on essential computer vision, deep learning, and OpenCV topics
✓ 86 Certificates of Completion
✓ 115+ hours hours of on-demand video
✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
✓ Pre-configured Jupyter Notebooks in Google Colab
✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
✓ Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

In this lesson, you learned how to build a clean, scalable foundation for ML systems using real software-engineering practices. You now understand why ML projects must be structured like production services — not experiments — if they are ever going to ship reliably.

We began by exploring the why: ML code becomes maintainable only when you enforce clear boundaries between configuration, logic, services, and I/O. That idea naturally led to the src/ layout, which gave our project a predictable and extensible shape.

You then learned how to manage dependencies using Poetry, UV, or PDM — ensuring that every ML environment is reproducible, isolated, and easy to rebuild. This solved the classic “it works on my machine” trap that haunts ML teams.

Next, we built a robust configuration system using Pydantic BaseSettings, merging defaults, YAML files, and .env variables into a single typed interface. You now have a configuration pattern used by real-world production ML systems.

We also implemented structured logging, enabling the application to communicate what it’s doing internally — a prerequisite for debugging, observability, and monitoring.

From there, you built your first production-style ML API with FastAPI, complete with /health, /predict, and auto-generated documentation. You learned how to expose ML logic cleanly, and why APIs are the interface between ML systems and the real world.

We introduced the Service Layer, showing how routes should delegate to independent business logic so APIs stay thin and models stay swappable. This design decision is what makes the system testable and future-proof.

You then explored model abstraction, using a simple dummy model to illustrate how real models (PyTorch, TensorFlow, ONNX, vLLM, Transformers) can be slotted in without changing the API layer.

Finally, you saw how helper utilities make the system cleaner, and how to run the full application with Poetry, UV, or PDM. The result is a working ML service that looks, behaves, and organizes itself like production-grade software.

By completing this lesson, you’ve built the foundation required for every advanced MLOps practice: testing, performance monitoring, CI/CD, orchestration, and deployment.

You’re now ready for Lesson 2, where we transform this service into a fully tested, validated, and performance-monitored ML system.

Citation Information

Singh, V. “FastAPI for MLOps: Python Project Structure and API Best Practices,” PyImageSearch, S. Huot, A. Sharma, and P. Thakur, eds., 2026, https://pyimg.co/yn8a5

@incollection{Singh_2026_fastapi-for-mlops-python-project-structure,
  author = {Vikram Singh},
  title = {{FastAPI for MLOps: Python Project Structure and API Best Practices}},
  booktitle = {PyImageSearch},
  editor = {Susan Huot and Aditya Sharma and Piyush Thakur},
  year = {2026},
  url = {https://pyimg.co/yn8a5},
}

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!

Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!