transformers Archives - PyImageSearch

Build a VLC Playlist Generator with SmolVLM for Video Highlight Tagging

October 27, 2025

Table of Contents Build a VLC Playlist Generator with SmolVLM for Video Highlight Tagging Configuring Your Development Environment Setup and Imports Helper Functions Main VideoHighlightDetector Class Creating the XSPF Playlist Gradio Interface Logic Launch the Gradio Application Output Summary Citation…

The Rise of Multimodal LLMs and Efficient Serving with vLLM

September 15, 2025

Table of Contents The Rise of Multimodal LLMs and Efficient Serving with vLLM Introduction to Multimodal LLMs What Are Multimodal LLMs? Milestones in Multimodal LLM Evolution Flamingo (DeepMind, 2022) GPT-4V (OpenAI, 2023) LLaVA (Large Language and Vision Assistant, 2023) BakLLaVA…

Preparing the BLIP Backend for Deployment with Redis Caching and FastAPI

September 1, 2025

Table of Contents Preparing the BLIP Backend for Deployment with Redis Caching and FastAPI Introduction What We’re Building in This Lesson Why Redis Caching Matters for Inference What Is Caching? What Is Redis? Configuring Your Development Environment Running a Local…

Meet BLIP: The Vision-Language Model Powering Image Captioning

August 25, 2025

Table of Contents Meet BLIP: The Vision-Language Model Powering Image Captioning What Is Image Captioning and Why Is It Challenging? Why It’s Challenging Why Traditional Vision Tasks Aren’t Enough Configuring Your Development Environment A Brief History of Image Captioning Models…

Computer Vision

Hugging Face Datasets

Synthetic Data Generation

Tutorial

Vision-Language Models

Synthetic Data Generation Using the BLIP and PaliGemma Models

August 11, 2025

Table of Contents Synthetic Data Generation Using the BLIP and PaliGemma Models Why VLM-as-Judge and Synthetic VQA Configuring Your Development Environment Set Up and Imports Download Images Locally Inference with the Salesforce BLIP Model Convert JSON File to the Hugging…

Vision-Language Models

Generating Video Highlights Using the SmolVLM2 Model

June 30, 2025

Table of Contents Generating Video Highlights Using the SmolVLM2 Model Configuring Your Development Environment Setup and Imports Setup Logger Get Video Duration in Seconds Load Model and Processor Analyze Video Content Determine Highlights Process Video Segment Concatenating Video Scenes into…

Vision-Language Models

SmolVLM to SmolVLM2: Compact Models for Multi-Image VQA

June 23, 2025

Table of Contents SmolVLM to SmolVLM2: Compact Models for Multi-Image VQA SmolVLM 1: A Compact Yet Capable Vision-Language Model What Is SmolVLM? Why SmolVLM? The Three Variants of SmolVLM Architecture Overview Vision Encoder: SigLIP Variants Pixel Shuffle (Space-to-Depth) for Image…

Vision-Language Models

AI for Healthcare: Fine-Tuning Google’s PaliGemma 2 for Brain Tumor Detection

May 26, 2025

Table of Contents AI for Healthcare: Fine-Tuning Google’s PaliGemma 2 for Brain Tumor Detection Configuring Your Development Environment Setup and Imports Load the Brain Tumor Dataset Format Dataset to PaliGemma Format Display Train Image and Label COCO Format BBox to…

Vision-Language Models

Object Detection in Gaming: Fine-Tuning Google’s PaliGemma 2 for Valorant

April 28, 2025

Table of Contents Object Detection in Gaming: Fine-Tuning Google’s PaliGemma 2 for Valorant Configuring Your Development Environment Setup and Imports Load the Valorant Dataset Format Dataset to PaliGemma Format Display Train Image and Label COCO Format BBox to XYXY Format…
