Image Classification with Gemini Pro


Introduction to Gemini Pro for Image Classification
Transitioning from Image Processing to Image Classification with Gemini Pro
Comparative Analysis: Gemini Pro vs. ChatGPT-3.5 in Image Classification
Exploring the Variants: Gemini Pro and Gemini Pro Vision
Setting Up Gemini Pro for Generating Image Classification Code in PyTorch
Setting Up Gemini Pro for Image Classification
Generating PyTorch Code for Image Classification with Gemini Pro

Preparing Your Development Environment for Gemini Pro

Step 1: Installing the Google Generative AI Library
Step 2: Importing Essential Python Packages
Step 3: Securely Configuring Your API Key

Creating and Configuring the Gemini Pro Model
Enhancing Code Presentation with Markdown
Generating PyTorch Code for Image Classification

Exploring the Differences in Code Generation for Image Classification Between ChatGPT-3.5 and Gemini Pro

Setting Up Your Environment and Data for Image Classification
Training Models and Understanding the Architecture
Evaluating Model Performance and Analyzing Results
Detailed Comparison: ChatGPT-3.5 vs. Gemini Pro for Image Classification

Summary and Key Takeaways

Citation Information

In this tutorial, you’ll learn how to use the Gemini Pro generative model with the Google AI Python SDK (software development kit) to generate code for image classification in PyTorch. We’ll delve into the effectiveness of this generated code, particularly its capability to train on popular datasets like MNIST or CIFAR-10 and achieve decent classification accuracy. Additionally, the tutorial will feature a side-by-side comparison with ChatGPT-3.5, providing valuable insights into each model’s unique code generation abilities and performance nuances.

This lesson is the 3rd in a 6-part series on Gemini Pro:

Introduction to Gemini Pro Vision
Image Processing with Gemini Pro
Image Classification with Gemini Pro (this tutorial)
Lesson 4
Lesson 5
Lesson 6

To learn how to create image classification code in PyTorch using Gemini Pro and compare its performance with ChatGPT-3.5, just keep reading.

Introduction to Gemini Pro for Image Classification

In our previous tutorial, we explored Gemini Pro, a model accessed through the Google AI Python SDK, with a focus on image processing. We introduced the model, analyzed the Python code it generated, and compared it with ChatGPT-3.5 and Bard. While Gemini Pro proved capable at code generation, its output had Google Colab compatibility problems, contained errors, and overwrote variables. ChatGPT-3.5, in contrast, showed an edge in producing error-free, Colab-compatible code.

Figure 1 shows the Google AI Studio interface using the Gemini Pro model to generate image classification code in the PyTorch framework.

Figure 1: Snapshot of Google AI Studio generating code for image classification in PyTorch using Gemini Pro (source: image by the Author).

Transitioning from Image Processing to Image Classification with Gemini Pro

Expanding from our previous exploration of image processing, we now turn our attention to image classification within the PyTorch framework using Gemini Pro. This tutorial will rigorously examine how Gemini Pro handles classifying images from renowned datasets like MNIST or CIFAR-10, available through Torchvision. We’ll delve into the model’s ability to manage training and testing, along with its effectiveness in generating vital performance metrics (e.g., True Positives, False Positives, and confusion matrices).

Comparative Analysis: Gemini Pro vs. ChatGPT-3.5 in Image Classification

In the second part of our exploration, we’ll conduct a comparative analysis between the neural networks generated by Gemini Pro and those by ChatGPT-3.5. This comparison will not only assess their innovative approaches in code generation but also evaluate which model achieves higher accuracy in image classification. Such an analysis will offer valuable insights into the capabilities and adaptability of each model in this specialized field of AI-driven image analysis.

Exploring the Variants: Gemini Pro and Gemini Pro Vision

As we know from earlier tutorials on Gemini at PyImageSearch, Google DeepMind released two Gemini variants that users can choose between: Gemini Pro and Gemini Pro Vision. For a deeper dive into Gemini Pro Vision, check out our PyImageSearch tutorial titled Introduction to Gemini Pro Vision. If you’d like to understand more about Gemini Pro and its performance in image processing, see our previous tutorial, which covers its capabilities and compares it with other models.

Setting Up Gemini Pro for Generating Image Classification Code in PyTorch

Now, let’s set up Gemini Pro and explore its capabilities for image classification. We’ll walk through the code generation process for this task and then conduct a detailed comparison with ChatGPT-3.5. This will provide a clearer understanding of how these models stack up against each other in practical AI applications.

Setting Up Gemini Pro for Image Classification

As in our previous tutorial, we’ll continue using the Google AI Python SDK, which grants access to various models, including Gemini Pro.

To obtain your API key, visit Google MakerSuite and sign in with your Google account. Once logged in, you’ll enter Google AI Studio, where you can generate your API key, following the steps provided there. This key is essential for programmatically accessing the Gemini Pro model and other resources offered by the SDK.

The option to generate your API key is illustrated in Figure 2.

Figure 2: Snapshot of Google AI Studio demonstrating API key generation (source: image by the Author).

Once you’ve generated your API key, it’s important to copy and securely save it. This key will play a crucial role in your work with the Gemini Pro model, especially as you generate image classification code using the model. Keeping it in a safe place ensures you have continuous access to Gemini Pro’s functionalities.

Generating PyTorch Code for Image Classification with Gemini Pro

In this section, we step into the fascinating world of AI-driven code creation. Here, we utilize the Google AI Python SDK to prompt Gemini Pro into crafting PyTorch code for image classification, setting the stage for a compelling comparison with ChatGPT-3.5’s code generation.

This part of our exploration will not only showcase Gemini Pro’s abilities but also offer a side-by-side analysis with ChatGPT-3.5, highlighting the strengths and innovative approaches of each model in handling a complex task like image classification through PyTorch.

Preparing Your Development Environment for Gemini Pro

Step 1: Installing the Google Generative AI Library

We start by installing the google-generativeai library with pip. It lets us interact with Google’s generative models, and the Gemini Pro model in particular, from Python, as shown below:

!pip install -q -U google-generativeai

Line 1: Installs the google-generativeai library

Step 2: Importing Essential Python Packages

import textwrap
import google.generativeai as genai
from IPython.display import Markdown

Lines 1-3: Imports three key Python packages. textwrap is employed for its text manipulation capabilities, essential for formatting. google.generativeai, abbreviated as genai, forms the core module, offering a range of AI functionalities. Lastly, IPython.display’s Markdown is included, primarily for enhancing the display of outputs within the Colab notebook. Together, these packages form the foundation for the code’s AI and display functionalities.

Step 3: Securely Configuring Your API Key

# Used to securely store your API key
from google.colab import userdata
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get("GEMINI_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

On Lines 5-9, the google.colab library’s userdata module is used to securely fetch the “GEMINI_API_KEY”, which is then stored in GOOGLE_API_KEY. An alternative method to retrieve the API key could be through os.getenv('GOOGLE_API_KEY').

The script then uses genai.configure(api_key=GOOGLE_API_KEY) to set up the GenAI library with this API key, ensuring authenticated access to its functionalities. This approach is particularly beneficial in Google Colab notebooks for securely managing API keys.
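Outside Colab, google.colab is not importable, so a small fallback keeps the same notebook code usable elsewhere. The sketch below is one possible way to combine the two retrieval options mentioned above. The helper name load_api_key and its parameters are hypothetical, and it assumes the secret is stored under GEMINI_API_KEY in Colab and under GOOGLE_API_KEY as an environment variable.

```python
import os


def load_api_key(colab_secret="GEMINI_API_KEY", env_var="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment.

    Hypothetical helper: the secret and variable names are assumptions.
    """
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(colab_secret)

    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the configuration call would become genai.configure(api_key=load_api_key()).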

Creating and Configuring the Gemini Pro Model

model = genai.GenerativeModel("gemini-pro")

On Line 11, an instance of the GenerativeModel class is created using the genai library, specifically initializing it with the “gemini-pro” model. This action assigns the Gemini Pro model to the model variable, enabling its application in various AI-driven activities (e.g., text generation and data analysis). This step is crucial for leveraging Gemini Pro’s functionalities within the script.

Here, we’re opting to use the default settings of the GenerativeModel, as we’re not specifying any optional parameters (e.g., generation_config and safety_settings). This approach simplifies the setup and allows us to utilize the model’s built-in configurations.

Enhancing Code Presentation with Markdown

def to_markdown(text):
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))

Lines 13-15 introduce a to_markdown helper function, transforming a string into Markdown format, ideal for Jupyter notebooks. It starts by converting bullet points into Markdown’s asterisk syntax, followed by indenting each line with a blockquote symbol using textwrap.indent, applying this uniformly across all lines.

The final output is a Markdown object, well-suited for display in Markdown-compatible environments. This enhances the text’s presentation, making it more suitable for environments like Jupyter notebooks that support Markdown rendering.
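To see what the helper does to the text before it is wrapped in a Markdown object, the same two string operations can be run on their own. This is a minimal sketch using only the standard library; quote_block is a hypothetical name, and the sample string is made up for illustration.

```python
import textwrap


def quote_block(text):
    """Apply the same string steps as to_markdown, but return plain text."""
    # Turn bullet characters into Markdown list markers.
    text = text.replace("•", "  *")
    # predicate=lambda _: True prefixes every line, including blank ones,
    # so the whole response stays inside a single blockquote.
    return textwrap.indent(text, "> ", predicate=lambda _: True)


sample = "Steps:\n• load data\n\n• train"
print(quote_block(sample))
```

Without the predicate argument, textwrap.indent skips whitespace-only lines, which would split the output into several separate blockquotes when rendered.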

Generating PyTorch Code for Image Classification

response = model.generate_content(
    "Write a image multiclass classification code in pytorch framework using a public dataset"
    " I would be training and testing the image classification code in Google colab",
    stream=True
)
response.resolve()
to_markdown(response.text)

On Lines 17-23, the script employs model.generate_content to create code based on a specific prompt about writing multiclass classification code in the PyTorch framework using a public dataset intended for use in Google Colab. The stream=True setting indicates that the model’s response is streamed, and to_markdown(response.text) is used afterward to convert the text response into Markdown format. This enhances the text’s clarity and layout, making it well-suited for display in Markdown-friendly environments, like Jupyter notebooks.
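The response comes back as Markdown text, with the generated program typically inside a fenced code block. If you want to save or run that program rather than only display it, the fenced portion can be pulled out with a regular expression. The sketch below is an illustration and not part of the original notebook: extract_code_blocks is a hypothetical helper, the sample reply is made up, and it assumes the model wraps code in triple-backtick fences, which is common but not guaranteed.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly

# Optional language tag after the opening fence, then everything up to the next fence.
_BLOCK_RE = re.compile(FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(markdown_text):
    """Return a list of (language, code) tuples found in fenced blocks."""
    return [(lang, code.strip()) for lang, code in _BLOCK_RE.findall(markdown_text)]


reply = (
    "Here is the code:\n"
    + FENCE + "python\nimport torch\nprint(torch.__version__)\n" + FENCE
    + "\nDone."
)
for lang, code in extract_code_blocks(reply):
    print(lang)
    print(code)
```

In the notebook, the same helper could be applied to response.text once response.resolve() has completed.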

# Import Necessary Libraries
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets

# Load the Public Dataset
train_data = datasets.ImageFolder(root='path/to/train', transform=transforms.ToTensor())
test_data = datasets.ImageFolder(root='path/to/test', transform=transforms.ToTensor())

# Initialize the DataLoaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True)

# Define the Convolutional Neural Network (CNN) Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Convolutional Layers
        self.conv1 = nn.Conv2d(3, 6, 5)  # Input channels: 3 (RGB), Output channels: 6, Kernel size: 5x5
        self.pool = nn.MaxPool2d(2, 2)    # Kernel size: 2x2, Stride: 2
        self.conv2 = nn.Conv2d(6, 16, 5)  # Input channels: 6, Output channels: 16, Kernel size: 5x5
        self.pool = nn.MaxPool2d(2, 2)    # Kernel size: 2x2, Stride: 2

        # Fully Connected Layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Pass the input through the convolutional layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the feature maps
        x = x.view(x.size(0), -1)

        # Pass the flattened feature maps through the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize the CNN Model and Specify the Loss Function and Optimizer
model = CNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Train the Model
for epoch in range(20):  # Adjust the number of epochs as needed
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Calculate the loss
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()

        # Update the parameters
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # Print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

# Test the Model
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# Save the Trained Model
torch.save(model.state_dict(), 'image_classifier.pt')

Upon reviewing the PyTorch code generated by Gemini Pro for image classification, it generally aligns well with best practices and shows a structured approach. However, several critical areas for improvement were identified, particularly those that could lead to code errors:

Use of Public Dataset: The code does not currently utilize a specified public dataset like MNIST or CIFAR-10, as required by the prompt. Incorporating one of these datasets using Torchvision’s dataset utilities would align the script with the prompt’s requirements and is a significant oversight.
Missing import torch.nn.functional as F: The F.relu function is used in the forward method, but import torch.nn.functional as F is not included at the beginning of the script.
Dataset Path Specifications: Incorrect dataset paths will impede the model’s ability to train and test properly. Be sure to replace 'path/to/train' and 'path/to/test' with the actual paths to your train and test datasets.

Other points for improvement, while important, are less likely to cause immediate functional errors:

Duplicate Pooling Layer: The self.pool layer is defined twice in the CNN class. While this doesn’t cause a functional error (since it’s the same operation repeated), it’s redundant and can be defined just once.
Output Layer Dimension: The output layer self.fc3 in the CNN class has 10 neurons, which implies that the model is designed for a dataset with 10 classes. Ensure this aligns with the number of classes in your dataset (e.g., MNIST or CIFAR-10).
Flattening Operation: The flattening operation in the forward method (x = x.view(x.size(0), -1)) assumes a specific size of the feature maps after the convolutional layers. Be sure that the size calculation (16 * 5 * 5) correctly matches the output size of the last convolutional layer.
Print Statement in Training Loop: The conditional if i % 2000 == 1999 in the training loop might not be reached, depending on the size of your dataset and batch size. Adjust this condition to suit the number of batches in your training data.
Testing Accuracy Print Statement: The message ‘Accuracy of the network on the 10000 test images: %d %%’ assumes there are 10,000 test images. This should be modified to reflect the actual size of your test dataset.

Addressing these areas is essential to enhance the model’s accuracy and functionality.
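Two of the points above, the flattening size and the logging condition, can be checked with simple arithmetic before any training run. The following sketch assumes 32x32 CIFAR-10 inputs, 50,000 training images, and the layer settings from the generated code; the helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (square inputs)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size, out_channels=16):
    """Trace an image through conv(5) -> pool(2) -> conv(5) -> pool(2)."""
    size = conv_out(image_size, kernel=5)        # conv1
    size = conv_out(size, kernel=2, stride=2)    # pool
    size = conv_out(size, kernel=5)              # conv2
    size = conv_out(size, kernel=2, stride=2)    # pool
    return out_channels * size * size


print(flattened_features(32))  # CIFAR-10 (32x32): 400, matching 16 * 5 * 5
print(flattened_features(28))  # MNIST (28x28): 256, so fc1 would need resizing

# Batches per epoch with 50,000 training images and batch_size=32.
batches_per_epoch = math.ceil(50_000 / 32)
print(batches_per_epoch)  # 1563, so `i % 2000 == 1999` is never true
```

Note that switching to MNIST would also require changing the first convolution to one input channel, since MNIST images are grayscale.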

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:05<00:00, 29302880.52it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Epoch 1, Loss: 2.213340243262708
Epoch 2, Loss: 1.614285399999155
Epoch 3, Loss: 1.4277218132067824
Epoch 4, Loss: 1.3255487715496737
Epoch 5, Loss: 1.2489837502579555
Epoch 6, Loss: 1.1976356018534706
Epoch 7, Loss: 1.1457488671745486
Epoch 8, Loss: 1.097599425660375
Epoch 9, Loss: 1.0516416126352441
Epoch 10, Loss: 1.0136031122768627
Accuracy of the network on the 10000 test images: 57 %

After running the corrected PyTorch code on the CIFAR-10 dataset and adding the necessary import statement, the output indicates a successful model training and testing process.

The logs indicate that the CIFAR-10 dataset was successfully downloaded, extracted, and verified. This process involves retrieving the dataset from an online source, saving it locally, and then extracting the contents for use in the model training and testing. The confirmation of the files being “already downloaded and verified” suggests that the dataset was set up correctly for the subsequent steps in the code.

The model’s training loss decreased steadily over 10 epochs, indicating that it was learning. However, the final accuracy on the test set was 57%. That is well above the 10% expected from random guessing on CIFAR-10’s 10 classes, but it is modest for this dataset.

The modest 57% accuracy could primarily be attributed to the model’s architecture, a small network with only 6 and 16 convolutional filters. While factors like hyperparameter settings also play a role, the architecture is a key determinant of how well the model can learn and generalize from the training data. Adjusting the model’s structure may be necessary for significant improvements in accuracy.

Exploring the Differences in Code Generation for Image Classification Between ChatGPT-3.5 and Gemini Pro

In this section, we shift gears to examine how ChatGPT-3.5 handles the same image classification task. Here, we provide ChatGPT-3.5 with the identical prompt given to Gemini Pro and observe its approach to generating PyTorch code for image classification. The focus is not only on the code generation process but also on how well the model performs on test data. This comparison aims to highlight the differences in efficiency, effectiveness, and accuracy between the two AI models in tackling a similar challenge.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Check if a GPU is available, if not, use CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Define transformations for data augmentation and normalization
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Training loop
for epoch in range(10):  # You can adjust the number of epochs as needed
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}")

print("Training finished")

net.eval()
all_labels = []
all_predictions = []

with torch.no_grad():
    for data in testloader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)

        all_labels.extend(labels.cpu().numpy())
        all_predictions.extend(predicted.cpu().numpy())

accuracy = accuracy_score(all_labels, all_predictions)
confusion = confusion_matrix(all_labels, all_predictions)
classification_rep = classification_report(all_labels, all_predictions, target_names=classes)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(confusion)
print("Classification Report:")
print(classification_rep)

The code generated by ChatGPT-3.5 for image classification in PyTorch appears comprehensive and well-structured. Let’s do a detailed review.

Setting Up Your Environment and Data for Image Classification

Library Import and Device Setup: Imports the necessary libraries and selects a GPU when one is available, falling back to the CPU otherwise.
Data Augmentation and Normalization: Implements effective techniques for training and testing datasets.
Dataset Loading and DataLoader Initialization: Accurately loads and prepares the CIFAR-10 dataset for both training and testing.

Training Models and Understanding the Architecture

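Neural Network Architecture: Features a wider CNN than the one generated by Gemini Pro (64 and 128 convolutional filters versus 6 and 16), potentially enhancing learning.
Training Loop: Well-structured, with a forward pass, loss calculation, backpropagation, and optimizer steps; the network itself uses torch.relu for activation.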
GPU Utilization: Efficiently uses GPU for training and testing, boosting performance.
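One rough way to quantify the difference in model size is to count trainable parameters directly from the layer definitions in the two listings. The sketch below does this with plain arithmetic; the helper names are hypothetical, and parameter count is only a proxy for capacity, not a predictor of accuracy.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel for a square-kernel Conv2d."""
    return (kernel * kernel * in_ch + 1) * out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output unit for a Linear layer."""
    return (in_features + 1) * out_features


# Layers as defined in the Gemini Pro listing.
gemini_total = (
    conv_params(3, 6, 5) + conv_params(6, 16, 5)
    + linear_params(16 * 5 * 5, 120) + linear_params(120, 84) + linear_params(84, 10)
)

# Layers as defined in the ChatGPT-3.5 listing.
chatgpt_total = (
    conv_params(3, 64, 3) + conv_params(64, 128, 3)
    + linear_params(128 * 8 * 8, 512) + linear_params(512, 10)
)

print(gemini_total)   # 62006
print(chatgpt_total)  # 4275594
```

By this count, the ChatGPT-3.5 network has roughly 69 times as many parameters, most of them in its first fully connected layer.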

Evaluating Model Performance and Analyzing Results

Evaluation Metrics: Collects predictions over the entire test set and evaluates them with accuracy, a confusion matrix, and a classification report from scikit-learn.
Final Performance Metrics: Prints overall accuracy alongside per-class precision, recall, and F1-score, giving a detailed view of model performance.

Detailed Comparison: ChatGPT-3.5 vs. Gemini Pro for Image Classification

Public Dataset Usage: Unlike Gemini Pro, ChatGPT-3.5’s code correctly incorporates the CIFAR-10 dataset.
Data Augmentation: ChatGPT-3.5 includes data augmentation, which is absent in Gemini Pro’s code.
Complex Network Architecture: ChatGPT-3.5’s network has far more filters and a larger fully connected layer, suggesting improved learning capabilities.
Detailed Performance Metrics: ChatGPT-3.5’s code provides a more comprehensive performance evaluation than Gemini Pro’s, which reports accuracy only.

Overall, ChatGPT-3.5’s approach to image classification showcases a well-rounded and potentially more effective solution than Gemini Pro, particularly in terms of dataset handling, model complexity, and depth of performance analysis.

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:03<00:00, 43389823.48it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Epoch 1, Loss: 1.656345052792288
Epoch 2, Loss: 1.2884532127081585
Epoch 3, Loss: 1.0894466095873157
Epoch 4, Loss: 0.9669426654458351
Epoch 5, Loss: 0.8824315952218097
Epoch 6, Loss: 0.8171949713583797
Epoch 7, Loss: 0.7658578648667811
Epoch 8, Loss: 0.7242285415644536
Epoch 9, Loss: 0.6837977789856894
Epoch 10, Loss: 0.6462546251618954
Training finished
Accuracy: 77.29%
Confusion Matrix:
[[796   9  51  27   9  19  13   5  41  30]
 [ 14 875   6  12   2   5   9   1  19  57]
 [ 46   2 653  52  60  77  80  21   3   6]
 [ 16   5  45 564  34 227  92   8   2   7]
 [ 13   1  50  54 726  43  69  41   3   0]
 [  8   2  20 131  31 754  29  22   1   2]
 [  4   1  28  43  19  30 873   0   2   0]
 [  6   0  33  32  40  90   9 786   0   4]
 [ 50  16  11  15   4  17   7   5 857  18]
 [ 31  51   5  11   4  12   9  11  21 845]]
Classification Report:
              precision    recall  f1-score   support

       plane       0.81      0.80      0.80      1000
         car       0.91      0.88      0.89      1000
        bird       0.72      0.65      0.69      1000
         cat       0.60      0.56      0.58      1000
        deer       0.78      0.73      0.75      1000
         dog       0.59      0.75      0.66      1000
        frog       0.73      0.87      0.80      1000
       horse       0.87      0.79      0.83      1000
        ship       0.90      0.86      0.88      1000
       truck       0.87      0.84      0.86      1000

    accuracy                           0.77     10000
   macro avg       0.78      0.77      0.77     10000
weighted avg       0.78      0.77      0.77     10000

The initial results from running the ChatGPT-3.5 generated code show a well-managed process, with the CIFAR-10 dataset being downloaded, extracted, and validated accurately. The training demonstrated a consistent reduction in loss across 10 epochs, indicating effective learning.

In terms of evaluation, the model attained a notable 77.29% accuracy on the test set, which is considerably higher than Gemini Pro’s achievement of 57% accuracy. This significant difference underscores the effectiveness of ChatGPT-3.5’s approach. Additionally, the code included comprehensive evaluation metrics (e.g., confusion matrix and classification report), offering an in-depth understanding of the model’s performance across various classes.
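The numbers in the classification report can be recomputed from the confusion matrix itself, which is a useful sanity check when reading such output. The sketch below copies the matrix printed above (rows are true classes and columns are predictions, following scikit-learn’s convention) and derives the accuracy, the precision and recall for one class, and the most frequent confusion. The helper name per_class_metrics is hypothetical.

```python
classes = ["plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

# Confusion matrix copied from the log: rows = true class, columns = predicted class.
confusion = [
    [796, 9, 51, 27, 9, 19, 13, 5, 41, 30],
    [14, 875, 6, 12, 2, 5, 9, 1, 19, 57],
    [46, 2, 653, 52, 60, 77, 80, 21, 3, 6],
    [16, 5, 45, 564, 34, 227, 92, 8, 2, 7],
    [13, 1, 50, 54, 726, 43, 69, 41, 3, 0],
    [8, 2, 20, 131, 31, 754, 29, 22, 1, 2],
    [4, 1, 28, 43, 19, 30, 873, 0, 2, 0],
    [6, 0, 33, 32, 40, 90, 9, 786, 0, 4],
    [50, 16, 11, 15, 4, 17, 7, 5, 857, 18],
    [31, 51, 5, 11, 4, 12, 9, 11, 21, 845],
]


def per_class_metrics(matrix, index):
    """Return (precision, recall) for the class at the given index."""
    true_positives = matrix[index][index]
    predicted_as_class = sum(row[index] for row in matrix)  # column sum
    actually_class = sum(matrix[index])                     # row sum
    return true_positives / predicted_as_class, true_positives / actually_class


total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
print(f"Accuracy: {100 * correct / total:.2f}%")  # Accuracy: 77.29%

cat_precision, cat_recall = per_class_metrics(confusion, classes.index("cat"))
print(f"cat precision={cat_precision:.2f} recall={cat_recall:.2f}")  # 0.60 and 0.56

# Largest off-diagonal entry: the most common single mistake.
count, true_name, pred_name = max(
    (confusion[i][j], classes[i], classes[j])
    for i in range(len(classes))
    for j in range(len(classes))
    if i != j
)
print(f"{count} {true_name} images were predicted as {pred_name}")
```

The cat and dog classes are confused with each other most often, which is consistent with their low F1-scores in the report.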

It’s noteworthy that the code from ChatGPT-3.5 was executed without any human modifications, demonstrating its robustness and reliability. This contrasts with the Gemini Pro code, which required specific fixes, such as correcting an import error, adding the CIFAR-10 dataset, and modifying epoch reporting for it to function correctly. This comparison underscores ChatGPT-3.5’s proficiency in generating ready-to-use, reliable code for complex tasks like image classification.

Summary and Key Takeaways

This comprehensive post explores image classification using Gemini Pro and its comparison with ChatGPT-3.5. Initially, it covers the setup of Gemini Pro and then delves into the use of Gemini Pro for image classification, revealing its limitations and the need for code adjustments. Gemini Pro’s generated code, while fundamentally sound, required modifications for integrating the CIFAR-10 dataset, fixing import errors, and correcting print statements in training loops. These adjustments were essential for the model to achieve a moderate 57% accuracy rate.

Contrastingly, ChatGPT-3.5’s code for the same task demonstrated its robustness by requiring no alterations and achieving a higher accuracy rate of 77.29%. This notable difference in performance and the readiness of the code highlight ChatGPT-3.5’s advanced capabilities in creating efficient, accurate code for complex AI tasks, marking an area for improvement in Gemini Pro’s code generation process.

Citation Information

Sharma, A. “Image Classification with Gemini Pro,” PyImageSearch, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, and R. Raha, eds., 2024, https://pyimg.co/melcg

@incollection{Sharma_2024_Image-Classification-with-Gemini-Pro,
  author = {Aditya Sharma},
  title = {Image Classification with Gemini Pro},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha},
  year = {2024},
  url = {https://pyimg.co/melcg},
}
