Table of Contents
- Image Classification with Gemini Pro
- Introduction to Gemini Pro for Image Classification
- Transitioning from Image Processing to Image Classification with Gemini Pro
- Comparative Analysis: Gemini Pro vs. ChatGPT-3.5 in Image Classification
- Exploring the Variants: Gemini Pro and Gemini Pro Vision
- Setting Up Gemini Pro for Generating Image Classification Code in PyTorch
- Setting Up Gemini Pro for Image Classification
- Generating PyTorch Code for Image Classification with Gemini Pro
- Preparing Your Development Environment for Gemini Pro
- Step 1: Installing the Google Generative AI Library
- Step 2: Importing Essential Python Packages
- Step 3: Securely Configuring Your API Key
- Creating and Configuring the Gemini Pro Model
- Enhancing Code Presentation with Markdown
- Generating PyTorch Code for Image Classification
- Exploring the Differences in Code Generation for Image Classification Between ChatGPT-3.5 and Gemini Pro
- Summary and Key Takeaways
Image Classification with Gemini Pro
In this tutorial, you’ll learn how to use the Gemini Pro generative model with the Google AI Python SDK (software development kit) to generate code for image classification in PyTorch. We’ll delve into the effectiveness of this generated code, particularly its capability to train on popular datasets like MNIST or CIFAR-10 and achieve decent classification accuracy. Additionally, the tutorial will feature a side-by-side comparison with ChatGPT-3.5, providing valuable insights into each model’s unique code generation abilities and performance nuances.
This lesson is the 3rd in a 6-part series on Gemini Pro:
- Introduction to Gemini Pro Vision
- Image Processing with Gemini Pro
- Image Classification with Gemini Pro (this tutorial)
- Lesson 4
- Lesson 5
- Lesson 6
To learn how to create image classification code in PyTorch using Gemini Pro and compare its performance with ChatGPT-3.5, just keep reading.
Introduction to Gemini Pro for Image Classification
In our previous tutorial, we explored Gemini Pro, accessed through the Google AI Python SDK, with a focus on image processing. We introduced the model, analyzed the Python code it generated, and compared it with ChatGPT-3.5 and Bard. While Gemini Pro demonstrated proficiency in code generation, its code ran into Google Colab compatibility problems, runtime errors, and overwritten variables. ChatGPT-3.5, in contrast, showed an edge in producing error-free, Colab-compatible code.
Figure 1 shows the Google AI Studio interface using the Gemini Pro model to generate image classification code in the PyTorch framework.
Transitioning from Image Processing to Image Classification with Gemini Pro
Expanding from our previous exploration of image processing, we now turn our attention to image classification within the PyTorch framework using Gemini Pro. This tutorial will rigorously examine how Gemini Pro handles classifying images from renowned datasets like MNIST or CIFAR-10, available through Torchvision. We’ll delve into the model’s ability to manage training and testing, along with its effectiveness in generating vital performance metrics (e.g., True Positives, False Positives, and confusion matrices).
Comparative Analysis: Gemini Pro vs. ChatGPT-3.5 in Image Classification
In the second part of our exploration, we’ll conduct a comparative analysis between the neural networks generated by Gemini Pro and those by ChatGPT-3.5. This comparison will not only assess their innovative approaches in code generation but also evaluate which model achieves higher accuracy in image classification. Such an analysis will offer valuable insights into the capabilities and adaptability of each model in this specialized field of AI-driven image analysis.
Exploring the Variants: Gemini Pro and Gemini Pro Vision
As we know from earlier Gemini tutorials at PyImageSearch, Google DeepMind released two Gemini variants that users can choose between: Gemini Pro and Gemini Pro Vision. For a deeper dive into Gemini Pro Vision, check out our PyImageSearch tutorial titled Introduction to Gemini Pro Vision. To learn more about Gemini Pro and its performance in image processing, see our previous tutorial, which also compares it with other models.
Setting Up Gemini Pro for Generating Image Classification Code in PyTorch
Now, let’s set up Gemini Pro and explore its capabilities for image classification. We’ll walk through the code generation process for this task and then conduct a detailed comparison with ChatGPT-3.5. This will provide a clearer understanding of how these models stack up against each other in practical AI applications.
Setting Up Gemini Pro for Image Classification
As in our previous tutorial, we’ll continue using the Google AI Python SDK, which grants access to various models, including Gemini Pro.
To obtain your API key, visit Google MakerSuite and sign in with your Google account. Once logged in, you’ll land in Google AI Studio. The key is essential for programmatically accessing the Gemini Pro model and other resources offered by the SDK.
There, you’ll find an option to generate your API key, as illustrated in Figure 2.
Once you’ve generated your API key, copy it and save it securely. This key will play a crucial role in your work with the Gemini Pro model, especially as you generate image classification code using the model. Keeping it in a safe place ensures you have continuous access to Gemini Pro’s functionalities.
Generating PyTorch Code for Image Classification with Gemini Pro
In this section, we use the Google AI Python SDK to prompt Gemini Pro to write PyTorch code for image classification.
We’ll then compare the result side by side with ChatGPT-3.5’s code for the same prompt, highlighting the strengths and approaches of each model on a complex task like image classification in PyTorch.
Preparing Your Development Environment for Gemini Pro
Step 1: Installing the Google Generative AI Library
We start by installing the google-generativeai library using pip, which allows us to interact with Google’s generative models, especially Gemini Pro, in Python, as shown below:
!pip install -q -U google-generativeai
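This command installs the google-generativeai library. The -q flag keeps pip’s output quiet, and -U upgrades the package if an older version is already installed.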
Step 2: Importing Essential Python Packages
import textwrap

import google.generativeai as genai
from IPython.display import Markdown
This snippet imports three Python packages. textwrap handles text formatting. google.generativeai, abbreviated as genai, is the core module that provides access to the Gemini models. Lastly, Markdown from IPython.display renders outputs as formatted Markdown within the Colab notebook. Together, these packages form the foundation for the code’s AI and display functionalities.
Step 3: Securely Configuring Your API Key
# Used to securely store your API key
from google.colab import userdata

# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY = userdata.get("GEMINI_API_KEY")

genai.configure(api_key=GOOGLE_API_KEY)
Here, the userdata module from the google.colab library securely fetches the secret named “GEMINI_API_KEY”, which is then stored in GOOGLE_API_KEY. Alternatively, you could retrieve the API key with os.getenv('GOOGLE_API_KEY').
The script then calls genai.configure(api_key=GOOGLE_API_KEY) to set up the GenAI library with this API key, ensuring authenticated access to its functionalities. This approach is particularly useful in Google Colab notebooks, where it keeps the key out of the notebook’s source.
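Outside of Colab, the environment-variable route mentioned above can be wrapped in a small helper. The sketch below is a minimal illustration: the function name load_api_key is hypothetical, and the variable name GOOGLE_API_KEY is simply the one used in the comment above, so adjust it to whatever you exported.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The helper name and default variable name are assumptions made
    for this example; they are not part of the SDK.
    """
    api_key = os.getenv(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before configuring the SDK."
        )
    return api_key


# Usage outside Colab (requires google-generativeai to be installed):
#   import google.generativeai as genai
#   genai.configure(api_key=load_api_key())
```

Failing early on a missing key tends to produce a clearer error than an authentication failure on the first API call.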
Creating and Configuring the Gemini Pro Model
model = genai.GenerativeModel("gemini-pro")
Here, an instance of the GenerativeModel class is created from the genai library and initialized with the “gemini-pro” model name. This assigns the Gemini Pro model to the model variable so it can be used for tasks such as text and code generation.
We’re using the default settings of GenerativeModel, since we don’t specify any optional parameters (e.g., generation_config and safety_settings). This simplifies the setup and relies on the model’s built-in configuration.
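If you do want to move away from the defaults, generation settings can be passed as a plain dictionary. The sketch below only builds that dictionary; the helper name and the specific values are illustrative assumptions rather than settings used in this tutorial, and the commented-out call shows where the dictionary would be handed to the SDK.

```python
def build_generation_config(temperature: float = 0.2,
                            max_output_tokens: int = 2048) -> dict:
    """Return generation settings as a plain dict.

    The default values are illustrative assumptions: a lower temperature
    usually makes generated code more repeatable between runs.
    """
    return {
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }


generation_config = build_generation_config()

# Usage (requires google-generativeai to be installed and configured):
#   import google.generativeai as genai
#   model = genai.GenerativeModel("gemini-pro",
#                                 generation_config=generation_config)
```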
Enhancing Code Presentation with Markdown
def to_markdown(text):
    text = text.replace("•", " *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))
The to_markdown helper function converts a string into a Markdown object for display in Jupyter notebooks. It first replaces bullet characters (•) with Markdown’s asterisk syntax, then uses textwrap.indent to prefix every line, including blank ones, with the blockquote marker (> ).
The returned Markdown object renders as formatted text in environments that support Markdown, such as Jupyter and Colab notebooks.
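The predicate argument is easy to overlook, so here is a small standard-library sketch of what it changes. By default, textwrap.indent skips blank lines, and in Markdown an unprefixed blank line ends a blockquote. The sample string is made up for illustration.

```python
import textwrap

sample = "Step one\n\nStep two"

# Default behavior: whitespace-only lines are left without the prefix.
default_quote = textwrap.indent(sample, "> ")

# With an always-true predicate, every line (blank ones too) gets the prefix,
# so the whole response stays inside a single blockquote.
full_quote = textwrap.indent(sample, "> ", predicate=lambda _: True)

print(repr(default_quote))
print(repr(full_quote))
```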
Generating PyTorch Code for Image Classification
response = model.generate_content(
    "Write a image multiclass classification code in pytorch framework using a public dataset"
    " I would be training and testing the image classification code in Google colab",
    stream=True
)

response.resolve()
to_markdown(response.text)
Here, the script calls model.generate_content with a prompt asking for multiclass image classification code in the PyTorch framework, using a public dataset and intended to run in Google Colab. The stream=True setting means the model’s response is streamed in chunks, and response.resolve() waits until the full response has arrived. Finally, to_markdown(response.text) converts the text response into Markdown for clean display in the notebook.
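Since the goal is to run the generated code, it can help to pull the fenced code out of the Markdown response before pasting it into a cell. The helper below is a hypothetical sketch, not part of the SDK, and it assumes the model wraps its code in standard triple-backtick fences.

```python
import re

# Built programmatically so this example nests cleanly inside Markdown.
FENCE = "`" * 3

# Matches an opening fence with an optional language tag, then lazily
# captures everything up to the next fence.
CODE_BLOCK_RE = re.compile(
    FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(markdown_text: str) -> list[str]:
    """Return the body of every fenced code block found in markdown_text."""
    return [match.group(2) for match in CODE_BLOCK_RE.finditer(markdown_text)]


# Usage with the response from the step above:
#   blocks = extract_code_blocks(response.text)
#   print(blocks[0])
```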
# Import Necessary Libraries
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets

# Load the Public Dataset
train_data = datasets.ImageFolder(root='path/to/train', transform=transforms.ToTensor())
test_data = datasets.ImageFolder(root='path/to/test', transform=transforms.ToTensor())

# Initialize the DataLoaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True)

# Define the Convolutional Neural Network (CNN) Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()

        # Convolutional Layers
        self.conv1 = nn.Conv2d(3, 6, 5)  # Input channels: 3 (RGB), Output channels: 6, Kernel size: 5x5
        self.pool = nn.MaxPool2d(2, 2)  # Kernel size: 2x2, Stride: 2
        self.conv2 = nn.Conv2d(6, 16, 5)  # Input channels: 6, Output channels: 16, Kernel size: 5x5
        self.pool = nn.MaxPool2d(2, 2)  # Kernel size: 2x2, Stride: 2

        # Fully Connected Layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Pass the input through the convolutional layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the feature maps
        x = x.view(x.size(0), -1)

        # Pass the flattened feature maps through the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

# Initialize the CNN Model and Specify the Loss Function and Optimizer
model = CNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Train the Model
for epoch in range(20):  # Adjust the number of epochs as needed
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Calculate the loss
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()

        # Update the parameters
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # Print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

# Test the Model
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# Save the Trained Model
torch.save(model.state_dict(), 'image_classifier.pt')
Upon reviewing the PyTorch code generated by Gemini Pro for image classification, it generally aligns well with best practices and shows a structured approach. However, several critical areas for improvement were identified, particularly those that could lead to code errors:
- Use of Public Dataset: The code does not currently utilize a specified public dataset like MNIST or CIFAR-10, as required by the prompt. Incorporating one of these datasets using Torchvision’s dataset utilities would align the script with the prompt’s requirements; omitting it is a significant oversight.
- Missing import torch.nn.functional as F: The F.relu function is used in the forward method, but import torch.nn.functional as F is not included at the beginning of the script.
- Dataset Path Specifications: Incorrect dataset paths will impede the model’s ability to train and test properly. Be sure to replace 'path/to/train' and 'path/to/test' with the actual paths to your train and test datasets.
Other points for improvement, while important, are less likely to cause immediate functional errors:
- Duplicate Pooling Layer: The self.pool layer is defined twice in the CNN class. This doesn’t cause a functional error (the second definition simply overwrites an identical first one), but it’s redundant and can be defined just once.
- Output Layer Dimension: The output layer self.fc3 in the CNN class has 10 neurons, which implies that the model is designed for a dataset with 10 classes. Ensure this aligns with the number of classes in your dataset (e.g., MNIST or CIFAR-10).
- Flattening Operation: The flattening operation in the forward method (x = x.view(x.size(0), -1)) adapts to any feature-map size, but the first fully connected layer expects exactly 16 * 5 * 5 inputs. That matches 32×32 inputs such as CIFAR-10; other image sizes produce a different flattened size and require changing self.fc1. Grayscale 28×28 MNIST images would also need self.conv1 to accept one input channel instead of three.
- Print Statement in Training Loop: The conditional if i % 2000 == 1999 in the training loop might not be reached, depending on the size of your dataset and batch size. With CIFAR-10’s 50,000 training images and a batch size of 32, there are only 1,563 batches per epoch, so the condition is never true. Adjust it to suit the number of batches in your training data.
- Testing Accuracy Print Statement: The message ‘Accuracy of the network on the 10000 test images: %d %%’ assumes there are 10,000 test images. This should be modified to reflect the actual size of your test dataset.
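Two of the points above, the flattened feature size and the logging interval, can be checked with simple arithmetic before running any training. The sketch below uses the standard output-size formula for convolution and pooling layers; the helper names are made up for this example, and the layer sizes are the ones in the generated CNN.

```python
import math


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size: int) -> int:
    """Inputs reaching fc1 in the generated CNN for a square image."""
    size = conv_out(image_size, kernel=5)      # conv1: 5x5, no padding
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    size = conv_out(size, kernel=5)            # conv2: 5x5, no padding
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    return 16 * size * size                    # conv2 has 16 output channels


cifar_features = flattened_features(32)  # 32x32 inputs (CIFAR-10)
mnist_features = flattened_features(28)  # 28x28 inputs (MNIST)

# CIFAR-10 has 50,000 training images; the generated code uses batch_size=32.
batches_per_epoch = math.ceil(50_000 / 32)

print(cifar_features, mnist_features, batches_per_epoch)
```

For 32×32 inputs the result is 400, which equals 16 * 5 * 5, so the generated fc1 layer fits CIFAR-10 but not 28×28 images. The batch count stays below 2,000, which is why the original logging condition never fires on CIFAR-10.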
Addressing these areas is essential to enhance the model’s accuracy and functionality.
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:05<00:00, 29302880.52it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Epoch 1, Loss: 2.213340243262708
Epoch 2, Loss: 1.614285399999155
Epoch 3, Loss: 1.4277218132067824
Epoch 4, Loss: 1.3255487715496737
Epoch 5, Loss: 1.2489837502579555
Epoch 6, Loss: 1.1976356018534706
Epoch 7, Loss: 1.1457488671745486
Epoch 8, Loss: 1.097599425660375
Epoch 9, Loss: 1.0516416126352441
Epoch 10, Loss: 1.0136031122768627
Accuracy of the network on the 10000 test images: 57 %
The output above comes from running the corrected PyTorch code on the CIFAR-10 dataset after adding the necessary import statement. It indicates a successful model training and testing process.
The logs indicate that the CIFAR-10 dataset was successfully downloaded, extracted, and verified. This process involves retrieving the dataset from an online source, saving it locally, and then extracting the contents for use in the model training and testing. The confirmation of the files being “already downloaded and verified” suggests that the dataset was set up correctly for the subsequent steps in the code.
The model’s training loss decreased progressively over 10 epochs, indicating that the model was learning. However, the final accuracy on the test set was 57%: well above the 10% expected from random guessing across ten classes, but still modest for CIFAR-10.
The low 57% accuracy of the network on the CIFAR-10 dataset could primarily be attributed to the model’s small architecture. While factors like hyperparameter settings also play a role, the architecture is a key determinant in how well the model can learn and generalize from the training data. Adjusting the model’s structure may be necessary for significant improvements in accuracy.
Exploring the Differences in Code Generation for Image Classification Between ChatGPT-3.5 and Gemini Pro
In this section, we shift gears to examine how ChatGPT-3.5 handles the same image classification task. Here, we provide ChatGPT-3.5 with the identical prompt given to Gemini Pro and observe its approach to generating PyTorch code for image classification. The focus is not only on the code generation process but also on how well the model performs on test data. This comparison aims to highlight the differences in efficiency, effectiveness, and accuracy between the two AI models in tackling a similar challenge.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Check if a GPU is available, if not, use CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Define transformations for data augmentation and normalization
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 128 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Training loop
for epoch in range(10):  # You can adjust the number of epochs as needed
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}")

print("Training finished")

net.eval()
all_labels = []
all_predictions = []

with torch.no_grad():
    for data in testloader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)
        all_labels.extend(labels.cpu().numpy())
        all_predictions.extend(predicted.cpu().numpy())

accuracy = accuracy_score(all_labels, all_predictions)
confusion = confusion_matrix(all_labels, all_predictions)
classification_rep = classification_report(all_labels, all_predictions, target_names=classes)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(confusion)
print("Classification Report:")
print(classification_rep)
The code generated by ChatGPT-3.5 for image classification in PyTorch appears comprehensive and well-structured. Let’s do a detailed review.
Setting Up Your Environment and Data for Image Classification
- Library Import and Device Setup: Correctly imports necessary libraries and optimizes for GPU usage.
- Data Augmentation and Normalization: Implements effective techniques for training and testing datasets.
- Dataset Loading and DataLoader Initialization: Accurately loads and prepares the CIFAR-10 dataset for both training and testing.
Training Models and Understanding the Architecture
- Neural Network Architecture: Features a larger CNN than Gemini Pro’s (64 and 128 convolutional filters versus 6 and 16, plus a 512-unit hidden layer), potentially enhancing learning.
- Training Loop: Well-structured with loss calculation, backpropagation, and optimizer steps; the network uses torch.relu for activation.
- GPU Utilization: Moves the model and each batch to the GPU when one is available, speeding up training and testing.
Evaluating Model Performance and Analyzing Results
- Evaluation Metrics: Collects predictions over the entire test set and evaluates them with scikit-learn’s accuracy_score, confusion_matrix, and classification_report.
- Final Performance Metrics: Prints overall accuracy along with per-class precision, recall, and F1-score, giving a more detailed view of model performance.
Detailed Comparison: ChatGPT-3.5 vs. Gemini Pro for Image Classification
- Public Dataset Usage: Unlike Gemini Pro, ChatGPT-3.5’s code correctly incorporates the CIFAR-10 dataset.
- Data Augmentation: ChatGPT-3.5 includes data augmentation, which is absent in Gemini Pro’s code.
- Complex Network Architecture: ChatGPT-3.5’s network is more intricate, suggesting improved learning capabilities.
- Detailed Performance Metrics: ChatGPT-3.5’s code provides a more comprehensive performance evaluation than Gemini Pro’s.
Overall, ChatGPT-3.5’s approach to image classification showcases a well-rounded and potentially more effective solution than Gemini Pro, particularly in terms of dataset handling, model complexity, and depth of performance analysis.
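To put a number on the difference in model complexity, the trainable parameters of both networks can be counted by hand from the layer definitions in the two listings. This is a back-of-the-envelope sketch: the helper names are made up here, and the counts include weights and biases only.

```python
def conv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of a Conv2d layer with a square kernel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features


# Network generated by Gemini Pro (class CNN)
gemini_params = (
    conv_params(3, 6, 5)
    + conv_params(6, 16, 5)
    + linear_params(16 * 5 * 5, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)

# Network generated by ChatGPT-3.5 (class Net)
chatgpt_params = (
    conv_params(3, 64, 3)
    + conv_params(64, 128, 3)
    + linear_params(128 * 8 * 8, 512)
    + linear_params(512, 10)
)

print(gemini_params, chatgpt_params, round(chatgpt_params / gemini_params))
```

The ChatGPT-3.5 network comes out roughly 69 times larger. Most of that difference sits in its first fully connected layer, which takes 128 * 8 * 8 inputs.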
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:03<00:00, 43389823.48it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Epoch 1, Loss: 1.656345052792288
Epoch 2, Loss: 1.2884532127081585
Epoch 3, Loss: 1.0894466095873157
Epoch 4, Loss: 0.9669426654458351
Epoch 5, Loss: 0.8824315952218097
Epoch 6, Loss: 0.8171949713583797
Epoch 7, Loss: 0.7658578648667811
Epoch 8, Loss: 0.7242285415644536
Epoch 9, Loss: 0.6837977789856894
Epoch 10, Loss: 0.6462546251618954
Training finished
Accuracy: 77.29%
Confusion Matrix:
[[796   9  51  27   9  19  13   5  41  30]
 [ 14 875   6  12   2   5   9   1  19  57]
 [ 46   2 653  52  60  77  80  21   3   6]
 [ 16   5  45 564  34 227  92   8   2   7]
 [ 13   1  50  54 726  43  69  41   3   0]
 [  8   2  20 131  31 754  29  22   1   2]
 [  4   1  28  43  19  30 873   0   2   0]
 [  6   0  33  32  40  90   9 786   0   4]
 [ 50  16  11  15   4  17   7   5 857  18]
 [ 31  51   5  11   4  12   9  11  21 845]]
Classification Report:
              precision    recall  f1-score   support

       plane       0.81      0.80      0.80      1000
         car       0.91      0.88      0.89      1000
        bird       0.72      0.65      0.69      1000
         cat       0.60      0.56      0.58      1000
        deer       0.78      0.73      0.75      1000
         dog       0.59      0.75      0.66      1000
        frog       0.73      0.87      0.80      1000
       horse       0.87      0.79      0.83      1000
        ship       0.90      0.86      0.88      1000
       truck       0.87      0.84      0.86      1000

    accuracy                           0.77     10000
   macro avg       0.78      0.77      0.77     10000
weighted avg       0.78      0.77      0.77     10000
The initial results from running the ChatGPT-3.5 generated code show a well-managed process, with the CIFAR-10 dataset being downloaded, extracted, and validated accurately. The training demonstrated a consistent reduction in loss across 10 epochs, indicating effective learning.
In terms of evaluation, the model attained a notable 77.29% accuracy on the test set, which is considerably higher than Gemini Pro’s achievement of 57% accuracy. This significant difference underscores the effectiveness of ChatGPT-3.5’s approach. Additionally, the code included comprehensive evaluation metrics (e.g., confusion matrix and classification report), offering an in-depth understanding of the model’s performance across various classes.
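The reported metrics can be reproduced directly from the confusion matrix printed above. The sketch below copies that matrix and recomputes accuracy, per-class recall, and per-class precision in plain Python, assuming scikit-learn’s convention that rows are true labels and columns are predictions. The helper names are made up for this example.

```python
CLASSES = ["plane", "car", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

# Confusion matrix copied from the output above (rows: true, columns: predicted).
CONFUSION = [
    [796, 9, 51, 27, 9, 19, 13, 5, 41, 30],
    [14, 875, 6, 12, 2, 5, 9, 1, 19, 57],
    [46, 2, 653, 52, 60, 77, 80, 21, 3, 6],
    [16, 5, 45, 564, 34, 227, 92, 8, 2, 7],
    [13, 1, 50, 54, 726, 43, 69, 41, 3, 0],
    [8, 2, 20, 131, 31, 754, 29, 22, 1, 2],
    [4, 1, 28, 43, 19, 30, 873, 0, 2, 0],
    [6, 0, 33, 32, 40, 90, 9, 786, 0, 4],
    [50, 16, 11, 15, 4, 17, 7, 5, 857, 18],
    [31, 51, 5, 11, 4, 12, 9, 11, 21, 845],
]

n = len(CLASSES)
total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(n))
accuracy = correct / total


def recall(i: int) -> float:
    """Correct predictions for class i divided by all true samples of class i."""
    return CONFUSION[i][i] / sum(CONFUSION[i])


def precision(i: int) -> float:
    """Correct predictions for class i divided by all predictions of class i."""
    return CONFUSION[i][i] / sum(row[i] for row in CONFUSION)


# Largest off-diagonal entry: the most frequent single confusion.
count, true_idx, pred_idx = max(
    (CONFUSION[t][p], t, p) for t in range(n) for p in range(n) if t != p
)

print(f"Accuracy: {accuracy * 100:.2f}%")
for i, name in enumerate(CLASSES):
    print(f"{name:>5}  precision={precision(i):.2f}  recall={recall(i):.2f}")
print(f"Most common confusion: {CLASSES[true_idx]} predicted as {CLASSES[pred_idx]} ({count})")
```

Reading the matrix this way shows where the errors concentrate: the largest off-diagonal entry is cat images predicted as dog, which is consistent with the lower precision and recall reported for those two classes.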
It’s noteworthy that the code from ChatGPT-3.5 was executed without any human modifications, demonstrating its robustness and reliability. This contrasts with the Gemini Pro code, which required specific fixes, such as correcting an import error, adding the CIFAR-10 dataset, and modifying epoch reporting for it to function correctly. This comparison underscores ChatGPT-3.5’s proficiency in generating ready-to-use, reliable code for complex tasks like image classification.
Summary and Key Takeaways
This post explores image classification using Gemini Pro and compares it with ChatGPT-3.5. It first covers the setup of Gemini Pro and then uses the model to generate image classification code, revealing its limitations and the need for code adjustments. Gemini Pro’s generated code, while fundamentally sound, required modifications for integrating the CIFAR-10 dataset, fixing import errors, and correcting print statements in training loops. These adjustments were essential for the model to achieve a moderate 57% accuracy rate.
Contrastingly, ChatGPT-3.5’s code for a similar task demonstrated its robustness by requiring no alterations and achieving a higher accuracy rate of 77.29%. This notable difference in performance and the readiness of the code highlight ChatGPT-3.5’s advanced capabilities in creating efficient, accurate code for complex AI tasks, marking an area for improvement in Gemini Pro’s code generation process.
Citation Information
Sharma, A. “Image Classification with Gemini Pro,” PyImageSearch, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, and R. Raha, eds., 2024, https://pyimg.co/melcg
@incollection{Sharma_2024_Image-Classification-with-Gemini-Pro,
  author = {Aditya Sharma},
  title = {Image Classification with Gemini Pro},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha},
  year = {2024},
  url = {https://pyimg.co/melcg},
}