Table of Contents
- ML Days in Tashkent — Day 3: Demos and Workshops
- Day 3 at the Google ML Community Summit: A Hands-On Showcase of Global Talent
- 3-Minute Demos: A Rapid-Fire Showcase of Innovation
- Meeting with Pedro: Fine-Tuning Stable Diffusion
- “Keras (3) Is All You Need” — A Presentation by Aritra and Aakash
- Semantic Segmentation with Keras-CV
- Summary
ML Days in Tashkent — Day 3: Demos and Workshops
In this tutorial, we will depart from our regular ML blogs and discover how our authors made the most of the Google Machine Learning Community Summit in Tashkent, Uzbekistan. As always, stick around for a surprise demo at the end. 😉
This blog is the last of a 3-part series:
- ML Days in Tashkent — Day 1: City Tour
- ML Days in Tashkent — Day 2: Sprints and Sessions
- ML Days in Tashkent — Day 3: Demos and Workshops (this tutorial)
Day 3 at the Google ML Community Summit: A Hands-On Showcase of Global Talent
The 3rd day of the Google Machine Learning Community Summit was a dynamic and interactive experience, focusing on hands-on demonstrations by Google Developer Experts (GDEs) and TensorFlow User Group (TFUG) members from around the world.
3-Minute Demos: A Rapid-Fire Showcase of Innovation
- Brief Yet Impactful Presentations: The format of the day was centered around 3-minute demos, where each community member and GDE had the opportunity to present a concise yet powerful demonstration of their work and projects from the past year. This format made for a fast-paced and diverse showcase of ideas and applications in AI and ML.
- A Kaleidoscope of Projects: The range of projects and work presented was astounding. In just 3 minutes, each participant managed to highlight the core of their work, offering insights into the innovative ways in which AI and ML are being applied across various fields. From healthcare and education to finance and arts, the demos covered a wide spectrum of industries and use cases.
- Networking and Connections: These presentations also served as a platform for networking and knowledge exchange. It was a chance for participants to learn from each other and explore potential collaborations. In these interactions, the community’s strength and diversity were palpable.
Meeting with Pedro: Fine-Tuning Stable Diffusion
- A Highlight Encounter: Among the many talented individuals we met was Pedro Gengo (a fan of PyImageSearch), who has been working on fine-tuning Stable Diffusion. His work is particularly interesting because it touches upon one of the most exciting areas in generative AI. Fine-tuning models like Stable Diffusion opens up possibilities for more tailored and specific applications, pushing the boundaries of what can be achieved with text-to-image models.
- The Power of Community: Meeting Pedro (GDE and TFUG organizer from Sao Paulo, Brazil) and witnessing his work was a reminder of how the AI/ML community is thriving, with enthusiasts and experts from all corners of the globe contributing to the field’s advancement. His work exemplifies the innovative spirit that drives the community forward.
This 3rd day of the summit was a testament to the power of brevity and the depth of talent within the AI/ML community. The array of demos presented a vivid picture of the current state and future potential of AI and ML, showcasing the creativity and technical prowess of the community members. It was a day filled with learning, inspiration, and a collective passion for pushing the envelope in AI and ML.
The day progressed with an invaluable feedback session involving Martin Gorner and Mark McDonald, focusing on Keras Core and Generative AI.
- Keras Core Feedback: The feedback session for Keras 3.0, conducted by Keras Product Manager Martin Gorner, was a constructive and positive experience, highlighting areas for potential enhancement. While Keras 3.0 currently does not support wrapping PyTorch or JAX models, this opens up opportunities for future developments to increase framework interoperability. Additionally, the handling of the pseudo-random number generator (PRNG) seed in JAX, though not in line with some expectations, presents a unique chance to refine and improve this aspect for more consistent outcomes. The inability to directly utilize community repositories like parameter-efficient fine-tuning (PEFT) and Transformers was also noted, not as a shortfall, but as a potential area for expanding the framework’s capabilities. Overall, the session was an affirmative step toward continual improvement and adaptation in the ever-evolving landscape of machine learning technologies.
- Generative AI — Addressing Concerns and Exploring Futures
- Quality of Models: A major topic was the quality of models in generative AI. This is a critical concern as the efficacy and reliability of these models directly impact their practical applications.
- Accessibility and Enterprise Use: The conversation also touched upon the divide in technology accessibility, particularly the difference between PaLM (intended for widespread use) and Vertex AI (geared toward enterprises). This raised important considerations about the democratization of AI technologies.
- Gemini Release and Capabilities: Questions were raised about the release dates of ‘Gemini’ and its expected multimodal capabilities, reflecting the community’s anticipation and interest in this upcoming feature.
- Unique Use Cases: The session concluded with a discussion on unique use cases of generative AI, highlighting the diverse applications and the potential of this technology to transform various industries.
“Keras (3) Is All You Need” — A Presentation by Aritra and Aakash
At its core, deep learning involves manipulating multi-dimensional arrays known as tensors. These tensors represent various forms of data (e.g., images, sound, and text). Neural networks, which are central to deep learning, operate by performing complex tensor operations (e.g., convolutions, matrix multiplications, and element-wise functions). Simplifying deep learning to tensor manipulation highlights the fundamental nature of these operations in building and training neural network models.
Libraries like NumPy in Python are capable of tensor manipulations, especially matrix multiplications, which are essential in linear algebra. These basic operations form the foundation of more complex neural network computations. However, deep learning requires more advanced and optimized tensor operations, often executed on GPUs or TPUs for efficiency.
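To make this concrete, the forward pass of a single dense layer is just a matrix multiplication, a bias addition, and an element-wise nonlinearity. Here is a minimal NumPy sketch (all names and shapes are illustrative):

```python
import numpy as np

# A toy "dense layer": y = relu(x @ W + b)
# Shapes: a batch of 2 inputs with 3 features each, projected to 4 units.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))   # input tensor (batch, features)
W = rng.standard_normal((3, 4))   # weight matrix
b = np.zeros(4)                   # bias vector

# Matrix multiplication + bias + element-wise ReLU
y = np.maximum(x @ W + b, 0.0)
print(y.shape)  # (2, 4)
```

Deep learning frameworks perform exactly these kinds of operations, but on accelerator-optimized tensors rather than plain NumPy arrays.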
Backpropagation, a key algorithm in training neural networks, requires the computation of gradients. Automatic differentiation engines in frameworks (e.g., TensorFlow, PyTorch, and JAX) make this process efficient and accurate. They allow developers to define models in a high-level, intuitive manner while handling the complex differentiation operations under the hood.
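To see what an autodiff engine saves us from doing by hand, here is a small NumPy sketch for a tiny least-squares loss: it compares a hand-derived gradient against a finite-difference estimate (the loss, data, and values are illustrative):

```python
import numpy as np

# f(w) = sum((x @ w - t)**2), a tiny least-squares loss
x = np.array([[1.0, 2.0], [3.0, 4.0]])
t = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])

def loss(w):
    err = x @ w - t
    return np.sum(err ** 2)

# Hand-derived gradient: dL/dw = 2 * x^T (x @ w - t)
analytic = 2.0 * x.T @ (x @ w - t)

# Central finite-difference estimate of the same gradient
eps = 1e-6
numeric = np.array([
    (loss(w + eps * np.eye(2)[i]) - loss(w - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

Autodiff engines in TensorFlow, PyTorch, and JAX compute the analytic gradient automatically from the model definition, which is both faster and more numerically stable than finite differences.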
Keras 3’s major advancement is its backend-agnostic nature. While earlier versions were closely tied to TensorFlow, Keras 3 can interface with various backends. This means it can perform tensor operations using different engines (e.g., TensorFlow, Torch, or JAX) and leverage their respective automatic differentiation capabilities. This flexibility allows developers to choose their preferred backend based on performance, features, or compatibility requirements.
Keras 3 offers an API that is reminiscent of NumPy, making it familiar and easy to use, especially for those who have experience with NumPy. Additionally, it provides neural network-specific functions and layers (e.g., softmax activation, convolutional layers, etc.), which are essential for building deep learning models. This blend of general and specialized APIs makes Keras 3 both versatile and powerful for deep learning tasks.
Keras 3 can serve as a direct replacement for TensorFlow’s implementation of Keras. This means models built with TensorFlow’s Keras API can be seamlessly transitioned to Keras 3 with minimal changes. This compatibility is crucial for existing projects and for those who want to take advantage of the latest features in Keras 3 without reworking their entire codebase.
Keras 3 supports various data pipelines, allowing flexibility in how data is fed into the models. Whether it’s TensorFlow’s tf.data for efficient data loading and preprocessing or PyTorch’s DataLoader for its ease of use and flexibility, Keras 3 can integrate with these mechanisms. This feature enables developers to leverage the strengths of different data-handling libraries and choose the one that best fits their project’s needs.
Semantic Segmentation with Keras-CV
And now, finally, it’s time for that demo to see how we can segment the people in the following image.
To make it cooler, the group pictured includes most of the ML GDEs and Community Members from India.
This code demonstrates setting up an environment for deep learning with TensorFlow's nightly build, using the Keras-CV library to perform image segmentation with the `DeepLabV3Plus` model. Here's a breakdown of the code:
```python
!pip install -q git+https://github.com/keras-team/keras-cv.git@master
!pip uninstall -q keras -y
!pip uninstall -q tensorflow -y
!pip install -q tf-nightly  # needed for some data processing in keras-cv
!pip install -q keras-nightly
```
- Lines 1-5: These lines handle the installation and setup of required Python packages. The script installs `keras-cv` from its GitHub repository, uninstalls existing versions of `keras` and `tensorflow`, and installs their nightly builds (`tf-nightly` and `keras-nightly`). Nightly builds include the latest features and updates that aren't yet in the stable release.
```python
import os

os.environ["KERAS_BACKEND"] = "jax"

import keras_core as keras
import keras_cv
import numpy as np
```
- Lines 7-13: The script imports the necessary libraries and configures the environment. It sets the Keras backend to `"jax"` via the `KERAS_BACKEND` environment variable using the `os` library (this must be done before Keras is imported). `keras_core` and `keras_cv` are imported for building and utilizing deep learning models, and `numpy` is used for numerical operations.
```python
model = keras_cv.models.DeepLabV3Plus.from_preset(
    "deeplab_v3_plus_resnet50_pascalvoc",
    num_classes=21,
    input_shape=[512, 512, 3],
)
```
- Lines 15-19: Here, a `DeepLabV3Plus` model is instantiated from a preset configuration (`deeplab_v3_plus_resnet50_pascalvoc`). This preset is pretrained on the PASCAL VOC dataset, a common benchmark in visual object recognition. The model has `21` classes and an input shape of `512×512×3` (height, width, channels).
```python
filepath = keras.utils.get_file(origin="https://i.imgur.com/RxmO6qM.jpg")
image = keras.utils.load_img(filepath)

resize = keras_cv.layers.Resizing(height=512, width=512)
image = resize(image)
image = keras.ops.expand_dims(np.array(image), axis=0)
preds = keras.ops.expand_dims(keras.ops.argmax(model(image), axis=-1), axis=-1)
keras_cv.visualization.plot_segmentation_mask_gallery(
    image,
    value_range=(0, 255),
    num_classes=1,
    y_true=None,
    y_pred=preds,
    scale=3,
    rows=1,
    cols=1,
)
```
- Lines 21-27: The script downloads an image from a URL and processes it for model prediction. The image is resized to match the model's input shape, converted to a NumPy array, and given a batch dimension to fit the model's input requirements. The model then predicts the segmentation mask for the image, and `keras.ops.argmax` is used to extract the most likely class for each pixel.
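For intuition, the per-pixel argmax step can be sketched in plain NumPy on fake logits (`keras.ops` mirrors these NumPy functions; the shapes and class index below are illustrative):

```python
import numpy as np

# A segmentation model outputs a (batch, height, width, num_classes) score
# tensor; argmax over the last axis picks the most likely class per pixel.
logits = np.zeros((1, 2, 2, 21))      # tiny fake 2x2 "image", 21 classes
logits[0, 0, 0, 15] = 5.0             # pretend class 15 wins at pixel (0, 0)

mask = np.argmax(logits, axis=-1)     # shape (1, 2, 2): one label per pixel
mask = np.expand_dims(mask, axis=-1)  # shape (1, 2, 2, 1), as in the demo
print(mask[0, 0, 0, 0])  # 15
```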
- Lines 28-37: Finally, the script uses `keras_cv.visualization.plot_segmentation_mask_gallery` to visualize the segmentation mask on the image. It sets parameters like the value range, number of classes, scaling, and layout for the gallery.
The above example showcases an advanced use of Keras and TensorFlow for image segmentation, a task common in computer vision applications (e.g., autonomous driving, medical imaging, and photo editing).
Summary
The Google Machine Learning Community Summit provided a vivid snapshot of the current state and the exciting future of Machine Learning and Generative AI. Here’s a summary of the key takeaways.
Harnessing the Power of Frameworks and Tools
- Keras’s Evolution: The advancements in frameworks like Keras, particularly with its multi-backend functionalities, have been a game-changer. Keras has enabled developers to seamlessly integrate the best features of Python, TensorFlow, and JAX, paving the way for more flexible and powerful machine learning applications.
- Generative AI Adoption: The adoption of Generative AI tools by the community has led to an explosion of creativity and innovation. The summit showcased a variety of projects and use cases that leveraged these tools, demonstrating their immense potential and versatility in solving complex problems.
The Open, Developer-First Future
- An Open Horizon: The future of Machine Learning and Generative AI is being shaped by an open, developer-first approach. This mindset prioritizes accessibility, collaboration, and community involvement, ensuring that advancements in AI and ML are not just technologically sound but also ethically grounded and widely accessible.
- Safety and Security: A foundational element of this future is the emphasis on safety and security. As the field evolves, these considerations must remain at the forefront, guiding the development and deployment of AI/ML technologies.
The Pace and Direction of ML Research
- Rapid Advancements: The field of ML is advancing at an unprecedented rate. However, there is a tendency in the broader AI/ML landscape to keep research and projects under wraps, often leading to a selective presentation of results.
- The Role of the Google ML Community: The summit highlighted the crucial role of the Google ML community as a beacon of openness and collaboration. This community serves as a powerful reminder that sharing knowledge and building open-source tools are key to ensuring the field progresses in the right direction. By fostering an environment of transparency and open exchange, the community is helping to shape an AI/ML ecosystem that is inclusive, ethical, and forward-thinking.