Meet BLIP: The Vision-Language Model Powering Image Captioning