Finding Images

Meme by AI-Memer, Image by Mike K, Caption by OpenAI GPT-3, License: CC BY-SA 4.0

The background images are pulled from two sources, the Wikimedia Commons and the OpenImages dataset. I use OpenAI's CLIP to perform a semantic search. The CLIP system accomplishes two functions, encoding both text and images into "embeddings", which are strings of numbers that represent the gist of the original data. The CLIP model was pretrained on 400 million pairs of images with text labels, such that the embeddings encoded from the images are similar to the embeddings encoded from the corresponding text labels. For more information about how CLIP works, check out my article, here.

The Wikimedia Commons has over 73 million JPEG files. Most of them are released with permissive rights, like the Creative Commons Attribution license. I use Goldsmith's Wikipedia search API to find the top 3 pages related to the text query and gather the image descriptions using the CommonsAPI on the Magnus Toolserver. I use the copyfileobj() function in Python to download the image files. There are typically 3 to 10 images on a Wikipedia page, so about 9 to 30 images in total will come down.

The OpenImages dataset from Google comprises 675,000 photos scraped from Flickr that were all released under the Creative Commons Attribution license. A dataset of image descriptions is available for download. I ran each of the descriptions through OpenAI's CLIP system and cached the embeddings for quick access. When the user types in a query, I run it through CLIP and compare it to the cached embeddings. I then download the top 20 matching images using the OpenImages download API.

For a final filtering pass, I run the images from the 3 Wikipedia pages and the 20 images from OpenImages through the image encoder and compare the results to the embedding of the text query. I present the top 10 images to the user to choose their favorite. For example, if you search for "apple pie", you will be presented with the top 10 images sorted by closest match.

Generating Captions

Meme by AI-Memer, Image by Pharlap, Caption by OpenAI GPT-3, License: CC BY-SA 4.0

I use two different implementations of GPT to generate the captions. There's the latest GPT-3 Da Vinci model from OpenAI that does an excellent job, but you have to be enrolled in their beta program to use it. And there's the open-source GPT-Neo model from EleutherAI. That model is a lot smaller, but it's free to use. OpenAI's GPT-3 Da Vinci is currently the largest AI model for Natural Language Processing. I am using their latest "zero-shot" style of prompting with their new Da Vinci Instruct model. Instead of providing examples of what you are asking the model to do, you can simply ask it what to do directly. Here is the prompt that creates a caption for the apple pie picture: "Create a funny caption for a new meme about apple pie. The background picture is Simple and easy apple pie served with vanilla ice cream, on a gingham tablecloth in Lysekil, Sweden."
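The embedding-comparison step described above can be sketched as a cosine-similarity ranking. This is a minimal illustration, not the project's actual code: it assumes the query and image embeddings have already been produced by CLIP's encoders, and uses tiny toy vectors in place of real 512-dimensional embeddings.

```python
import numpy as np

def rank_by_similarity(query_embedding, cached_embeddings, top_k=10):
    """Return indices of the top_k cached embeddings closest to the query.

    Both inputs are assumed to come from CLIP's text/image encoders;
    cosine similarity is the comparison CLIP itself is trained around.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    m = cached_embeddings / np.linalg.norm(cached_embeddings, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity per cached item
    return np.argsort(-scores)[:top_k]  # highest similarity first

# Toy 3-dimensional "embeddings" standing in for real CLIP vectors
cached = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
print(rank_by_similarity(query, cached, top_k=2))  # → [0 2]
```

The same ranking serves both stages: choosing the top 20 OpenImages candidates from the cached description embeddings, and the final pass that sorts the combined candidates by image-to-query similarity.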
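The download step can be sketched with copyfileobj(), which streams a file object to disk in chunks rather than holding a whole image in memory. The demonstration below substitutes an in-memory buffer for a real HTTP response so it runs offline; with a live URL you would pass the object returned by urllib.request.urlopen(url) as the source.

```python
import io
import os
import shutil
import tempfile

def save_stream(src, dest_path):
    """Copy a readable binary stream to a local file in chunks.

    shutil.copyfileobj avoids loading the full image into memory;
    `src` can be an HTTP response object or any binary file object.
    """
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(src, out)

# Offline stand-in for a downloaded image (illustrative bytes, not real data)
path = os.path.join(tempfile.gettempdir(), "demo.jpg")
save_stream(io.BytesIO(b"fake JPEG bytes"), path)
```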
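The zero-shot prompt above can be assembled from the chosen topic and the image's Wikimedia description. The helper name below is hypothetical, and the actual call to the Da Vinci Instruct completion endpoint is omitted; this only shows how the no-examples prompt is put together.

```python
def build_caption_prompt(topic, description):
    """Assemble a zero-shot prompt: no worked examples, just a direct
    instruction plus the background image's description."""
    return (f"Create a funny caption for a new meme about {topic}. "
            f"The background picture is {description}.")

prompt = build_caption_prompt(
    "apple pie",
    "Simple and easy apple pie served with vanilla ice cream, "
    "on a gingham tablecloth in Lysekil, Sweden")
print(prompt)
```

The resulting string matches the prompt quoted in the text, and the same template works for any topic/description pair the image search returns.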