Using prompto for multimodal prompting with Azure OpenAI¶

In [1]:

from prompto.settings import Settings
from prompto.experiment import Experiment
from dotenv import load_dotenv
import warnings
import os

When using prompto to query models from the Azure OpenAI API, each line of our experiment .jsonl file must have "api": "azure-openai" in the prompt dict.

Environment variables¶

For the Azure OpenAI API, there are three environment variables that can be set:

AZURE_OPENAI_API_KEY: the API key for the Azure OpenAI API
AZURE_OPENAI_API_ENDPOINT: the endpoint for the Azure OpenAI API
AZURE_OPENAI_API_VERSION: the version of the Azure OpenAI API (optional)

As mentioned in the environment variables docs, model-specific environment variables can also be used. In particular, when you specify a model_name key in a prompt dict, you can also set an AZURE_OPENAI_API_KEY_model_name environment variable to indicate the API key used for that particular model (where "model_name" is replaced with the value of the model_name key). We will see a concrete example of this later. The same applies to the AZURE_OPENAI_API_ENDPOINT_model_name and AZURE_OPENAI_API_VERSION_model_name environment variables.
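As a rough illustration of the idea, the lookup could be sketched as below. The helper name resolve_env_var and the fallback order (model-specific value first, then the default) are assumptions made for this sketch and are not taken from prompto's source; in particular, whether prompto normalises characters such as hyphens in the model name is not covered here.

```python
import os


def resolve_env_var(base_name, model_name=None, env=None):
    """Hypothetical helper: prefer BASE_model_name, then fall back to BASE."""
    env = os.environ if env is None else env
    if model_name is not None:
        # e.g. AZURE_OPENAI_API_KEY_my-model
        specific = env.get(f"{base_name}_{model_name}")
        if specific is not None:
            return specific
    # No model-specific value set: use the default variable (may be None)
    return env.get(base_name)


# A plain dict stands in for os.environ so the sketch runs anywhere
example_env = {
    "AZURE_OPENAI_API_KEY": "default-key",
    "AZURE_OPENAI_API_KEY_my-model": "model-key",
}
print(resolve_env_var("AZURE_OPENAI_API_KEY", "my-model", example_env))
print(resolve_env_var("AZURE_OPENAI_API_KEY", "other-model", example_env))
```

The first call picks up the model-specific key, while the second falls back to the default because no variable exists for "other-model".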

To set these environment variables, you can list them as key-value pairs in a .env file:

AZURE_OPENAI_API_KEY=<YOUR-AZURE-OPENAI-KEY>
AZURE_OPENAI_API_ENDPOINT=<YOUR-AZURE-OPENAI-ENDPOINT>
AZURE_OPENAI_API_VERSION=<DEFAULT-AZURE-OPENAI-API-VERSION>

If you create this file, you can run the following, which should return True if the file was found and loaded, or False otherwise:

In [2]:

load_dotenv(dotenv_path=".env")

Out[2]:

True

Now, we obtain those values. We raise an error if the AZURE_OPENAI_API_KEY or AZURE_OPENAI_API_ENDPOINT environment variables haven't been set:

In [3]:

AZURE_OPENAI_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
if AZURE_OPENAI_API_KEY is None:
    raise ValueError("AZURE_OPENAI_API_KEY is not set")

In [4]:

AZURE_OPENAI_API_ENDPOINT = os.environ.get("AZURE_OPENAI_API_ENDPOINT")
if AZURE_OPENAI_API_ENDPOINT is None:
    raise ValueError("AZURE_OPENAI_API_ENDPOINT is not set")

We will only raise a warning if AZURE_OPENAI_API_VERSION hasn't been set:

In [5]:

AZURE_OPENAI_API_VERSION = os.environ.get("AZURE_OPENAI_API_VERSION")
if AZURE_OPENAI_API_VERSION is None:
    warnings.warn("AZURE_OPENAI_API_VERSION is not set")
else:
    print(f"Default AzureOpenAI version: {AZURE_OPENAI_API_VERSION}")

Default AzureOpenAI version: 2024-02-01

If you get any errors or warnings in the cells above, check that your .env file follows the example above so that these variables are set.

Types of prompts¶

With the Azure OpenAI API, the prompt (given via the "prompt" key in the prompt dict) can take several forms:

a string: a single prompt to obtain a response for
a list of strings: a sequence of prompts to send to the model
- this is useful in the use case of simulating a conversation with the model by defining the user prompts sequentially
a list of dictionaries with keys "role" and "content", where "role" is one of "user", "assistant", or "system" and "content" is the message
- this is useful in the case of passing in some conversation history or to pass in a system prompt to the model
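As a small illustration, the three forms could look like the following Python values. The prompt texts and the prompt_kind helper are made up for this sketch; the helper only mirrors the descriptions in the list above and is not part of prompto.

```python
# 1. A string: a single prompt
single_prompt = "What is the capital of France?"

# 2. A list of strings: user prompts sent one after another
sequential_prompts = ["Hello", "What is the capital of France?"]

# 3. A list of dictionaries: conversation history with an optional system prompt
history_prompt = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "What is the capital of France?"},
]


def prompt_kind(prompt):
    """Hypothetical helper: name which of the three forms a prompt takes."""
    if isinstance(prompt, str):
        return "single prompt"
    if isinstance(prompt, list) and all(isinstance(p, str) for p in prompt):
        return "sequence of prompts"
    if isinstance(prompt, list) and all(
        isinstance(p, dict) and "role" in p and "content" in p for p in prompt
    ):
        return "conversation history"
    return "unrecognised"


for example in (single_prompt, sequential_prompts, history_prompt):
    print(prompt_kind(example))
```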

Multimodal prompts¶

For prompting the model with multimodal inputs, we use this last format: each message specifies its role and a list of parts that make up its content. Each part can be text, an image or a video, and the parts are passed to the model together as a multimodal input. This lets a prompt be defined flexibly, with text interspersed with images or video.

Each individual part of the prompt is defined by a dictionary with a "type" key and a matching content key (a "mime_type" key may sometimes be needed too):

"type" is one of "text" or "image_url"
if "type" is "text", then you must have a "text" key with the text content
if "type" is "image_url", then you must have an "image_url" key. Its value is either a string giving a local path or a URL to an image (starting with "https://"), or a dictionary with the key "url" specifying the image and, optionally, "detail", which can be "low", "high" or "auto" (default "auto").

This is similar to how you'd set up a multimodal prompt for the OpenAI API (see OpenAI's documentation).
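The rules above can be sketched as a small checker. The function part_problems and its messages are hypothetical and written only for illustration; prompto's own validation may differ, and the sketch only covers dictionary parts.

```python
ALLOWED_DETAIL = {"low", "high", "auto"}


def part_problems(part):
    """Return a list of problems with one prompt part (empty if it looks valid)."""
    part_type = part.get("type")
    if part_type == "text":
        if not isinstance(part.get("text"), str):
            return ['"text" parts need a "text" key holding a string']
        return []
    if part_type == "image_url":
        image = part.get("image_url")
        # Short form: a local path or URL given directly as a string
        if isinstance(image, str):
            return []
        # Long form: a dictionary with "url" and an optional "detail"
        if isinstance(image, dict) and isinstance(image.get("url"), str):
            if image.get("detail", "auto") not in ALLOWED_DETAIL:
                return ['"detail" must be "low", "high" or "auto"']
            return []
        return ['"image_url" must be a string or a dictionary with a "url" key']
    return ['"type" must be "text" or "image_url"']


print(part_problems({"type": "text", "text": "What is in this image?"}))
print(part_problems({"type": "image_url", "image_url": {"url": "image.jpg", "detail": "tiny"}}))
```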

An example of a multimodal prompt is the following:

[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What’s in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            },
        ]
    },
]

Here, we have a list of one dictionary where we specify the "role" as "user" and "content" as a list of two elements: the first specifies a text string and the second is a dictionary specifying an image.

To specify this same prompt, we could also have directly passed in the URL as the value for the "image_url" key:

[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What’s in this image?"},
            {
                "type": "image_url",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            },
        ]
    },
]

For this notebook, we have created an input file in data/input/azure-openai-multimodal-example.jsonl with several multimodal prompts with local files as an illustration.

Specifying local files¶

When specifying local files, the file paths must be relative to the media/ folder in the data folder. For example, if you have an image file image.jpg in the media/ folder, you would specify this as "image_url": "image.jpg" in the prompt. If you have a video file video.mp4 in the media/videos/ folder, you would specify this as "image_url": "videos/video.mp4" in the prompt.
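To make the path convention concrete, here is a minimal sketch of how a local reference could be turned into something the API accepts. The helper to_image_url is hypothetical: it joins the reference onto the media folder and encodes the file as a base64 data URL, which is one format the OpenAI API accepts for images. Whether prompto does exactly this internally is an assumption.

```python
import base64
import mimetypes
import os


def to_image_url(image_ref, media_folder):
    """Hypothetical helper: keep URLs as-is, encode local files as data URLs."""
    if image_ref.startswith("https://"):
        return image_ref
    # Local references are interpreted relative to the media folder
    path = os.path.join(media_folder, image_ref)
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type or 'application/octet-stream'};base64,{encoded}"


# A URL is passed through unchanged; "image.jpg" would be read from ./data/media/image.jpg
print(to_image_url("https://example.com/image.jpg", "./data/media"))
```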

In [6]:

settings = Settings(data_folder="./data", max_queries=30)
experiment = Experiment(
    file_name="azure-openai-multimodal-example.jsonl", settings=settings
)

We set max_queries to 30, so we send 30 queries a minute (one every 2 seconds).
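The arithmetic behind this is simply 60 seconds divided by the number of queries per minute. The function name below is made up for illustration.

```python
def request_interval_seconds(max_queries_per_minute):
    """Seconds to wait between consecutive queries at a given per-minute rate."""
    return 60 / max_queries_per_minute


interval = request_interval_seconds(30)
print(interval)      # 2.0 seconds between queries
print(4 * interval)  # 8.0 seconds to send four queries at this rate
```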

In [7]:

print(settings)

Settings: data_folder=./data, max_queries=30, max_attempts=3, parallel=False
Subfolders: input_folder=./data/input, output_folder=./data/output, media_folder=./data/media

In [8]:

len(experiment.experiment_prompts)

Out[8]:

4

We can see the prompts that we have in the experiment_prompts attribute:

In [9]:

experiment.experiment_prompts

Out[9]:

[{'id': 0,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': ['describe what is happening in this image',
     {'type': 'image_url', 'image_url': 'pantani_giro.jpg'}]}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100}},
 {'id': 1,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': [{'type': 'image_url', 'image_url': 'mortadella.jpg'},
     'what is this?']}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100}},
 {'id': 2,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': ['what is in this image?',
     {'type': 'image_url', 'image_url': 'pantani_giro.jpg'}]},
   {'role': 'assistant',
    'content': 'This is image shows a group of cyclists.'},
   {'role': 'user',
    'content': 'are there any notable cyclists in this image? what are their names?'}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100}},
 {'id': 3,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': [{'type': 'text', 'text': 'What’s in this image?'},
     {'type': 'image_url',
      'image_url': {'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'}}]}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100}}]

In the first prompt ("id": 0), the "prompt" key asks the model to "describe what is happening in this image" and passes in an image, defined by a dictionary with "type" and "image_url" keys pointing to a file in the media folder.
In the second prompt ("id": 1), the "prompt" key first passes in an image, defined by a dictionary with "type" and "image_url" keys pointing to a file in the media folder, and then asks the model "what is this?".
In the third prompt ("id": 2), the "prompt" key is a list of dictionaries. Each of these dictionaries has a "role" and a "content" key, and together they specify a user/model interaction. First we ask the model "what is in this image?" along with an image defined by a dictionary with "type" and "image_url" keys pointing to a file in the media folder. We then have a model response and another user query.
In the fourth prompt ("id": 3), we have the prompt example above where we pass in a URL link to an image. This example is taken from the OpenAI documentation.

For each of these prompts, we specify "model_name": "reginald-gpt-4o" which refers to a specific GPT-4o deployment that we have on our Azure subscription when developing this notebook.
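For reference, one line of the input .jsonl file corresponding to the first prompt could be produced as in the sketch below. The prompt dict is copied from the output above; using json.dumps to write it is simply one convenient way to create such a line, not necessarily how the example file was made.

```python
import json

# Prompt dict for "id": 0, as shown in experiment.experiment_prompts
prompt_dict = {
    "id": 0,
    "api": "azure-openai",
    "model_name": "reginald-gpt-4o",
    "prompt": [
        {
            "role": "user",
            "content": [
                "describe what is happening in this image",
                {"type": "image_url", "image_url": "pantani_giro.jpg"},
            ],
        }
    ],
    "parameters": {"n": 1, "temperature": 1, "max_tokens": 100},
}

# Each prompt dict becomes a single line of JSON in the .jsonl file
line = json.dumps(prompt_dict)
print(line)
```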

Running the experiment¶

We can now run the experiment using the async method process, which processes the prompts in the input file asynchronously. Note that a new folder named timestamp-azure-openai-multimodal-example (where "timestamp" is replaced with the actual date and time of processing) will be created in the output directory, and the input file will be moved there. As the responses come in, they are written to the output file, and logs are printed to the console as well as written to a log file in the output directory.

In [10]:

responses, avg_query_processing_time = await experiment.process()

Sending 4 queries at 30 QPM with RI of 2.0s (attempt 1/3):   0%|          | 0/4 [00:00<?, ?query/s]

Sending 4 queries at 30 QPM with RI of 2.0s (attempt 1/3): 100%|██████████| 4/4 [00:08<00:00,  2.00s/query]
Waiting for responses (attempt 1/3): 100%|██████████| 4/4 [00:05<00:00,  1.43s/query]

We can see that the responses are written to the output file, and we can also see them as the returned object. From running the experiment, we obtain prompt dicts where there is now a "response" key which contains the response(s) from the model.

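In this experiment every prompt is a conversation history (a list of dictionaries), so each response is a single string. If the prompt were a list of strings, the response would instead be a list of strings where each string is the response to the corresponding prompt.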

In [11]:

responses

Out[11]:

[{'id': 0,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': ['describe what is happening in this image',
     {'type': 'image_url', 'image_url': 'pantani_giro.jpg'}]}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100},
  'timestamp_sent': '29-10-2024-13-10-48',
  'response': 'The image shows a group of cyclists participating in a road cycling race. They are riding closely together in a single file along a paved path, next to a stone wall. The cyclists are wearing colorful team uniforms and helmets, and the bicycles are designed for racing, with drop handlebars and thin tires. The position of their bodies and the intense looks suggest they are pushing themselves, possibly during a climb or a sprint segment of the race. The background indicates they are in an urban or village setting, as evidenced'},
 {'id': 1,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': [{'type': 'image_url', 'image_url': 'mortadella.jpg'},
     'what is this?']}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100},
  'timestamp_sent': '29-10-2024-13-10-50',
  'response': 'This is mortadella, an Italian sausage or cold cut made of finely hashed or ground, heat-cured pork. Mortadella is typically cylindrical in shape and wrapped in a casing tied with string. It often contains small cubes of pork fat distributed throughout the meat, and may include other ingredients such as pistachios or black pepper. It is commonly sliced thin and served as a deli meat or in sandwiches.'},
 {'id': 2,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': ['what is in this image?',
     {'type': 'image_url', 'image_url': 'pantani_giro.jpg'}]},
   {'role': 'assistant',
    'content': 'This is image shows a group of cyclists.'},
   {'role': 'user',
    'content': 'are there any notable cyclists in this image? what are their names?'}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100},
  'timestamp_sent': '29-10-2024-13-10-52',
  'response': 'I do not recognize any notable cyclists in the image.'},
 {'id': 3,
  'api': 'azure-openai',
  'model_name': 'reginald-gpt-4o',
  'prompt': [{'role': 'user',
    'content': [{'type': 'text', 'text': 'What’s in this image?'},
     {'type': 'image_url',
      'image_url': {'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'}}]}],
  'parameters': {'n': 1, 'temperature': 1, 'max_tokens': 100},
  'timestamp_sent': '29-10-2024-13-10-54',
  'response': "The image depicts a serene natural landscape with a wooden boardwalk extending into the distance through a lush green field. The sky above is bright and blue with scattered, wispy clouds. The scene suggests a peaceful, inviting environment, likely part of a nature reserve or park. Trees and bushes are visible in the background, enhancing the image's natural beauty."}]

Also notice that each prompt dict now has a "timestamp_sent" key recording when the query was sent.
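The "timestamp_sent" values in the output above can be used to check the sending rate. The sketch below parses them with a day-month-year-hour-minute-second format, which is inferred from the values shown rather than taken from prompto's documentation.

```python
from datetime import datetime

# Format inferred from values such as '29-10-2024-13-10-48' (an assumption)
TIMESTAMP_FORMAT = "%d-%m-%Y-%H-%M-%S"

# "timestamp_sent" values copied from the responses above
sent = [
    "29-10-2024-13-10-48",
    "29-10-2024-13-10-50",
    "29-10-2024-13-10-52",
    "29-10-2024-13-10-54",
]

times = [datetime.strptime(s, TIMESTAMP_FORMAT) for s in sent]
# Seconds elapsed between each pair of consecutive queries
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
print(gaps)  # [2.0, 2.0, 2.0]
```

The two-second gaps match the rate of 30 queries per minute set via max_queries.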

Running the experiment via the command line¶

We can also run the experiment via the command line. The command is as follows (assuming that your working directory is the current directory of this notebook, i.e. examples/azure-openai):

prompto_run_experiment --file data/input/azure-openai-multimodal-example.jsonl --max-queries 30