Grouping prompts and specifying rate limits¶
When running the pipeline or an experiment, there are several settings that control how experiments are run; these are described in the pipeline documentation. In the Specifying rate limits documentation, we saw how to specify rate limits in the command-line interfaces, both for running the full pipeline with `prompto_run_pipeline` and for running a particular experiment file with `prompto_run_experiment`. In this notebook, we walk through the examples from the documentation to see how rate limits can be specified for the pipeline and for experiments.
We will consider three example experiment files, found in the input folder of the `parallel_data_example` directory:

- `documentation_example.jsonl`
- `documentation_example_groups_1.jsonl`
- `documentation_example_groups_2.jsonl`
from prompto import Settings, Experiment
data_folder = "parallel_data_example"
Using parallel processing¶
As noted in the Specifying rate limits documentation, parallel processing of the prompts (where different groups of prompts are processed and sent to APIs in parallel) should be enabled in most settings where the experiment file includes more than one model. This is typically true even when querying the same API type with different models, since rate limits are usually per model.
To use parallel processing, we first need to split the prompts into different queues (or groups). For most settings, splitting by API and certain models is sufficient. However, in some cases we may want to split the prompts into groups manually, which can be done using the `"group"` key in the experiment file.
When we obtain the groups of prompts for parallel processing, what really happens in the code (see the source for the `prompto.experiment_processing.Experiment.group_prompts` method) is that we loop over the prompts in the experiment file and assign each one to a queue/group based on:

- the `"group"` key, if it is present in the prompt dictionary
- otherwise, the `"api"` key
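As a rough illustration, this two-step rule can be sketched as follows (a simplified sketch, not the actual `prompto` implementation):

```python
# Assign each prompt to a queue: the "group" key wins if present,
# otherwise fall back to the "api" key.
def basic_queue_key(prompt_dict: dict) -> str:
    return prompt_dict.get("group", prompt_dict["api"])


prompts = [
    {"id": 0, "api": "gemini", "prompt": "What is the capital of France?"},
    {"id": 8, "api": "ollama", "prompt": "What is the capital of France?", "group": "group1"},
]

# Build the queues by looping over the prompts, as group_prompts does conceptually
queues: dict[str, list[dict]] = {}
for prompt in prompts:
    queues.setdefault(basic_queue_key(prompt), []).append(prompt)
```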
Since we also allow splitting by the `"model_name"` key, we additionally check whether a rate limit has been specified for a particular model within the group or API. Rate limits are specified via the `--max-queries-json` or `-mqj` flag in the command-line interfaces, or via the `max_queries_dict` argument to the `Settings` object in the `prompto` library.
In the examples below, we will see how to specify rate limits for different APIs and models in the experiment files, and how to use the `"group"` key in the experiment file to group prompts manually and specify a rate limit for each group.
Examples¶
First, we look at the `documentation_example.jsonl` experiment file, which has prompts for three different APIs (`gemini`, `openai` and `ollama`) across six different models (`gemini-1.0-pro`, `gemini-1.5-pro`, `gpt3.5-turbo`, `gpt4`, `llama3` and `mistral`):
with open(f"{data_folder}/input/documentation_example.jsonl", "r") as f:
print(f.read())
{"id": 0, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of France?"}
{"id": 1, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of Germany?"}
{"id": 2, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of France?"}
{"id": 3, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of Germany?"}
{"id": 4, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of France?"}
{"id": 5, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of Germany?"}
{"id": 6, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of France?"}
{"id": 7, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of Germany?"}
{"id": 8, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of France?"}
{"id": 9, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of Germany?"}
{"id": 10, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of France?"}
{"id": 11, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of Germany?"}
Same rate limit for all APIs¶
Recall that for each `Experiment`, we need to pass in the path to a jsonl file and a `Settings` object, which stores paths to the relevant data folders as well as some parameter settings for how to run the particular experiment. For an overview of the `Settings` and `Experiment` classes, see the Running experiments notebook.
By default, the `Settings` object has the `parallel` attribute set to `False`. Recall we can simply print the settings object to see the current settings:
settings = Settings(data_folder=data_folder, max_queries=5)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=False Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
We can initialise an `Experiment` object for this experiment; the prompts in that experiment are stored in the `experiment_prompts` attribute:
experiment = Experiment(file_name="documentation_example.jsonl", settings=settings)
experiment.experiment_prompts
[{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?'}, {'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?'}, {'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?'}, {'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?'}, {'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?'}, {'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?'}]
Note that the `experiment_prompts` attribute is read-only (implemented using a `@property` decorator), so we cannot change the prompts directly:
experiment.experiment_prompts = "something else"
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 experiment.experiment_prompts = "something else"

File ~/Library/CloudStorage/OneDrive-TheAlanTuringInstitute/prompto/src/prompto/experiment_processing.py:115, in Experiment.experiment_prompts(self, value)
    113 @experiment_prompts.setter
    114 def experiment_prompts(self, value: list[dict]) -> None:
--> 115     raise AttributeError("Cannot set the experiment_prompts attribute")

AttributeError: Cannot set the experiment_prompts attribute
For this experiment file, we notice that three `"api"` keys are present (`gemini`, `openai` and `ollama`). If no `max_queries_dict` is passed into the `Settings` object, then as noted above, the prompts will be grouped first by their `"group"` key and then by their `"api"` key. In this example, no prompts have a `"group"` key, so the prompts will be grouped by their `"api"` key.
The groups of prompts are stored in the `grouped_experiment_prompts` attribute, which again is read-only and is only initialised when we first access it. We can see that by default the underlying attribute `_grouped_experiment_prompts` is an empty dictionary:
experiment._grouped_experiment_prompts
{}
Now, when we access the `grouped_experiment_prompts` attribute, the prompts are grouped by their `"api"` key:
experiment.grouped_experiment_prompts
WARNING:root:The 'parallel' attribute in the Settings object is set to False, so grouping will not be used when processing the experiment prompts. Set 'parallel' to True to use grouping and parallel processing of prompts.
{'gemini': {'prompt_dicts': [{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?'}, {'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?'}, {'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}, 'openai': {'prompt_dicts': [{'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?'}, {'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}, 'ollama': {'prompt_dicts': [{'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}}
Notice the warning message that is logged when accessing this attribute. We get this message because the `parallel` attribute in the `Settings` object is set to `False`. We still get the groups of prompts, but if we run the experiment, the prompts will not be processed in these different groups/queues in parallel.
experiment.grouped_experiment_prompts == experiment._grouped_experiment_prompts
WARNING:root:The 'parallel' attribute in the Settings object is set to False, so grouping will not be used when processing the experiment prompts. Set 'parallel' to True to use grouping and parallel processing of prompts.
True
We can see that `_grouped_experiment_prompts` is now a dictionary whose keys are the different APIs and whose values are dictionaries with keys `"prompt_dicts"` (a list of the prompts for that API) and `"rate_limit"` (the rate limit for that API). The rate limit for each API here is the default of `5`, which is given by the `Settings` object for the experiment and which we set above:
experiment.grouped_experiment_prompts.keys()
WARNING:root:The 'parallel' attribute in the Settings object is set to False, so grouping will not be used when processing the experiment prompts. Set 'parallel' to True to use grouping and parallel processing of prompts.
dict_keys(['gemini', 'openai', 'ollama'])
experiment.settings.max_queries
5
As mentioned, this attribute is read-only, so we cannot change the groups directly:
experiment.grouped_experiment_prompts = "something else"
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 experiment.grouped_experiment_prompts = "something else"

File ~/Library/CloudStorage/OneDrive-TheAlanTuringInstitute/prompto/src/prompto/experiment_processing.py:136, in Experiment.grouped_experiment_prompts(self, value)
    134 @grouped_experiment_prompts.setter
    135 def grouped_experiment_prompts(self, value: dict[str, list[dict]]) -> None:
--> 136     raise AttributeError("Cannot set the grouped_experiment_prompts attribute")

AttributeError: Cannot set the grouped_experiment_prompts attribute
A useful method in the `Experiment` class is `grouped_experiment_prompts_summary`, which returns a dictionary where the keys are the API names and the values are strings summarising the number of prompts and the rate limit for that API. This is useful for a quick overview of the groups of prompts:
experiment.grouped_experiment_prompts_summary()
WARNING:root:The 'parallel' attribute in the Settings object is set to False, so grouping will not be used when processing the experiment prompts. Set 'parallel' to True to use grouping and parallel processing of prompts.
{'gemini': '4 queries at 5 queries per minute', 'openai': '4 queries at 5 queries per minute', 'ollama': '4 queries at 5 queries per minute'}
Different rate limits for each API type¶
Building on the above example, we can set different rate limits for each API type by passing in a dictionary that specifies the rate limit for each API. We do this via the `max_queries_dict` argument to the `Settings` object (or by passing a JSON file to the `--max-queries-json` or `-mqj` flag in the command-line interfaces), where the keys are the API names and the values are the rate limits for those APIs. We can see how this is done below:
max_queries_dict = {"openai": 20, "gemini": 10}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
)
print(settings)
WARNING:root:max_queries_dict is provided and not empty, but parallel is set to False, so max_queries_dict will not be used. Set parallel to True to use max_queries_dict
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=False Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
Notice the warning provided here! We passed a `max_queries_dict` to the `Settings` object, but the `parallel` attribute is still set to `False`. We can remove this warning by setting the `parallel` attribute to `True`, as the warning suggests:
max_queries_dict = {"openai": 20, "gemini": 10}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'openai': 20, 'gemini': 10} Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
Also notice above how the `max_queries_dict` is only printed when the `parallel` attribute is set to `True`.
Let's now see how the prompts are grouped and what the `grouped_experiment_prompts` attribute looks like:
experiment = Experiment(file_name="documentation_example.jsonl", settings=settings)
experiment.grouped_experiment_prompts
{'openai': {'prompt_dicts': [{'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?'}, {'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 20}, 'gemini': {'prompt_dicts': [{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?'}, {'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?'}, {'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 10}, 'ollama': {'prompt_dicts': [{'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}}
experiment.grouped_experiment_prompts.keys()
dict_keys(['openai', 'gemini', 'ollama'])
experiment.grouped_experiment_prompts_summary()
{'openai': '4 queries at 20 queries per minute', 'gemini': '4 queries at 10 queries per minute', 'ollama': '4 queries at 5 queries per minute'}
We can see that we have the same grouping (by API type) as above, but the rate limits for `gemini` and `openai` have been set to `10` and `20` respectively. When processing the experiment, we will send the `gemini` prompts at a rate of 10 queries per minute and the `openai` prompts at a rate of 20 queries per minute. We have not specified an `ollama` rate limit, so it is set to the default rate limit of `5`, which was passed into the `Settings` object via the `max_queries` argument.
Different rate limits for each API type and model¶
For this example, we have different models within each API: for `gemini`, we have `"gemini-1.0-pro"` and `"gemini-1.5-pro"`; for `openai`, we have `"gpt3.5-turbo"` and `"gpt4"`; and for `ollama`, we have `"llama3"` and `"mistral"`.
To specify model-specific rate limits, instead of passing an integer value for an API type as above, we can pass another dictionary whose keys are model names and whose values are the rate limits for those models; that is, `max_queries_dict` can be a nested dictionary. Note that we do not need to specify rates for every model, only for the models we want to single out; everything else falls back to the default rate limit.
We can see how this is done below:
max_queries_dict = {
"gemini": {"gemini-1.5-pro": 20},
"openai": {"gpt4": 10, "gpt3.5-turbo": 20},
}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'gemini': {'gemini-1.5-pro': 20}, 'openai': {'gpt4': 10, 'gpt3.5-turbo': 20}} Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
In this example, we are specifying that `"gemini-1.5-pro"` from the `gemini` API should have a rate limit of `20`, and that the `"gpt4"` and `"gpt3.5-turbo"` models from the `openai` API should have rate limits of `10` and `20` respectively. Everything else will be set to the default rate limit of `5`.
Let's now see how the prompts are grouped and what the `grouped_experiment_prompts` attribute looks like:
experiment = Experiment(file_name="documentation_example.jsonl", settings=settings)
experiment.grouped_experiment_prompts
{'gemini-gemini-1.5-pro': {'prompt_dicts': [{'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 20}, 'openai-gpt4': {'prompt_dicts': [{'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 10}, 'openai-gpt3.5-turbo': {'prompt_dicts': [{'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 20}, 'gemini': {'prompt_dicts': [{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?'}, {'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}, 'openai': {'prompt_dicts': [], 'rate_limit': 5}, 'ollama': {'prompt_dicts': [{'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}}
experiment.grouped_experiment_prompts_summary()
{'gemini-gemini-1.5-pro': '2 queries at 20 queries per minute', 'openai-gpt4': '2 queries at 10 queries per minute', 'openai-gpt3.5-turbo': '2 queries at 20 queries per minute', 'gemini': '2 queries at 5 queries per minute', 'openai': '0 queries at 5 queries per minute', 'ollama': '4 queries at 5 queries per minute'}
As noted above, when we group the prompts, we loop over the prompts in the experiment and look at the `"api"` key (if the `"group"` key is not present). We then look at the `"model_name"` key: if a rate limit has been specified for that model, we add the prompt to a model-specific group; otherwise, we add it to the default group for that API.
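Combining this with the `"group"` rule gives the queue names we see above, where model-specific queues are keyed as `"api-model"`. A hedged sketch of the naming logic (not the actual `group_prompts` implementation):

```python
def queue_name(prompt: dict, max_queries_dict: dict) -> str:
    # a manual "group" key always takes priority
    if "group" in prompt:
        return prompt["group"]
    api = prompt["api"]
    model = prompt.get("model_name")
    entry = max_queries_dict.get(api)
    # a model with its own rate limit gets its own "api-model" queue
    if isinstance(entry, dict) and model is not None and model in entry:
        return f"{api}-{model}"
    # everything else lands in the API's default queue
    return api


mqd = {"gemini": {"gemini-1.5-pro": 20}}
```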
For `gemini`, we have a model-specific group for `"gemini-1.5-pro"` called `"gemini-gemini-1.5-pro"` with two queries, which has a rate limit of 20 queries per minute as specified by the `max_queries_dict` above. We also have a default group called `"gemini"` which catches all other `gemini` prompts. We did not specify a default rate limit for `gemini`, so it is set to the default rate limit of `5`.
For `openai`, we have two model-specific groups for `"gpt4"` and `"gpt3.5-turbo"`, called `"openai-gpt4"` and `"openai-gpt3.5-turbo"` respectively, with the rate limits specified by the `max_queries_dict` above. We also have a default group called `"openai"` which catches all other `openai` prompts; for this experiment file, there are no other `openai` prompts, so this group is empty.
Finally, we still have the group of `ollama` prompts, called `"ollama"`, which has the default rate limit of `5`.
Default rate limits for APIs¶
If we want to specify the default rate limit for a given API type, we can do so by specifying a rate limit for `"default"` in the `max_queries_dict`. This sets the default rate limit for the API, which applies to all prompts that do not have a model-specific rate limit. We can see how this is done below.
Note that for the `ollama` API, writing `"ollama": 4` is equivalent to writing `"ollama": {"default": 4}`.
max_queries_dict = {
"gemini": {"default": 30, "gemini-1.5-pro": 20},
"openai": {"gpt4": 10, "gpt3.5-turbo": 20},
"ollama": 4,
}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'gemini': {'default': 30, 'gemini-1.5-pro': 20}, 'openai': {'gpt4': 10, 'gpt3.5-turbo': 20}, 'ollama': 4} Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
We can see now that the default rate limits for `gemini` and `ollama` have been set to `30` and `4` respectively:
experiment = Experiment(file_name="documentation_example.jsonl", settings=settings)
experiment.grouped_experiment_prompts_summary()
{'gemini': '2 queries at 30 queries per minute', 'gemini-gemini-1.5-pro': '2 queries at 20 queries per minute', 'openai-gpt4': '2 queries at 10 queries per minute', 'openai-gpt3.5-turbo': '2 queries at 20 queries per minute', 'ollama': '4 queries at 4 queries per minute', 'openai': '0 queries at 5 queries per minute'}
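The integer shorthand mentioned in the note above can be thought of as the following normalisation step (an illustrative sketch, not the `prompto` source):

```python
def normalise_entry(entry):
    # "ollama": 4 is treated the same as "ollama": {"default": 4}
    if isinstance(entry, int):
        return {"default": entry}
    return entry
```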
Specifying models or APIs that don't exist in the experiment file¶
Note that if you specify an API/group or model that does not exist in the experiment file, a group/queue will still be created for that API/group or model, but it will be empty:
max_queries_dict = {
"gemini": {"default": 30, "gemini-1.5-pro": 20},
"openai": {"gpt4": 10, "gpt3.5-turbo": 20},
"ollama": {
"llama3": 3,
"mistral": 3,
"unknown-model": 4,
},
"unknown-group-or-api": 25,
}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'gemini': {'default': 30, 'gemini-1.5-pro': 20}, 'openai': {'gpt4': 10, 'gpt3.5-turbo': 20}, 'ollama': {'llama3': 3, 'mistral': 3, 'unknown-model': 4}, 'unknown-group-or-api': 25} Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
experiment = Experiment(file_name="documentation_example.jsonl", settings=settings)
experiment.grouped_experiment_prompts_summary()
{'gemini': '2 queries at 30 queries per minute', 'gemini-gemini-1.5-pro': '2 queries at 20 queries per minute', 'openai-gpt4': '2 queries at 10 queries per minute', 'openai-gpt3.5-turbo': '2 queries at 20 queries per minute', 'ollama-llama3': '2 queries at 3 queries per minute', 'ollama-mistral': '2 queries at 3 queries per minute', 'ollama-unknown-model': '0 queries at 4 queries per minute', 'unknown-group-or-api': '0 queries at 25 queries per minute', 'openai': '0 queries at 5 queries per minute', 'ollama': '0 queries at 5 queries per minute'}
Full control: Using the "group" key to define user-specified groups of prompts¶
In some cases, we may want to group prompts manually. This can be done using the `"group"` key in the experiment file. We now look at the `documentation_example_groups_1.jsonl` experiment file, which has prompts for three different APIs (`gemini`, `openai` and `ollama`) across six different models (`gemini-1.0-pro`, `gemini-1.5-pro`, `gpt3.5-turbo`, `gpt4`, `llama3` and `mistral`). We have manually grouped the prompts into three groups: `"group1"`, `"group2"` and `"group3"`.
Note that when the `"group"` key is specified, the prompts are grouped by this key and not by the `"api"` key. We can see how the prompts are grouped below:
with open(f"{data_folder}/input/documentation_example_groups_1.jsonl", "r") as f:
print(f.read())
{"id": 0, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 1, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of Germany?", "group": "group2"}
{"id": 2, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 3, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of Germany?", "group": "group2"}
{"id": 4, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 5, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of Germany?", "group": "group2"}
{"id": 6, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 7, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of Germany?", "group": "group2"}
{"id": 8, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of France?", "group": "group3"}
{"id": 9, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of Germany?", "group": "group3"}
{"id": 10, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of France?", "group": "group3"}
{"id": 11, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of Germany?", "group": "group3"}
Setting rate limits for groups works in the exact same way as setting rate limits for APIs. We simply pass in a dictionary where the keys are the group names and the values are the rate limits for that group. We can see how this is done below:
max_queries_dict = {"group1": 5, "group2": 10, "group3": 15}
settings = Settings(
    data_folder=data_folder,
    max_queries=5,
    max_queries_dict=max_queries_dict,
    parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'group1': 5, 'group2': 10, 'group3': 15} Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
experiment = Experiment(
file_name="documentation_example_groups_1.jsonl", settings=settings
)
experiment.grouped_experiment_prompts
{'group1': {'prompt_dicts': [{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?', 'group': 'group1'}, {'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?', 'group': 'group1'}, {'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?', 'group': 'group1'}, {'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?', 'group': 'group1'}], 'rate_limit': 5}, 'group2': {'prompt_dicts': [{'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}], 'rate_limit': 10}, 'group3': {'prompt_dicts': [{'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?', 'group': 'group3'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?', 'group': 'group3'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?', 'group': 'group3'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?', 'group': 'group3'}], 'rate_limit': 15}}
experiment.grouped_experiment_prompts_summary()
{'group1': '4 queries at 5 queries per minute', 'group2': '4 queries at 10 queries per minute', 'group3': '4 queries at 15 queries per minute'}
Mixing using the "api" and "group" keys to define groups¶
It is possible to have an experiment file where only some of the prompts have a `"group"` key. We consider one here in the `documentation_example_groups_2.jsonl` experiment file:
with open(f"{data_folder}/input/documentation_example_groups_2.jsonl", "r") as f:
print(f.read())
{"id": 0, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of France?"}
{"id": 1, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of Germany?"}
{"id": 2, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of France?"}
{"id": 3, "api": "gemini", "model_name": "gemini-1.5-pro", "prompt": "What is the capital of Germany?"}
{"id": 4, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of France?"}
{"id": 5, "api": "openai", "model_name": "gpt3.5-turbo", "prompt": "What is the capital of Germany?"}
{"id": 6, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of France?"}
{"id": 7, "api": "openai", "model_name": "gpt4", "prompt": "What is the capital of Germany?"}
{"id": 8, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 9, "api": "ollama", "model_name": "llama3", "prompt": "What is the capital of Germany?", "group": "group1"}
{"id": 10, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of France?", "group": "group1"}
{"id": 11, "api": "ollama", "model_name": "mistral", "prompt": "What is the capital of Germany?", "group": "group1"}
{"id": 12, "api": "ollama", "model_name": "gemma", "prompt": "What is the capital of France?", "group": "group2"}
{"id": 13, "api": "ollama", "model_name": "gemma", "prompt": "What is the capital of Germany?", "group": "group2"}
{"id": 14, "api": "ollama", "model_name": "phi3", "prompt": "What is the capital of France?", "group": "group2"}
{"id": 15, "api": "ollama", "model_name": "phi3", "prompt": "What is the capital of Germany?", "group": "group2"}
As noted above, we first try to place prompts into groups based on the `"group"` key, and then based on the `"api"` key. We will specify rate limits for two groups here:
max_queries_dict = {
"group1": 5,
"group2": 10,
}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'group1': 5, 'group2': 10}
Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
We can see that the prompts with "group" keys are placed within their respective groups, while the remaining prompts are grouped by their "api" key (either gemini or openai in this case):
experiment = Experiment(
file_name="documentation_example_groups_2.jsonl", settings=settings
)
experiment.grouped_experiment_prompts
{'group1': {'prompt_dicts': [{'id': 8, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of France?', 'group': 'group1'}, {'id': 9, 'api': 'ollama', 'model_name': 'llama3', 'prompt': 'What is the capital of Germany?', 'group': 'group1'}, {'id': 10, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of France?', 'group': 'group1'}, {'id': 11, 'api': 'ollama', 'model_name': 'mistral', 'prompt': 'What is the capital of Germany?', 'group': 'group1'}], 'rate_limit': 5}, 'group2': {'prompt_dicts': [{'id': 12, 'api': 'ollama', 'model_name': 'gemma', 'prompt': 'What is the capital of France?', 'group': 'group2'}, {'id': 13, 'api': 'ollama', 'model_name': 'gemma', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}, {'id': 14, 'api': 'ollama', 'model_name': 'phi3', 'prompt': 'What is the capital of France?', 'group': 'group2'}, {'id': 15, 'api': 'ollama', 'model_name': 'phi3', 'prompt': 'What is the capital of Germany?', 'group': 'group2'}], 'rate_limit': 10}, 'gemini': {'prompt_dicts': [{'id': 0, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of France?'}, {'id': 1, 'api': 'gemini', 'model_name': 'gemini-1.0-pro', 'prompt': 'What is the capital of Germany?'}, {'id': 2, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of France?'}, {'id': 3, 'api': 'gemini', 'model_name': 'gemini-1.5-pro', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}, 'openai': {'prompt_dicts': [{'id': 4, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of France?'}, {'id': 5, 'api': 'openai', 'model_name': 'gpt3.5-turbo', 'prompt': 'What is the capital of Germany?'}, {'id': 6, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of France?'}, {'id': 7, 'api': 'openai', 'model_name': 'gpt4', 'prompt': 'What is the capital of Germany?'}], 'rate_limit': 5}}
experiment.grouped_experiment_prompts_summary()
{'group1': '4 queries at 5 queries per minute', 'group2': '4 queries at 10 queries per minute', 'gemini': '4 queries at 5 queries per minute', 'openai': '4 queries at 5 queries per minute'}
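The grouping behaviour above can be sketched as a simple loop over the prompt dictionaries. Note this is an illustrative sketch of the logic, not prompto's actual `group_prompts` implementation:

```python
from collections import defaultdict

def group_prompts(prompt_dicts):
    """Assign each prompt to a queue: by its "group" key if
    present, otherwise by its "api" key (a sketch of the
    grouping behaviour, not prompto's actual implementation)."""
    groups = defaultdict(list)
    for prompt in prompt_dicts:
        groups[prompt.get("group", prompt["api"])].append(prompt)
    return dict(groups)

prompts = [
    {"id": 0, "api": "gemini", "prompt": "What is the capital of France?"},
    {"id": 4, "api": "openai", "prompt": "What is the capital of France?"},
    {"id": 8, "api": "ollama", "prompt": "What is the capital of France?", "group": "group1"},
    {"id": 12, "api": "ollama", "prompt": "What is the capital of France?", "group": "group2"},
]
grouped = group_prompts(prompts)
print(sorted(grouped))  # ['gemini', 'group1', 'group2', 'openai']
```

Here the "ollama" prompts never form an "ollama" queue because each of them carries a "group" key, which takes priority.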
Model-specific rates within groups¶
Specifying model-specific rates within groups works in exactly the same way as specifying model-specific rates for APIs. We can see how this is done below:
max_queries_dict = {
"group1": {"llama3": 10},
"group2": 10,
}
settings = Settings(
data_folder=data_folder,
max_queries=5,
max_queries_dict=max_queries_dict,
parallel=True,
)
print(settings)
Settings: data_folder=parallel_data_example, max_queries=5, max_attempts=3, parallel=True, max_queries_dict={'group1': {'llama3': 10}, 'group2': 10}
Subfolders: input_folder=parallel_data_example/input, output_folder=parallel_data_example/output, media_folder=parallel_data_example/media
We now see that group1 has been split up further and we have a group1-llama3 grouping:
experiment = Experiment(
file_name="documentation_example_groups_2.jsonl", settings=settings
)
experiment.grouped_experiment_prompts_summary()
{'group1-llama3': '2 queries at 10 queries per minute', 'group2': '4 queries at 10 queries per minute', 'gemini': '4 queries at 5 queries per minute', 'openai': '4 queries at 5 queries per minute', 'group1': '2 queries at 5 queries per minute'}
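The model-specific split within a group can be sketched as follows. This is an assumption-laden sketch of the behaviour shown above (not prompto's source code): when a group's entry in `max_queries_dict` is itself a dict, prompts for the listed models are moved into their own `"<group>-<model>"` queue with the model-specific limit, while the group's remaining prompts fall back to the default rate:

```python
def rate_limited_groups(prompt_dicts, max_queries_dict, default=5):
    """Sketch of splitting a group further when its entry in
    max_queries_dict maps model names to model-specific limits."""
    groups = {}
    for prompt in prompt_dicts:
        key = prompt.get("group", prompt["api"])
        limit = max_queries_dict.get(key, default)
        if isinstance(limit, dict):
            model = prompt.get("model_name")
            if model in limit:
                # listed models get their own "<group>-<model>" queue
                key, limit = f"{key}-{model}", limit[model]
            else:
                limit = default
        entry = groups.setdefault(key, {"prompt_dicts": [], "rate_limit": limit})
        entry["prompt_dicts"].append(prompt)
    return groups

prompts = [
    {"id": 8, "api": "ollama", "model_name": "llama3",
     "prompt": "What is the capital of France?", "group": "group1"},
    {"id": 10, "api": "ollama", "model_name": "mistral",
     "prompt": "What is the capital of France?", "group": "group1"},
]
groups = rate_limited_groups(prompts, {"group1": {"llama3": 10}, "group2": 10})
print({k: v["rate_limit"] for k, v in groups.items()})
# {'group1-llama3': 10, 'group1': 5}
```

This matches the summary above, where the llama3 prompts run at 10 queries per minute in their own group1-llama3 queue and the rest of group1 runs at the default rate of 5.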