Rephrasing prompts using prompto¶
We illustrate how we can use prompto to rephrase prompts. This is useful if you first want to generate a more diverse set of prompts and then use them to generate a more diverse set of completions.
import json
import os

from dotenv import load_dotenv

from prompto.settings import Settings
from prompto.experiment import Experiment
from prompto.rephrasal import Rephraser, load_rephrase_folder
When using prompto to query models from the Ollama API, lines in our experiment .jsonl files must have "api": "ollama" in the prompt dict.
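For example, a minimal Ollama prompt dict serialised as one line of an experiment .jsonl file might look like the following (the id and prompt values here are made up for illustration):

```python
import json

# An illustrative prompt dict for the Ollama API; "api" must be "ollama".
# The id and prompt values are invented for this example.
prompt_dict = {
    "id": "example_question",
    "api": "ollama",
    "model_name": "gemma2",
    "prompt": "What is the capital of France?",
}

# Each line of an experiment .jsonl file is one such dict serialised as JSON.
line = json.dumps(prompt_dict)
```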
Setting up Ollama locally¶
In this notebook, we assume that you have a local instance of the Ollama API running. For installing Ollama, please refer to the Ollama documentation. Once you have it installed and running, e.g. with ollama serve in the terminal, you can proceed with the following steps.
By default, Ollama runs at localhost:11434. When developing this notebook, we were running Ollama locally, so we set OLLAMA_API_ENDPOINT to http://localhost:11434. If you are running the server at a different address or port, you can set the OLLAMA_API_ENDPOINT environment variable accordingly, as described below.
Downloading models¶
In this notebook and our example experiment file (example_file.jsonl), we query gemma2, but we will first rephrase the prompts using llama3.2 - note that Ollama defaults to the smaller versions of these models. You can download them using the following commands in the terminal:
ollama pull llama3.2
ollama pull gemma2
If you'd prefer to query other models, you can replace the model names in the experiment file with the models you have downloaded. prompto simply returns an error if a model is not found at the running Ollama endpoint.
Environment variables¶
For the Ollama API, the main environment variable to set is:
OLLAMA_API_ENDPOINT: the API endpoint for the Ollama API
As mentioned in the environment variables docs, there are also model-specific environment variables which can be used. In particular, if you specify a model_name key in a prompt dict, you can also set an OLLAMA_API_ENDPOINT_model_name environment variable to indicate the API endpoint used for that particular model (where "model_name" is replaced by the corresponding value of the model_name key). We will see a concrete example of this later.
To set environment variables, you can simply list them as key-value pairs in a .env file:
OLLAMA_API_ENDPOINT=<YOUR-OLLAMA-ENDPOINT>
If you have made this file, you can run the following, which should return True if it has found the file, or False otherwise:
load_dotenv(dotenv_path=".env")
True
Now, we obtain this value, raising an error if the OLLAMA_API_ENDPOINT environment variable hasn't been set:
OLLAMA_API_ENDPOINT = os.environ.get("OLLAMA_API_ENDPOINT")
if OLLAMA_API_ENDPOINT is None:
    raise ValueError("OLLAMA_API_ENDPOINT is not set")
else:
    print(f"Using OLLAMA_API_ENDPOINT: {OLLAMA_API_ENDPOINT}")
Using OLLAMA_API_ENDPOINT: http://localhost:11434
If you get any errors or warnings in the above two cells, adjust your .env file to match the example above so that these variables are set.
The Rephraser class¶
The Rephraser class can be used to generate rephrased/paraphrased versions of a given prompt. To initialise the Rephraser class, we need to provide the following arguments:
input_prompts: a list of input prompt dictionaries (a prompt dictionary with a "prompt" key along with the other standard keys like "id", "api", "model_name", etc.) - this can just be read in from an input .jsonl file
template_prompts: a list of templates to use for rephrasing the input prompts. Each template should contain an {INPUT_PROMPT} placeholder into which the prompt will be inserted
rephrase_settings: a dictionary where the keys are identifiers for particular models to use for rephrasal and the values are dictionaries containing the "api", "model_name", and "parameters" that specify the LLM to use for rephrasal
Typically, template_prompts and rephrase_settings are stored in a rephrase folder (see the rephrasals documentation for more details), which we can simply load using the load_rephrase_folder function from prompto.
We provide an example of such a folder here.
To use load_rephrase_folder, we simply pass in the path to the folder and a list of template .txt files that we want to load. Here template.txt is a file in ./rephrase:
template_prompts, rephrase_settings = load_rephrase_folder(
    "./rephrase", templates="template.txt"
)
We can see that the prompt templates have been loaded as a list of strings from template.txt, where each line of that file is a template:
template_prompts
['Write a paraphrase for the following sentence. Only reply with the paraphrased prompt: "{INPUT_PROMPT}"', 'Write a variation of this sentence (only reply with the variation): "{INPUT_PROMPT}"', 'How would you say the following sentence in a different way? Only reply with the different way: "{INPUT_PROMPT}"', 'Rewrite the following task instruction. Just reply with the rewritten task. Make sure to keep the task the same, but vary the wording and setting.\n"{INPUT_PROMPT}"']
As noted above, these templates have an {INPUT_PROMPT} placeholder which will be replaced with the prompt from the input prompt dictionaries.
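Under the hood, filling a template amounts to substituting the input prompt into this placeholder. A minimal sketch of the substitution (not prompto's actual implementation):

```python
# One of the templates from template.txt and one input prompt.
template = (
    'Write a paraphrase for the following sentence. '
    'Only reply with the paraphrased prompt: "{INPUT_PROMPT}"'
)
input_prompt = "Where can I buy a can of coke?"

# Substitute the input prompt into the {INPUT_PROMPT} placeholder.
rephrase_prompt = template.replace("{INPUT_PROMPT}", input_prompt)
```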
Looking at the rephrase settings, we have given an example of a model that we might want to use for rephrasals. Each key is an identifier for a model, and the value is a dictionary with the keys "api", "model_name", and "parameters", specifying where the model is from, the model name, and the parameters to use for the model, respectively. We only have one here:
rephrase_settings
{'ollama-llama3-2': {'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}}}
We now load an experiment file as a list of dictionaries:
with open("./example_file.jsonl", "r") as f:
    input_prompts = [dict(json.loads(line)) for line in f]
input_prompts
[{'id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Where can I buy a can of coke?'}, {'id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Who won the champions league in the year 2008?'}]
Now we can initialise the Rephraser class with the input prompts, template prompts, and rephrase settings:
rephraser = Rephraser(
    input_prompts=input_prompts,
    template_prompts=template_prompts,
    rephrase_settings=rephrase_settings,
)
We can create prompts for a rephrasal model using the create_rephrase_inputs method. This method takes a single argument, rephrase_model, which is the identifier of the model we want to use for rephrasing; this has to be a key in the rephrase_settings dictionary we passed in during initialisation.
rephrase_inputs = rephraser.create_rephrase_inputs(rephrase_model="ollama-llama3-2")
Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '0': 100%|██████████| 2/2 [00:00<00:00, 25040.62inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '1': 100%|██████████| 2/2 [00:00<00:00, 47662.55inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '2': 100%|██████████| 2/2 [00:00<00:00, 60787.01inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '3': 100%|██████████| 2/2 [00:00<00:00, 66576.25inputs/s]
Given we have two input prompts and four templates, we should expect eight rephrased prompts:
len(rephrase_inputs)
8
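This count is simply the cross product of input prompts and templates:

```python
n_input_prompts = 2  # prompts in example_file.jsonl
n_templates = 4      # templates in template.txt

# One rephrase input is created per (input prompt, template) pair.
n_rephrase_inputs = n_input_prompts * n_templates
```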
To turn this into an experiment file to run, we could simply write it to a .jsonl file ourselves, but there is a create_rephrase_file method which does this for us:
rephraser.create_rephrase_file(
    rephrase_model="ollama-llama3-2", out_filepath="./data/input/rephrase-example.jsonl"
)
Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '0': 100%|██████████| 2/2 [00:00<00:00, 16644.06inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '1': 100%|██████████| 2/2 [00:00<00:00, 52428.80inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '2': 100%|██████████| 2/2 [00:00<00:00, 55924.05inputs/s] Creating rephrase inputs for rephrase model 'ollama-llama3-2' and template '3': 100%|██████████| 2/2 [00:00<00:00, 62601.55inputs/s] Writing rephrase prompts to ./data/input/rephrase-example.jsonl: 100%|██████████| 8/8 [00:00<00:00, 63072.24prompts/s]
[{'id': 'rephrase-ollama-llama3-2-0-coke_question', 'template_index': 0, 'prompt': 'Write a paraphrase for the following sentence. Only reply with the paraphrased prompt: "Where can I buy a can of coke?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'coke_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Where can I buy a can of coke?'}, {'id': 'rephrase-ollama-llama3-2-0-champions_league_question', 'template_index': 0, 'prompt': 'Write a paraphrase for the following sentence. Only reply with the paraphrased prompt: "Who won the champions league in the year 2008?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'champions_league_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Who won the champions league in the year 2008?'}, {'id': 'rephrase-ollama-llama3-2-1-coke_question', 'template_index': 1, 'prompt': 'Write a variation of this sentence (only reply with the variation): "Where can I buy a can of coke?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'coke_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Where can I buy a can of coke?'}, {'id': 'rephrase-ollama-llama3-2-1-champions_league_question', 'template_index': 1, 'prompt': 'Write a variation of this sentence (only reply with the variation): "Who won the champions league in the year 2008?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'champions_league_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Who won the champions league in the year 2008?'}, {'id': 'rephrase-ollama-llama3-2-2-coke_question', 'template_index': 2, 'prompt': 'How would you say the following sentence in a different way? 
Only reply with the different way: "Where can I buy a can of coke?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'coke_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Where can I buy a can of coke?'}, {'id': 'rephrase-ollama-llama3-2-2-champions_league_question', 'template_index': 2, 'prompt': 'How would you say the following sentence in a different way? Only reply with the different way: "Who won the champions league in the year 2008?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'champions_league_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Who won the champions league in the year 2008?'}, {'id': 'rephrase-ollama-llama3-2-3-coke_question', 'template_index': 3, 'prompt': 'Rewrite the following task instruction. Just reply with the rewritten task. Make sure to keep the task the same, but vary the wording and setting.\n"Where can I buy a can of coke?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'coke_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Where can I buy a can of coke?'}, {'id': 'rephrase-ollama-llama3-2-3-champions_league_question', 'template_index': 3, 'prompt': 'Rewrite the following task instruction. Just reply with the rewritten task. Make sure to keep the task the same, but vary the wording and setting.\n"Who won the champions league in the year 2008?"', 'api': 'ollama', 'model_name': 'llama3.2', 'parameters': {'temperature': 0}, 'input-id': 'champions_league_question', 'input-api': 'ollama', 'input-model_name': 'gemma2', 'input-prompt': 'Who won the champions league in the year 2008?'}]
Notice how the "api" and "model_name" keys are set to the values from the rephrase_settings dictionary we passed in during initialisation, as this defines the model we want to use for rephrasal. Each prompt dictionary also has "input-api", "input-model_name", and other keys from the original input prompt dictionary, so that we know which model we originally wanted to send that prompt to before rephrasing.
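Conceptually, each rephrase input combines the filled template, the rephrasal model's settings, and the original prompt's metadata under "input-"-prefixed keys. A rough sketch of how one such dictionary could be assembled (this mirrors the structure of the output above, but is not prompto's internal code):

```python
# Original input prompt and the rephrasal model's settings, as above.
input_prompt = {
    "id": "coke_question",
    "api": "ollama",
    "model_name": "gemma2",
    "prompt": "Where can I buy a can of coke?",
}
template = 'Write a variation of this sentence (only reply with the variation): "{INPUT_PROMPT}"'
settings = {"api": "ollama", "model_name": "llama3.2", "parameters": {"temperature": 0}}

rephrase_input = {
    "id": f"rephrase-ollama-llama3-2-1-{input_prompt['id']}",
    "template_index": 1,
    "prompt": template.replace("{INPUT_PROMPT}", input_prompt["prompt"]),
    # The rephrasal model's settings determine where this prompt is sent...
    **settings,
    # ...while the original prompt's keys are kept under "input-" prefixes.
    **{f"input-{k}": v for k, v in input_prompt.items()},
}
```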
Running the rephrasal experiment¶
We can run the rephrasal experiment as usual (see the Running experiments with prompto notebook for more details on running experiments).
settings = Settings(data_folder="./data", max_queries=30)
experiment = Experiment(file_name="rephrase-example.jsonl", settings=settings)
responses, _ = await experiment.process()
Sending 8 queries at 30 QPM with RI of 2.0s (attempt 1/3): 100%|██████████| 8/8 [00:16<00:00, 2.00s/query] Waiting for responses (attempt 1/3): 100%|██████████| 8/8 [00:00<00:00, 12.62query/s]
The responses of these prompts should be rephrased versions of the input prompts:
[x["response"] for x in responses]
['"Can I find a Coca-Cola in a local store or supermarket?"', '"Which team claimed the Champions League title in 2008?"', '"Can I purchase a cold can of Coca-Cola at your convenience?"', '"Which team claimed the Champions League title in 2008?"', '"Can I purchase a Coca-Cola from around here?"', '"Which team lifted the Champions League trophy that year?"', '"In what retail establishment or convenience store can I procure a single serving of Coca-Cola in a glass bottle?"', '"What was the victor of the prestigious European club football competition in the calendar year 2008?"']
Creating and running a new rephrased input file¶
We can create a new input file containing the rephrased prompts with the create_new_input_file method. This method takes in a list of completed responses from the rephrasing experiment and creates a new input file in which each prompt is sent to the original API and model that we wanted to query before rephrasing.
Note there is also a keep_original argument: if True, the original prompts are kept in the new input file alongside the rephrased ones; if False, only the rephrased prompts are included.
rephraser.create_new_input_file(
    keep_original=True,
    completed_rephrase_responses=experiment.completed_responses,
    out_filepath="./data/input/post-rephrase-example.jsonl",
)
Writing new input prompts to ./data/input/post-rephrase-example.jsonl: 100%|██████████| 10/10 [00:00<00:00, 85423.71prompts/s]
[{'id': 'rephrase-ollama-llama3-2-0-coke_question', 'prompt': '"Can I find a Coca-Cola in a local store or supermarket?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-0-champions_league_question', 'prompt': '"Which team claimed the Champions League title in 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-1-coke_question', 'prompt': '"Can I purchase a cold can of Coca-Cola at your convenience?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-1-champions_league_question', 'prompt': '"Which team claimed the Champions League title in 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-2-coke_question', 'prompt': '"Can I purchase a Coca-Cola from around here?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-2-champions_league_question', 'prompt': '"Which team lifted the Champions League trophy that year?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-3-coke_question', 'prompt': '"In what retail establishment or convenience store can I procure a single serving of Coca-Cola in a glass bottle?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'rephrase-ollama-llama3-2-3-champions_league_question', 'prompt': '"What was the victor of the prestigious European club 
football competition in the calendar year 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2'}, {'id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question'}, {'id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question'}]
Given we originally had two prompts and produced eight rephrased prompts, we should expect ten prompts in the new input file.
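With keep_original=True, the expected count can be sketched as:

```python
n_original = 2   # prompts in example_file.jsonl
n_templates = 4  # templates in template.txt

# Rephrased prompts plus the originals kept by keep_original=True.
n_new_inputs = n_original * n_templates + n_original
```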
We can run this rephrased experiment as usual:
rephrased_experiment = Experiment(
    file_name="post-rephrase-example.jsonl", settings=settings
)
rephrased_responses, _ = await rephrased_experiment.process()
Sending 10 queries at 30 QPM with RI of 2.0s (attempt 1/3): 100%|██████████| 10/10 [00:20<00:00, 2.00s/query] Waiting for responses (attempt 1/3): 100%|██████████| 10/10 [00:15<00:00, 1.53s/query]
rephrased_responses
[{'id': 'rephrase-ollama-llama3-2-1-champions_league_question', 'prompt': '"Which team claimed the Champions League title in 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-32', 'response': 'Manchester United claimed the Champions League title in 2008. 🏆 \n'}, {'id': 'rephrase-ollama-llama3-2-0-champions_league_question', 'prompt': '"Which team claimed the Champions League title in 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-28', 'response': '**Manchester United** claimed the Champions League title in 2008. They defeated Chelsea 6-5 on penalties after a 1-1 draw in the final. \n'}, {'id': 'rephrase-ollama-llama3-2-2-champions_league_question', 'prompt': '"Which team lifted the Champions League trophy that year?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-36', 'response': 'Please tell me which year you are referring to so I can answer your question! 🏆 \n'}, {'id': 'rephrase-ollama-llama3-2-1-coke_question', 'prompt': '"Can I purchase a cold can of Coca-Cola at your convenience?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-30', 'response': "As an AI, I don't have a physical body or the ability to interact with the physical world. This means I can't purchase items for you, including a can of Coca-Cola.\n\nYou would need to visit a store or use a delivery service to get a cold can of Coca-Cola. 
😊 \n"}, {'id': 'rephrase-ollama-llama3-2-0-coke_question', 'prompt': '"Can I find a Coca-Cola in a local store or supermarket?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-26', 'response': "I can't give you real-time information about what's available at your local stores. \n\nTo find out if they have Coca-Cola:\n\n* **Check the store's website:** Many supermarkets list their inventory online.\n* **Use a grocery delivery app:** Apps like Instacart or Shipt can tell you which stores near you have Coca-Cola in stock.\n* **Call the store directly:** This is the most direct way to ask if they have what you need. \n\n\nGood luck finding your Coke! 🥤 \n"}, {'id': 'rephrase-ollama-llama3-2-3-champions_league_question', 'prompt': '"What was the victor of the prestigious European club football competition in the calendar year 2008?"', 'input-prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-40', 'response': 'The victor of the prestigious European club football competition (UEFA Champions League) in the calendar year 2008 was **Manchester United**. \n\nThey defeated Chelsea on penalties after a 1-1 draw in the final held in Moscow, Russia. \n'}, {'id': 'rephrase-ollama-llama3-2-2-coke_question', 'prompt': '"Can I purchase a Coca-Cola from around here?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-34', 'response': "As an AI, I don't have access to real-world information like store locations or inventory. 
\n\nTo find out if you can buy a Coca-Cola nearby, I recommend:\n\n* **Checking online maps:** Google Maps or Apple Maps can show you nearby convenience stores, grocery stores, and restaurants that likely sell Coca-Cola.\n* **Using a delivery app:** Apps like Uber Eats, DoorDash, or Grubhub allow you to order food and drinks from local businesses, including Coca-Cola.\n\n\nGood luck finding your Coke! 🥤"}, {'id': 'champions_league_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Who won the champions league in the year 2008?', 'input-id': 'champions_league_question', 'timestamp_sent': '15-11-2024-12-23-44', 'response': "**Manchester United** won the Champions League in 2008. \n\nThey defeated Chelsea 6-5 on penalties after a 1-1 draw in the final held at Moscow's Luzhniki Stadium. \n"}, {'id': 'rephrase-ollama-llama3-2-3-coke_question', 'prompt': '"In what retail establishment or convenience store can I procure a single serving of Coca-Cola in a glass bottle?"', 'input-prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'timestamp_sent': '15-11-2024-12-23-38', 'response': "This is tricky! \n\nWhile many places sell cans and multi-packs of Coke, finding single glass bottles can be harder. Here's where you might look:\n\n* **Classic Soda Shops:** Look for retro diners or soda fountains - they often have vintage Coke in glass bottles.\n* **Specialty Stores:** Some gourmet food stores or markets might carry them, especially if they focus on local or regional products.\n* **Convenience Stores:** While less common, some smaller, independently owned convenience stores might still stock single-serve glass bottles. It depends heavily on your location and the store's inventory.\n* **Online Retailers:** Sites like Amazon or specialty soda retailers often sell vintage or collectible Coke glass bottles individually.\n\n\nGood luck with your quest for the classic Coke experience! 
🥤 \n"}, {'id': 'coke_question', 'api': 'ollama', 'model_name': 'gemma2', 'prompt': 'Where can I buy a can of coke?', 'input-id': 'coke_question', 'timestamp_sent': '15-11-2024-12-23-42', 'response': "As an AI, I don't have access to real-time information like store inventories. To find out where you can buy a can of Coke, I recommend:\n\n* **Checking nearby convenience stores or gas stations.** These are usually good places to find Coca-Cola products.\n* **Looking at grocery store websites or apps.** Many grocery stores list their inventory online, so you can check if they have Coke in stock before you go.\n* **Using a delivery service like Instacart or Uber Eats.** You can order Coke and other groceries to be delivered to your home.\n\n\nHope this helps! \n"}]
Using prompto from the command line¶
Creating and running the rephrasal experiment file¶
We can run a rephrasal automatically before running the experiment by using the prompto_run_experiment command:
prompto_run_experiment \
    --file example_file.jsonl \
    --max-queries 30 \
    --rephrase-folder rephrase \
    --rephrase-templates template.txt \
    --rephrase-model ollama-llama3-2
This first runs a rephrasal experiment as we saw above and uses those outputs to generate a new input file with rephrased inputs. The final results are stored in a post-rephrase-example_file folder in the output folder, and the outputs of the rephrase experiment are stored in a rephrase-example_file folder in the output folder.
There is also a --remove-original flag which can be used to remove the original prompts from the new input file (keeping only the rephrased prompts).
Additionally, there is an --only-rephrase flag which indicates that only the rephrasal experiment should be run and no further experiments; in that case, the rephrasal outputs are stored in a rephrase-example_file folder in the output folder.