2 Chaining prompts and evaluation

Chaining inputs and outputs, evaluating responses.

# Install the packages
! pip install --upgrade google-cloud-aiplatform
! pip install shapely<2.0.0

# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

If you’re on Colab, run the following cell to authenticate

from google.colab import auth
auth.authenticate_user()

import vertexai
from vertexai.preview.language_models import ChatModel, InputOutputTextPair

# Replace the project and location placeholder values below
vertexai.init(project="<..>", location="<..>")
chat_model = ChatModel.from_pretrained("chat-bison@001")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}

We will switch to a json file soon. For now, here’s our products text again.

products = """
name: Caffeino Classic
category: Espresso Machines
brand: EliteBrew
model_number: EB-1001
warranty: 2 years
rating: 4.6/5 stars
features:
  15-bar pump for authentic espresso extraction.
  Milk frother for creating creamy cappuccinos and lattes.
  Removable water reservoir for easy refilling.
description: The Caffeino Classic by EliteBrew is a powerful espresso machine that delivers rich and flavorful shots of espresso with the convenience of a built-in milk frother, perfect for indulging in your favorite cafe-style beverages at home.
price: £179.99

name: BeanPresso
category: Single Serve Coffee Makers
brand: FreshBrew
model_number: FB-500
warranty: 1 year
rating: 4.3/5 stars
features:
  Compact design ideal for small spaces or travel.
  Compatible with various coffee pods for quick and easy brewing.
  Auto-off feature for energy efficiency and safety.
description: The BeanPresso by FreshBrew is a compact single-serve coffee maker that allows you to enjoy a fresh cup of coffee effortlessly using your favorite coffee pods, making it the perfect companion for those with limited space or always on the go.
price: £49.99

name: BrewBlend Pro
category: Drip Coffee Makers
brand: MasterRoast
model_number: MR-800
warranty: 3 years
rating: 4.7/5 stars
features:
  Adjustable brew strength for customized coffee flavor.
  Large LCD display with programmable timer for convenient brewing.
  Anti-drip system to prevent messes on the warming plate.
description: The BrewBlend Pro by MasterRoast offers a superior brewing experience with adjustable brew strength, programmable timer, and anti-drip system, ensuring a perfectly brewed cup of coffee every time, making mornings more enjoyable.
price: £89.99

name: SteamGenie
category: Stovetop Coffee Makers
brand: KitchenWiz
model_number: KW-200
warranty: 2 years
rating: 4.4/5 stars
features:
  Classic Italian stovetop design for rich and aromatic coffee.
  Durable stainless steel construction for long-lasting performance.
  Available in multiple sizes to suit different brewing needs.
description: The SteamGenie by KitchenWiz is a traditional stovetop coffee maker that harnesses the essence of Italian coffee culture, crafted with durable stainless steel and delivering a rich, authentic coffee experience with every brew.
price: £39.99

name: AeroBlend Max
category: Coffee and Espresso Combo Machines
brand: AeroGen
model_number: AG-1200
warranty: 2 years
rating: 4.9/5 stars
features:
  Dual-functionality for brewing coffee and espresso.
  Built-in burr grinder for fresh coffee grounds.
  Adjustable temperature and brew strength settings for personalized beverages.
description: The AeroBlend Max by AeroGen is a versatile coffee and espresso combo machine that combines the convenience of brewing both coffee and espresso with a built-in grinder,
allowing you to enjoy the perfect cup of your preferred caffeinated delight with ease.
price: £299.99
"""

As in earlier notebooks, delimiters help us isolate the inputs and responses.

Here, we give the model specific to output recommendations as a python dictionary, which will help with post-processing tasks (eg adding to a shopping cart).

We also give clear guidelines about the products and categories the model can return. This helps minimize the risk of the model hallucinating coffee machines not part of our catalogue.

delimiter = "####"
context = f"""
You will be provided with customer service queries. \
The customer service query will be delimited with \
{delimiter} characters.
Output a python dictionary of objects, where each object has \
the following format:
    'category': <one of Espresso Machines, \
    Single Serve Coffee Makers, \
    Drip Coffee Makers, \
    Stovetop Coffee Makers,
    Coffee and Espresso Combo Machines>,
AND
    'products': <a list of products that must \
    be found in the allowed products below>

For example,
  'category': 'Coffee and Espresso Combo Machines', 'products': ['AeroBlend Max'],

Where the categories and products must be found in \
the customer service query.
If a product is mentioned, it must be associated with \
the correct category in the allowed products list below.
If no products or categories are found, output an \
empty list.

Allowed products:

Espresso Machines category:
Caffeino Classic

Single Serve Coffee Makers:
BeanPresso

Drip Coffee Makers:
BrewBlend Pro

Stovetop Coffee Makers:
SteamGenie

Coffee and Espresso Combo Machines:
AeroBlend Max

Only output the list of objects, with nothing else.
"""

user_message_1 = f"""
I'd like info about the SteamGenie and the BrewBlend Pro. \
"""

chat = chat_model.start_chat(
    context=context,
    examples=[]
)

response = chat.send_message(user_message_1, **parameters)
print(response.text)

Though it looks like a Python dictionary, our response is a TextGenerationResponse object, so we have a few more steps to convert it into a dict we can use.

type(response)

temp_str = str(response)

temp_str

2.0.1 Products json

Switching from our products string to json will allow us to do more with results

products = {
    "Caffeino Classic": {
      "name": "Caffeino Classic",
      "category": "Espresso Machines",
      "brand": "EliteBrew",
      "model_number": "EB-1001",
      "warranty": "2 years",
      "rating": "4.6/5 stars",
      "features": [
        "15-bar pump for authentic espresso extraction.",
        "Milk frother for creating creamy cappuccinos and lattes.",
        "Removable water reservoir for easy refilling."
      ],
      "description": "The Caffeino Classic by EliteBrew is a powerful espresso machine that delivers rich and flavorful shots of espresso with the convenience of a built-in milk frother, perfect for indulging in your favorite cafe-style beverages at home.",
      "price": "£179.99"
    },
    "BeanPresso": {
      "name": "BeanPresso",
      "category": "Single Serve Coffee Makers",
      "brand": "FreshBrew",
      "model_number": "FB-500",
      "warranty": "1 year",
      "rating": "4.3/5 stars",
      "features": [
        "Compact design ideal for small spaces or travel.",
        "Compatible with various coffee pods for quick and easy brewing.",
        "Auto-off feature for energy efficiency and safety."
      ],
      "description": "The BeanPresso by FreshBrew is a compact single-serve coffee maker that allows you to enjoy a fresh cup of coffee effortlessly using your favorite coffee pods, making it the perfect companion for those with limited space or always on the go.",
      "price": "£49.99"
    },
    "BrewBlend Pro": {
      "name": "BrewBlend Pro",
      "category": "Drip Coffee Makers",
      "brand": "MasterRoast",
      "model_number": "MR-800",
      "warranty": "3 years",
      "rating": "4.7/5 stars",
      "features": [
        "Adjustable brew strength for customized coffee flavor.",
        "Large LCD display with programmable timer for convenient brewing.",
        "Anti-drip system to prevent messes on the warming plate."
      ],
      "description": "The BrewBlend Pro by MasterRoast offers a superior brewing experience with adjustable brew strength, programmable timer, and anti-drip system, ensuring a perfectly brewed cup of coffee every time, making mornings more enjoyable.",
      "price": "£89.99"
    },
    "SteamGenie": {
      "name": "SteamGenie",
      "category": "Stovetop Coffee Makers",
      "brand": "KitchenWiz",
      "model_number": "KW-200",
      "warranty": "2 years",
      "rating": "4.4/5 stars",
      "features": [
        "Classic Italian stovetop design for rich and aromatic coffee.",
        "Durable stainless steel construction for long-lasting performance.",
        "Available in multiple sizes to suit different brewing needs."
      ],
      "description": "The SteamGenie by KitchenWiz is a traditional stovetop coffee maker that harnesses the essence of Italian coffee culture, crafted with durable stainless steel and delivering a rich, authentic coffee experience with every brew.",
      "price": "£39.99"
    },
    "AeroBlend Max": {
      "name": "AeroBlend Max",
      "category": "Coffee and Espresso Combo Machines",
      "brand": "AeroGen",
      "model_number": "AG-1200",
      "warranty": "2 years",
      "rating": "4.9/5 stars",
      "features": [
        "Dual-functionality for brewing coffee and espresso.",
        "Built-in burr grinder for fresh coffee grounds.",
        "Adjustable temperature and brew strength settings for personalized beverages."
      ],
      "description": "The AeroBlend Max by AeroGen is a versatile coffee and espresso combo machine that combines the convenience of brewing both coffee and espresso with a built-in grinder, allowing you to enjoy the perfect cup of your preferred caffeinated delight with ease.",
      "price": "£299.99"
    }
}

def get_products():
    return products

2.0.2 Read Python string into Python list of dictionaries

import json

def read_string_to_list(input_string):
    if input_string is None:
        return None

    try:
        input_string = input_string.replace("'", "\"")  # Replace single quotes with double quotes for valid JSON
        data = json.loads(input_string)
        return data
    except json.JSONDecodeError:
        print("Error: Invalid JSON string")
        return None

category_and_product_list = read_string_to_list(temp_str)
print(category_and_product_list)

2.0.3 Helper functions

Now that our products are in json, we can use various helper functions to render responses into a format more useful than text. For example, we can check the model’s outputs are relevant, or pass the items and their details on to a shopping cart.

2.0.3.1 Note:

These helper functions are from DeepLearning AI’s Building Systems with the ChatGPT API course.

def get_product_by_name(name):
    return products.get(name, None)

def get_products_by_category(category):
    return [product for product in products.values() if product["category"] == category]

def generate_output_string(data_list):
    output_string = ""

    if data_list is None:
        return output_string

    for data in data_list:
        try:
            if "products" in data:
                products_list = data["products"]
                for product_name in products_list:
                    product = get_product_by_name(product_name)
                    if product:
                        output_string += json.dumps(product, indent=4) + "\n"
                    else:
                        print(f"Error: Product '{product_name}' not found")
            elif "category" in data:
                category_name = data["category"]
                category_products = get_products_by_category(category_name)
                for product in category_products:
                    output_string += json.dumps(product, indent=4) + "\n"
            else:
                print("Error: Invalid object format")
        except Exception as e:
            print(f"Error: {e}")

    return output_string

product_information_for_user_message_1 = generate_output_string(category_and_product_list)
print(product_information_for_user_message_1)

context = f"""
You're a customer service assistant for a coffee shop's \
e-commerce site. Our product list can be found in {products}. Respond in a friendly and professional \
tone with concise answers. \
Please ask the user relevant follow-up questions.
"""

user_message_1 = f"""
Tell me about the Brew Blend pro and \
the stovetop coffee maker. \
Also do you have an espresso machine?"""

chat = chat_model.start_chat(
    context=context,
    examples=[]
)

assistant_response = chat.send_message(f"""{user_message_1}{product_information_for_user_message_1}""", **parameters)
print(assistant_response)

2.0.4 Check output

Now that we have our outputs as handly lists and strings, we can add them as inputs for the model to check. This step will become less necessary as models become more sophisticated, and is only recommended for extremely highly sensitive applications since adds cost and latency and may be unnecessary

context = f"""
You are an assistant that evaluates whether \
customer service agent responses sufficiently \
answer customer questions, and also validates that \
all the facts the assistant cites from the product \
information are correct.
The product information and user and customer \
service agent messages will be delimited by \
3 backticks, i.e. ```.
Respond with a Y or N character, with no punctuation:
Y - if the output sufficiently answers the question \
AND the response correctly uses product information
N - otherwise

Output a single letter only.
"""
customer_message = f"""
Tell me all about the Brew Blend pro and \
the stovetop coffee maker - features and pricing. \
Also do you have an espresso machine?"""

q_a_pair = f"""
Customer message: ```{customer_message}```
Product information: ```{product_information_for_user_message_1}```
Agent response: ```{assistant_response}```

Does the response use the retrieved information correctly?
Does the response sufficiently answer the question

Output Y or N
"""

chat = chat_model.start_chat(
    context=context,
    examples=[]
)

response = chat.send_message(f"""{q_a_pair}""")
print(response)

product_info_for_user_message_1 = generate_output_string(category_and_product_list)
print(product_info_for_user_message_1)

2.0.5 Evaluation

def eval_with_rubric(customer_message, assistant_response):

    customer_message = f"""
    Tell me all about the Brew Blend pro and \
    the stovetop coffee maker - features and pricing. \
    I'm also interested in an espresso machine."""

    context = """\
    You are an assistant that evaluates how well the customer service agent \
    answers a user question by looking at the context that the customer service \
    agent is using to generate its response.
    Compare the factual content of the submitted answer with the context. \
    Ignore any differences in style, grammar, or punctuation.
    Answer the following questions:
        - Is the Assistant response based only on the context provided? (Y or N)
        - Does the answer include information that is not provided in the context? (Y or N)
        - Is there any disagreement between the response and the context? (Y or N)
        - Count how many questions the user asked. (output a number)
        - For each question that the user asked, is there a corresponding answer to it?
          Question 1: (Y or N)
          Question 2: (Y or N)
          ...
          Question N: (Y or N)
        - Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)
    """

    user_message = f"""\
    You are evaluating a submitted answer to a question based on the context \
    that the agent uses to answer the question.
    Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {customer_message}
    ************
    [Context]: {context}
    ************
    [Submission]: {assistant_response}
    ************
    [END DATA]
"""
    chat = chat_model.start_chat(
    context=context,
    examples=[]
    )

    response = chat.send_message(user_message, max_output_tokens=1024)
    return response

product_info = product_info_for_user_message_1

customer_product_info = {
    "customer_message": customer_message,
    "context": product_info
}
eval_output = eval_with_rubric(customer_product_info, assistant_response)

print(eval_output)

2.0.6 Evaluate based on an expert human answer

We can write our own example of what an excellent human answer would be, then ask the model to compare its responses with our example.

ideal_example = {
    'customer_message': """\
    Tell me all about the Brew Blend pro and \
    the stovetop coffee maker - features and pricing. \
    I'm also interested in an espresso machine?""",

    'ideal_answer': """\
    The BrewBlend pro is a powerhouse of a drip coffee maker. \
    It offers a superior brewing experience with adjustable \
    brew strength, and anti-drip system. \
    Love your coffee first thing when you wake up? Just set the programmable \
    timer. It's priced at 389.99. \
    The stovetop option is the SteamGenie, a coffee maker crafted with \
    durable stainless steel. The SteamGenie delivers a rich, strong and authentic \
    coffee experience with every brew. \
    We do have an espresso machine, the Caffeino Classic. It's a 15-bar \
    pump for authentic espresso extraction, wiht a milk frother and \
    water reservoir for easy refiling. It costs 179.99.
    """
}

2.0.7 Evals

There are scoring systems such as Bleu that researchers have used to check model performance for language tasks. Another approach is to use OpenAI’s evals framework, from which the following grading criteria are used.

Let’s look at our assistant_response again:

assistant_response

def eval_vs_ideal(ideal_example, assistant_response):

    customer_message = ideal_example['customer_message']
    ideal_answer = ideal_example['ideal_answer']
    completion = assistant_response

    context = """\
    You are an assistant that evaluates how well the customer service agent \
    answers a user question by comparing the response to the ideal (expert) response
    Output a single letter and nothing else.
    Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
    The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
    (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
    (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
    (C) The submitted answer contains all the same details as the expert answer.
    (D) There is a disagreement between the submitted answer and the expert answer.
    (E) The answers differ, but these differences don't matter from the perspective of factuality.
  choice_strings: ABCDE
    """

    user_message = f"""\
You are comparing a submitted answer to an expert answer on a given question. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {customer_message}
    ************
    [Expert]: {ideal_answer}
    ************
    [Submission]: {completion}
    ************
    [END DATA]
"""

    chat = chat_model.start_chat(
    context=context,
    examples=[]
    )

    response = chat.send_message(user_message, max_output_tokens=1024)
    return response

eval_vs_ideal(ideal_example, assistant_response)

We will look at further evaluation metrics in part 12.