Exploring German LLM Expert Bots

Author

Moritz Gueck

Published

December 11, 2023

In this post we are exploring how we can provide a large language model with context information (“grounding”) and have it answer questions in German about it. We will then explore how different models perform at answering the questions.

Intro

What we need:

A large language model that can answer questions. -> Question Answering LLM (QA-LLM)
A large language model that can map text to vectors. -> Vector-LLM
A database to store the vectors. -> Vector-DB

We will be using the framework Langchain for splitting texts, grounding and connecting to models and sources. Langchain provides us with a couple of benefits over specific APIs from different providers:

Standardised interface for different document-loaders, LLMs and vector-DBs
Easy integration of different LLMs into different vector-DBs
The high level API avoids boilerplate code.

Here is a good intro to the topic: langchain: Retrieval.

How to do it:

Create a grounding database for your QA-LLM.
1. Gather the information you want your QA-llm to answer questions about.
2. Cut the data into snippets that are small enough to be processed by your QA-llm.
3. Map each snippet to a vector using your Vector-llm. The vector represents the meaning/topic of the snippet.
4. Store the vectors in a database.
Answer questions using the grounding database.
1. Given a question, map it to a vector using your Vector-llm. Then search the database for the snippets with the most similar vectors.
2. Prepend the snippet to your question.
3. Feed the augmented question to your QA-llm.

graph TB
  subgraph SB["build grounding database"]
  A(Data Sources) -->|Load| B(Text-Files)
  B -->|Chunk| C(Snippets)
  C -->|"vector-map (Vector-LLM)"| D[(Vector DB)]

  end

  subgraph SU["use grounding database"]
  D -->|retrieve| E(Relevant snippets)
  E -->|insert| F(Augmented Prompt)
  F -->|"query (QA-LLM)"| G(Result)
  end

  style SB fill:#F2F2F2
  style SU fill:#F2F2F2
  linkStyle 0,1,2,3,4,5 stroke:#BFBFBF

Show code: Base libraries to import

# Importing libraries
import os
import urllib.request, urllib.error, urllib.parse
import random
import textwrap

Show code: Print results more nicely

def nice_print(text):
    print(textwrap.fill(text, 120))

1. Creating a grounding database

1.1. Gather the information

In this step we will load the data that we want our model to answer questions about. We will use the data from the public, German Helsana website (largest health insurance in Switzerland).

from urllib.parse import urlparse

# Required functions for loading the data:
def download_webpage(url):
    response = urllib.request.urlopen(url)
    webContent = response.read().decode("UTF-8")

    file_path = "data/html/" + get_page_name(url)
    f = open(file_path, "w")
    f.write(webContent)
    f.close

def get_page_name(url):
    parsed_url = urlparse(url)
    page_name = parsed_url.path.split("/")[-1]
    return page_name

Show code: Web-pages to crawl

# List of web-pages where we will find the information for our knowledge base
urls = [
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/basis.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/benefit-plus-hausarzt.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/benefit-plus-telmed.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/benefit-plus-flexmed.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/premed-24.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant.html",
    "https://www.helsana.ch/de/private/versicherungen/grundversicherung/uebersicht-grundversicherungen.html",
    "https://www.helsana.ch/de/private/versicherungen/spezialversicherungen.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/leistungsuebersicht.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/top.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/sana.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/completa.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/world.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/ambulant/primeo.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/spitalversicherung.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/spitalversicherung/hospital-eco.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/spitalversicherung/hospital-halbprivat.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/spitalversicherung/hospital-privat.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/spitalversicherung/hospital-flex.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/zahnversicherung.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/weitere/advocare-plus.html",
    "https://www.helsana.ch/de/private/versicherungen/zusatzversicherungen/weitere/advocare-extra.html",
]

# Downloading 
for url in urls:
    download_webpage(url)

1.2. Cut the data into snippets

Now we need to cut the contents from the page into snippets that are small enough to be processed by our QA-llm but large enough to contain the relevant context.

1.2.1. Minimalist approach (not used)

This is probably the simplest approach to cut the webpages into snippets.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from bs4 import BeautifulSoup

website_texts = []
for html_document_path in os.listdir("data/html"):
    soup = BeautifulSoup(
        open("./data/html/" + html_document_path), features="html.parser"
    )

    website_text = soup.get_text()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=50,
        length_function=len,
        add_start_index=True,
    )
    page_texts = text_splitter.create_documents([website_text])
    website_texts = website_texts + page_texts

Regrettably, the vector-LLM had a hard time to reasonably map the snippets to meaningful vectors and the QA-LLM was not able to answer questions based on the snippets. The reasons were probably, that the snippets started in the middle of paragraphs and titles from the paragraphs were missing. The QA-LLM then used snippets to answer questions from wrong topics.

Therefore, I did not use this approach, but used approach 1.2.2. instead.

1.2.2. Subsection based splitting (used)

To embed the meaning and context clearer, we prepend the title and subtile of the relevant snippet to each text.

# This algorithm might be stupid but it works. :-D
def parse_website_texts(soup):
    """Cut website texts into snippets. 

    Args:
        soup (BeautifulSoup.soup): Soup object from the Beautiful soup webscraper. 
         It contains the different elements of the html-code of the website.

    Returns:
        : list(str): List of strings. 
         Each strings contains the title, subtitle and text of a snippet.
    """
    level_dict = {
        "h1": 0,
        "h2": 1,
        "h3": 2,
        "p": 3,
        "li": 3,
    }  # hierarchical level from html tags
    element_list = soup.find_all(["h1", "h2", "h3", "p", "li"])
    prev_level = 9999
    webpage_snippet_list = []
    snippet_texts_list = []
    # Pattern of each snippet: h1-title, h2-subtitle, h3-subsubtitle, paragraph-text:
    base_element_list = ["","", "", ""]  
    for element in element_list:
        current_level = level_dict[element.name]

        if current_level < prev_level:  # i.e. we are at a new topic
            # save the previous snippet as one string
            webpage_string_snip = " ".join(snippet_texts_list)
            webpage_string_snip = webpage_string_snip.replace("\n", " ")
            if webpage_string_snip != "":
                webpage_snippet_list.append(webpage_string_snip)
            # update base_element_list
            base_element_list[prev_level:current_level] = ""
            base_element_list[current_level] = element.text
            snippet_texts_list = base_element_list

        # update snippet_texts_list
        else:
            snippet_texts_list = snippet_texts_list + [element.text]
        prev_level = current_level
    return webpage_snippet_list

from bs4 import BeautifulSoup

website_texts = []
for html_document_path in os.listdir("data/html"):
    soup = BeautifulSoup(
        open("./data/html/" + html_document_path), features="html.parser"
    )
    website_texts_page = parse_website_texts(soup)

    website_texts = website_texts + website_texts_page

Here is an example of the snippets (heading + subheadings + paragraphs):

for text in random.sample(website_texts, 5):
    print(text)

Helsana Advocare PLUS Häufig gestellte Fragen    Wer kann diese Versicherung abschliessen?    Sie können die Versicherung abschliessen, wenn Sie folgende Voraussetzungen erfüllen: Sie leben in der Schweiz (offizieller Wohnsitz). Sie haben bereits eine der Zusatzversicherungen TOP, OMNIA oder COMPLETA oder beantragen diese zeitgleich mit Helsana Advocare PLUS.
SANA Weitere Zusatzversicherungen TOP  Ihr Zusatz zur Grundversicherung: Wichtige ambulante Leistungen sind gedeckt.
BeneFit PLUS Telmed    Bei gesundheitlichen Problemen rufen Sie immer zuerst das unabhängige Zentrum für Telemedizin an: 0800 800 090. Sie erhalten rund um die Uhr medizinische Unterstützung und einen attraktiven Prämienrabatt.    24/7 kostenlose, verbindliche medizinische Telefonberatung     Digitale Services, wie z. B. Symptom-Checker und Videokonsultation     Attraktiver Prämienrabatt  
BeneFit PLUS Telmed  Prämie berechnen  Ihre Prämie CHF 0 CHF 500 CHF 300 CHF 500 CHF 1000 CHF 1500 CHF 2000 CHF 2500 eingeschlossen ausgeschlossen
BeneFit PLUS Flexmed Weitere Modelle der Grundversicherung BeneFit PLUS Hausarzt  Der Hausarzt oder die HMO-Gruppenpraxis ist Ihre erste Anlaufstelle.

1.3. Map each snippet to a vector

Here we use another large language model to map each snippet to a vector. This vector represents the meaning/topic of the snippet. I used the FastEmbed model for practical reasons. However, I would recommend to use a more powerful model like this one: https://huggingface.co/sentence-transformers/all-mpnet-base-v2

from langchain.embeddings.fastembed import FastEmbedEmbeddings

# Choosing a suitable embedding model can make a big difference in retrieval performance.
# For simplicity, we use the FastEmbedEmbeddings model off-the-shelf.
embedder = FastEmbedEmbeddings() 

# Just to show a vector, we embed the first snippet:
embeddings = embedder.embed_documents(website_texts[0])
print(embeddings[0][0:5])  # The first 5 dimensions of the vector

[-0.013905464671552181, 0.038332026451826096, 0.01669456996023655, 0.010435071773827076, -0.01078716292977333]

1.4. Store embeddings in a vector database

We could just keep the embeddings in memory or store them in a file. It is however more efficient to store them in a database. I used the open-source chroma database for this purpose.

from langchain.vectorstores import Chroma

chroma_db = Chroma.from_texts(website_texts, embedder)

2. Answering questions using the grounding database

2.1. The manual approach (not recommended, just for understanding)

2.1.1. Given a question, map it to a vector and search the database

Now we will use the Vector-llm to map the question to a vector. Then we will search the database for the snippets with the most similar vectors. These contain the information that we need to answer the question.

First we need a question. Here we ask whether the additional insurance model “Completa” of the Helsana health insurance cover the costs of glasses.

question = "Sind Kosten für meine Brille von Completa gedeckt?"

Now we task the chroma-db to find snippets, which are similar to the topic of the question:

n_neighbors = 5
similar_docs = chroma_db.similarity_search(question, k=n_neighbors)

for doc in similar_docs:
    nice_print(doc.page_content)

COMPLETA Weitere Zusatzversicherungen COMPLETA PLUS  COMPLETA PLUS erweitert den Deckungsumfang von COMPLETA.
COMPLETA Häufig gestellte Fragen    Wann lohnt sich ein Upgrade zu COMPLETA PLUS?     COMPLETA PLUS erweitert den
Deckungsumfang von COMPLETA. Die Zusatzversicherung lohnt sich, wenn Ihnen Leistungen rund um Gesundheitsförderung
wichtig sind. Sie erhalten unter anderem zusätzliche Kostenbeiträge für Gesundheitsförderung, Komplementärmedizin,
Prävention sowie für Brillen, Kontaktlinsen und Augenlaserkorrekturen.
COMPLETA Häufig gestellte Fragen    Lohnt sich ein Upgrade zu COMPLETA, wenn ich bereits TOP oder SANA habe?    Wenn Sie
gerne möglichst breit abgesichert sind, dann lohnt sich COMPLETA für Sie. Damit schliessen Sie die meisten
Deckungslücken der Grundversicherung. COMPLETA vereint die Vorteile von TOP und SANA. Besser noch: Viele Vergütungen
sind noch grosszügiger, beispielsweise für medizinische Hilfsmittel oder Präventionsmassnahmen wie Check-ups. Und für
Brillen und Kontaktlinsen erhalten Sie gar doppelt so viel Geld erstattet wie mit TOP. Zudem werden Behandlungen im
Ausland sowie durch Nichtvertragsärzte unterstützt.
COMPLETA Häufig gestellte Fragen    Wer kann COMPLETA abschliessen?    Sie können die Versicherung abschliessen, wenn
Sie in der Schweiz wohnhaft sind (offizieller Wohnsitz) und über eine Gesundheitsdeklaration mit positivem
Aufnahmebescheid verfügen.
COMPLETA Versicherte Leistungen Prämie berechnen  Nachfolgende Leistungen erstatten wir Ihnen ergänzend zu den
gesetzlichen Leistungen der Grundversicherung aus der Zusatzversicherung COMPLETA zurück:

2.1.2. Prepend the relevant snippets to your question

To give our QA-model the information it needs to answer the question, we prepend the relevant snippets to the question. Additionally we add some instructions to the model to make sure it understands what we want from it.

docs_in_prompt = ""
for id, doc in enumerate(similar_docs):
    docs_in_prompt += str(id + 1) + ": " + doc.page_content + "\n"
context_instruction = """Du bist ein hilfreicher Assistent. Benutze die Informationen der Helsana Gesundheitsversicherung um die darauf folgende Frage zu beantworten."""
augmented_prompt = f"{context_instruction} \nDie Informationen: {docs_in_prompt}Frage: {question} Assistant:"
print(augmented_prompt)

Du bist ein hilfreicher Assistent. Benutze die Informationen der Helsana Gesundheitsversicherung um die darauf folgende Frage zu beantworten. 
Die Informationen: 1: COMPLETA Weitere Zusatzversicherungen COMPLETA PLUS  COMPLETA PLUS erweitert den Deckungsumfang von COMPLETA. 
2: COMPLETA Weitere Zusatzversicherungen COMPLETA PLUS  COMPLETA PLUS erweitert den Deckungsumfang von COMPLETA. 
3: COMPLETA Häufig gestellte Fragen    Wann lohnt sich ein Upgrade zu COMPLETA PLUS?     COMPLETA PLUS erweitert den Deckungsumfang von COMPLETA. Die Zusatzversicherung lohnt sich, wenn Ihnen Leistungen rund um Gesundheitsförderung wichtig sind. Sie erhalten unter anderem zusätzliche Kostenbeiträge für Gesundheitsförderung, Komplementärmedizin, Prävention sowie für Brillen, Kontaktlinsen und Augenlaserkorrekturen.
4: COMPLETA Häufig gestellte Fragen    Wann lohnt sich ein Upgrade zu COMPLETA PLUS?     COMPLETA PLUS erweitert den Deckungsumfang von COMPLETA. Die Zusatzversicherung lohnt sich, wenn Ihnen Leistungen rund um Gesundheitsförderung wichtig sind. Sie erhalten unter anderem zusätzliche Kostenbeiträge für Gesundheitsförderung, Komplementärmedizin, Prävention sowie für Brillen, Kontaktlinsen und Augenlaserkorrekturen.
5: COMPLETA Häufig gestellte Fragen    Lohnt sich ein Upgrade zu COMPLETA, wenn ich bereits TOP oder SANA habe?    Wenn Sie gerne möglichst breit abgesichert sind, dann lohnt sich COMPLETA für Sie. Damit schliessen Sie die meisten Deckungslücken der Grundversicherung. COMPLETA vereint die Vorteile von TOP und SANA. Besser noch: Viele Vergütungen sind noch grosszügiger, beispielsweise für medizinische Hilfsmittel oder Präventionsmassnahmen wie Check-ups. Und für Brillen und Kontaktlinsen erhalten Sie gar doppelt so viel Geld erstattet wie mit TOP. Zudem werden Behandlungen im Ausland sowie durch Nichtvertragsärzte unterstützt.
Frage: Sind Kosten für meine Brille von Completa gedeckt? Assistant:

Now we could use this prompt and send it to out QA-llm. But there is a better automated way:

2.2. The automated approach (recommended)

Now we will try out different QA-llms to answer the question and see which model performs the best.

We will try:

OpenAI ChatGPT-3.5
Llama2 70B
Mistral 7B and Mistral 7B tuned on German texts

OpenAI GPT-3.5

Currently one of the strongest but also biggest models. We can only run it via an API-request to the OpenAI-Server. You can find advice for prompting the model here: Promptingguide.ai: chatgpt

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

To use the OpenAI API, you need to register (OpenAI API) and save the API-key as an environment variable:

opneai_llm = OpenAI(openai_api_key=os.environ["OPENAI_API_KEY"])

Since langchain lets us swap out models easily, we write one query-function for all our models:

def query_llm(model, prompt_template):
    qa_chain = RetrievalQA.from_chain_type(
        model,
        retriever=chroma_db.as_retriever(
            search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
        ),
        chain_type_kwargs={"prompt": prompt_template},
    )
    response = qa_chain({"query": question})
    nice_print(response["result"])

Now we can write a prompt-template where we can plug-in our question and context. As with the manual approach, the QA-model gets our context and instructions to answer our question:

openai_prompt_template = PromptTemplate.from_template(
    """Du bist ein hilfreicher Assistent der Fragen beantwortet. Benutze die folgenden Stücke von Context zur Helsana um die darauf folgende Frage zu beantworten. Wenn du die Antwort nicht findest, schreibe, dass du es nicht weisst. Benutze maximal drei Sätze und sei präzise beim Antworten.  
    ### Frage: {question} 
    ### Context: {context} 
    ### Antwort: """
)

query_llm(opneai_llm, openai_prompt_template)

 Ja, Kosten für eine Brille von Completa sind gedeckt, sofern die Angaben über die Brillenstärke auf der Rechnung
ausgewiesen sind.

The answer looks pretty correct to me. Only missing is the information that the insurance covers the costs up to 300 CHF per annum.

LLama-2 70B

This is a large model but still smaller than GPT-3.5. It is still too big to run it on our machine. Therefore we will run it in the cloud (in our case on an Nvidia A100 GPU on replicate.com). You can find more info on LLama here: Hugging Face: Llama2

llama_llm = Replicate(
    model="meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    model_kwargs={"temperature": 0.75},
)

Different models profit from different prompt formats. Thus, we create a different template here:

llama_prompt_template = PromptTemplate.from_template(
    """[INST] Du bist ein hilfreicher Assistent. Benutze den folgenden Context zur Helsana um die darauf folgende Frage zu beantworten. Wenn du die Antwort nicht findest, schreibe, dass du die Antwort nicht weisst. Benutze maximal drei Sätze und sei präzise beim Antworten. 
    Frage: {question} 
    Context: {context} 
    Antwort:[/INST]"""
)

query_llm(llama_llm, llama_prompt_template)

 Ja, die Kosten für Ihre Brille sind von Completa gedeckt. According to the information provided, you will receive 90%
of the costs up to a maximum of 150 francs per calendar year for your glasses or contact lenses. However, it is
important to note that there may be a deductible and coinsurance applicable to the coverage. Additionally, if you have
already exhausted your annual limit for glasses or contact lenses under your TOP policy, you may not be able to claim
the full amount under Completa. It's always best to check

The answer is correct. However it suddenly starts writing in English. Let’s try to fix this by using few shot-learning: We will provide it a few examples of questions and answers in German.

llama_prompt_template = PromptTemplate.from_template(
    """[INST] Du bist ein hilfreicher Assistent. Benutze den folgenden Context zur Helsana um die darauf folgende Frage zu beantworten. Wenn du die Antwort nicht findest, schreibe, dass du die Antwort nicht weisst. Benutze maximal drei Sätze und sei präzise beim Antworten. Antworte ausschliesslich auf Deutsch. Beispiel 
    Frage: Sind Notfälle im Ausland vom Modell BeneFit PLUS abgedeckt? 
    Context: Bei einem Notfall erhalten Sie in den EU/EFTA-Staaten den jeweiligen Sozialtarif. In allen übrigen Ländern erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages. 
    Antwort:[/INST] Ja, Notfälle im Ausland sind abgedeckt. In nicht-EU/EFTA-Staaten erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages. [INST]
    Frage: Ich bin Hospital Privat versichert. Ich hatte einen Unfall und war auf eine Haushaltshilfe angewiesen. Erstattet mir die Helsana die Kosten zurück? 
    Context: Kann ein akuter stationärer Spital- oder Kuraufenthalt durch eine ärztlich verordnete Haushaltshilfe verhindert oder zumindest verkürzt werden, erhalten Sie bis max. 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr. 
    Antwort: [/INST] Ja, die Helsana erstattet Ihnen bis zu 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr.[INST] 
    Frage: {question} 
    Context: {context} 
    Antwort: [/INST]"""
)

query_llm(llama_llm, llama_prompt_template)

 Ja, die Kosten für Ihre Brille sind von Completa gedeckt. Sie erhalten 90% der Kosten bis max. 150 Franken pro
Kalenderjahr für Ihre Brillengläser und Kontaktlinsen. Wenn Sie jedoch eine Upgrade zu COMPLETA PLUS erwogen, lohnt sich
dies, da Viele Vergütungen noch grosszügiger sind, wie zum Beispiel für medizinische Hilfsmittel oder
Präventionsmassnahmen wie Check-ups. Und für Brillen und Kontaktlinsen erhalten Sie

This looks already much better. For this specific question, the model gives a correct answer with all the relevant context. There is just one wrong word in the answer (erwogen). Let’s go one step smaller.

Mistral 7B

This is a much smaller model than GPT-3.5 but still pretty strong. We could run this model on a standard notebook (16GB RAM required), but it is faster to run it on GPUs in the cloud (in our case on a Nvidia A40 GPU on replicate.com). Using GPUs generated a speedup of up to 50x in our case. More info here: Mistral: Mistral 7B

from langchain.chains import LLMChain
from langchain.llms import Replicate
from langchain.prompts import PromptTemplate

mistral_llm = Replicate(
    model="mistralai/mistral-7b-instruct-v0.1:83b6a56e7c828e667f21fd596c338fd4f0039b46bcfa18d973e8e70e455fda70",
    model_kwargs={"temperature": 0.7, "max_length": 500},
)

Let’s use the few-shot learning approach directly:

mistral_prompt_template = PromptTemplate.from_template(
    """<s>[INST] Du bist ein hilfreicher Assistent. Benutze den folgenden Context zur Helsana um die darauf folgende Frage zu beantworten. Wenn du die Antwort nicht findest, schreibe, dass du die Antwort nicht weisst. Benutze maximal drei Sätze und sei präzise beim Antworten. Beispiel 
    Frage: Sind Notfälle im Ausland vom Modell BeneFit PLUS abgedeckt? 
    Context: Bei einem Notfall erhalten Sie in den EU/EFTA-Staaten den jeweiligen Sozialtarif. In allen übrigen Ländern erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages. 
    Antwort:[/INST] Ja, Notfälle im Ausland sind abgedeckt. In nicht-EU/EFTA-Staaten erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages. [INST]
    Frage: Ich bin Hospital Privat versichert. Ich hatte einen Unfall und war auf eine Haushaltshilfe angewiesen. Erstattet mir die Helsana die Kosten zurück? 
    Context: Kann ein akuter stationärer Spital- oder Kuraufenthalt durch eine ärztlich verordnete Haushaltshilfe verhindert oder zumindest verkürzt werden, erhalten Sie bis max. 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr. 
    Antwort: [/INST] Ja, die Helsana erstattet Ihnen bis zu 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr. </s> [INST] 
    Frage: {question} 
    Context: {context} 
    Antwort: [/INST]"""
)

query_llm(mistral_llm, mistral_prompt_template)

Ja, Brillen und Kontaktlinsen von Completa sind von COMPLETA gedeckt. Für Brillengläser und Kontaktlinsen erhalten Sie
bis zu 90% der Kosten bis zu 150 Franken pro Jahr. Für Kinder und Jugendliche bis 18 Jahre erhalten Sie 180 Franken pro
Jahr.

This answer is already better but still a bit misleading. (And I have to admit that I generated a few more incomplete answers.) Let us try to use a model that is tuned on German texts.

German Mistral 7B

Let us use a version of Mistral that has been tuned for German texts. Maybe it will perform better. More info on the model here: Hugging Face: em_german_leo_mistral

from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# This time we will run it on our own device:
german_mistral_llm = Ollama(
    model="germanleo",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

german_mistral_prompt_template = PromptTemplate.from_template(
    """Du bist ein hilfreicher Assistent. Für die folgende Aufgabe stehen dir zwischen den tags BEGININPUT und ENDINPUT mehrere Quellen zur Verfügung. Die eigentliche Aufgabe oder Frage ist zwischen BEGININSTRUCTION und ENDINCSTRUCTION zu finden. Beantworte diese wortwörtlich mit einem Zitat aus den Quellen. Sollten diese keine Antwort enthalten, antworte, dass auf Basis der gegebenen Informationen keine Antwort möglich ist!
    BEGININSTRUCTION Sind Notfälle im Ausland vom Modell BeneFit PLUS abgedeckt? ENDINCSTRUCTION BEGININPUT Bei einem Notfall erhalten Sie in den EU/EFTA-Staaten den jeweiligen Sozialtarif. In allen übrigen Ländern erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages. ENDINPUT 
    ASSISTANT: "Ja, Notfälle im Ausland sind abgedeckt. In nicht-EU/EFTA-Staaten erstatten wir maximal das Doppelte des in der Schweiz versicherten Betrages."
    BEGININSTRUCTION Ich bin Hospital Privat versichert. Ich hatte einen Unfall und war auf eine Haushaltshilfe angewiesen. Erstattet mir die Helsana die Kosten zurück? ENDINCSTRUCTION BEGININPUT Kann ein akuter stationärer Spital- oder Kuraufenthalt durch eine ärztlich verordnete Haushaltshilfe verhindert oder zumindest verkürzt werden, erhalten Sie bis max. 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr. ENDINPUT 
    ASSISTANT: "Ja, die Helsana erstattet Ihnen bis zu 200 Franken pro Tag während max. 30 Tagen pro Kalenderjahr. Voraussetzung ist eine ärztliche Verordnung." 
    BEGININSTRUCTION {question} ENDINCSTRUCTION
    BEGININPUT {context} ENDINPUT 
    ASSISTANT: """
)

query_llm(german_mistral_llm, german_mistral_prompt_template)

 "Ja, die Kosten für Ihre Brille von Completa sind gedeckt. Kompleta PLUS erweitert den Deckungsumfang und lohnt sich insbesondere dann, wenn Sie gerne möglichst breit abgesichert sind." "Ja, die Kosten für Ihre Brille von Completa sind gedeckt. Kompleta PLUS erweitert den Deckungsumfang und lohnt sich
insbesondere dann, wenn Sie gerne möglichst breit abgesichert sind."

This looks already a bit better. The answer contains irrelevant information and some important information is missing.

A side-mark:

On my device (without using possible optimizations), the model took 3:34 minutes to answer the questions. So even for a small model like this, it is not really feasible to use it in a chatbot on commodity hardware.

Conclusions

Grounding:
- Grounding your model helps it giving you correct answers to very specific questions.
- Formatting the text in a way that the model can understand it is crucial.
Larger Models perform better:
- GPT3.5 is better at answering questions off-the-shelf.
- Llama-2 70B is able to answer the questions correctly. However, it struggles to provide correct German texts.
- Mistral 7B struggles to answer questions correctly. More effort than I took is required.
Different models need different prompt templates.
With few-shot learning you can improve the quality of the response dramatically.
A high-level framework like langchain makes switching models and sources easy.

Outlook

New Models:
- Just today Mistral released a new multi-lingual model with very impressive reported performance. It’s size is between Mistral 7B and LLama2 70B. This might solve our problems with Mistral: Mistral: Mixtral of experts
- There are LLama-Models that have been pretrained on German texts and might provide a better performance with our German prompts and responses: Huggingface: LLama2 German Assistant
Fine Tuning models: Since the models from Mistral are substantially smaller, they can more easily be retrained (fine-tuned) to achieve higher accuracy. Furthermore, parameter tuning could help lessening hallucinations of the model (i.e. making things up).
Benchmarking: To properly evaluate the performance of the different approaches (Vector-LLM and QA-LLM), a larger test-set and requirements for correct answers would be needed.

Acknowledgments

A special thank you to Moritz Settele and Koen Tersago from morrow ventures for their helpful feedback on this blog-post.