Prompt Engineering Demystified: Understanding LLM Inference
#4 Generative AI with LLMs: Understanding LLM Inference - I
Introduction
A large language model (LLM) is a treasure trove of linguistic knowledge and a rich information repository. Because these models are trained on petabytes of text data in many formats, they develop a solid understanding of the patterns and semantic details of the languages they are pre-trained on, along with capabilities like multilingual understanding and contextual awareness. Specific techniques have therefore been developed for interacting with an LLM so that we can extract the required information effectively. This process of engineering the input prompts to elicit relevant and accurate answers is called Prompt Engineering. In this blog, let’s explore some of the best prompt engineering techniques.
Prompt Engineering: Getting Started
Prompt Engineering is the process of designing the input prompts and queries used to interact with large language models, whether general-purpose chatbots or domain-specific fine-tuned models. The main goal of prompt engineering is to elicit desired responses or behaviours from the LLM. Before we delve into different prompt engineering techniques, let’s revisit the basic mechanism of LLM inference.
Every model takes in a piece of text called the prompt as input, and generates output text called the completion based on that prompt. The combined length of the prompt and the completion is bounded by the context window of the LLM: the total number of tokens (roughly, words or word pieces) the model can process at a time. The context window of the model can be represented as follows,
Context window of LLM = Length of input prompt + Length of completion
Usually, the context window of an LLM is fixed, often to a few thousand tokens (e.g., 2,000 or 4,000), so we should define the input prompt efficiently so that the LLM has enough of the context window left to generate the required completion.
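The budget implied by this formula can be sketched in Python. The helper below is purely illustrative and not part of any real library; it approximates token counting with whitespace splitting, whereas real LLM tokenizers (e.g. BPE-based ones) split text into subword pieces and generally produce more tokens than words.

```python
def completion_budget(prompt: str, context_window: int) -> int:
    """Return how many tokens remain for the completion, given
    context window = length of input prompt + length of completion."""
    prompt_tokens = prompt.split()  # crude stand-in for a real tokenizer
    return max(context_window - len(prompt_tokens), 0)
```

For example, a 7-word prompt against a 2,000-token context window leaves roughly 1,993 tokens of room for the completion, which is why long prompts directly shrink the space available for the model's answer.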
Let’s explore a few prompt engineering techniques for effective LLM inference.
Prompt Engineering Techniques
Some of the most widely used prompt engineering techniques are:
Zero-Shot learning
In-Context Learning: One-shot and Few-shot learning.
Chain of thought prompting
Let’s explore each of them in more detail.
Zero-Shot Learning
Zero-shot learning, or zero-shot prompting, builds on the fact that LLMs like GPT-3 and GPT-4 are trained on vast amounts of data and are typically fine-tuned to follow instructions. As a result, these LLMs can perform many tasks “zero-shot”, i.e., without any additional task-specific fine-tuning.
For example, consider providing the following prompt to LLMs like GPT-3 or ChatGPT,
Prompt:
Detect the sentence's sentiment as Positive, Negative or Neutral:
I love the scenery of the place!
Sentiment:
Output:
Sentiment: Positive
We can see that the model correctly identifies the sentiment of the sentence and completes the prompt with the option Positive. This capability of a large language model to perform inference tasks without additional fine-tuning, using only the knowledge learnt during model training, is called zero-shot inference. Zero-shot inference works best for larger models, usually multi-billion-parameter models like GPT-4, LLaMA, and Falcon.
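A zero-shot prompt like the one above can be assembled with a simple template. The function below is a hypothetical sketch, not a real API: the instruction wording comes from the example, and the resulting string is what you would send to whichever LLM you are querying.

```python
def zero_shot_prompt(sentence: str) -> str:
    """Wrap a sentence in the zero-shot sentiment instruction shown above."""
    return (
        "Detect the sentence's sentiment as Positive, Negative or Neutral:\n"
        f"{sentence}\n"
        "Sentiment:"
    )

# The prompt ends at "Sentiment:" so the model's completion fills in the label.
prompt = zero_shot_prompt("I love the scenery of the place!")
```

Note that the prompt deliberately stops at “Sentiment:”, so the model’s most natural continuation is exactly the label we want.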
However, zero-shot inference often fails to generate coherent completions for prompts that require more context, or for tasks the LLM was not directly trained on. This is where in-context learning techniques can be used to steer the model’s generation by providing additional context in the input prompt.
In-Context Learning
In-context learning refers to learning from the context provided in the input prompt. One-shot learning and few-shot learning are two techniques under in-context learning.
One-shot learning, or one-shot prompting, helps the LLM perform better by providing additional context in the input prompt: the prompt includes one worked example of the task to be performed. Consider the following example of one-shot inference,
Prompt:
Classify this review: I loved the cast and the story !
Sentiment: Positive
Classify this review: I hated the performance of the villain!
Sentiment:
Output:
Sentiment: Negative
In the above example, we first provide a sample prompt together with its completion for a sentiment analysis task, then a second prompt for the model to complete. By observing the context and output structure of the initial example, the model generates the completion for the later prompt in the required format.
This prompting method can be further improved by including multiple examples of the task in the context; this is called few-shot learning. Few-shot learning places several worked examples of the task in the prompt’s context, and the model then generates output based on the patterns it observes there.
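The steps above can be sketched as a small prompt builder. The function name and prompt wording are illustrative assumptions: with one example pair this produces a one-shot prompt like the review example above, and with several pairs it produces a few-shot prompt.

```python
def few_shot_prompt(examples, query):
    """Build a prompt from (review, sentiment) example pairs plus a new review.

    Each example is rendered in the same "Classify this review / Sentiment"
    shape, and the final query is left unanswered for the model to complete.
    """
    parts = [
        f"Classify this review: {review}\nSentiment: {sentiment}"
        for review, sentiment in examples
    ]
    parts.append(f"Classify this review: {query}\nSentiment:")
    return "\n\n".join(parts)

examples = [
    ("I loved the cast and the story!", "Positive"),
    ("The plot was painfully slow.", "Negative"),
]
prompt = few_shot_prompt(examples, "I hated the performance of the villain!")
```

Because every example follows the same layout, the model can infer both the label set and the answer format before it ever sees the new review.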
Let’s explore another interesting prompting technique to improve text generation quality.
Chain-of-Thought Prompting
The chain-of-thought prompting technique enables complex reasoning in LLMs by including intermediate reasoning steps in the context of the input prompt. It can be combined with few-shot prompting for even better results on highly complex tasks that require substantial reasoning before responding.
To illustrate the capabilities of chain-of-thought prompting, consider a prompt whose worked example includes not just the correct answer but the step-by-step reasoning that leads to it. The model answers a new question more reliably when such reasoning accompanies the example answer. This prompting technique works best when coupled with few-shot learning.
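As a sketch of what such a prompt looks like, the snippet below uses a well-known arithmetic word problem often used in chain-of-thought demonstrations; the example text and function name are illustrative assumptions. The key point is that the in-context answer spells out its reasoning before stating the result, inviting the model to do the same.

```python
# One worked example whose answer shows its reasoning step by step.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

def chain_of_thought_prompt(question: str) -> str:
    """Prepend the reasoning-annotated example to a new question."""
    return f"{COT_EXAMPLE}\n\nQ: {question}\nA:"

prompt = chain_of_thought_prompt(
    "The cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples are there now?"
)
```

Without the spelled-out reasoning in the example, the model would be more likely to jump straight to a (possibly wrong) number instead of working through the steps.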
Beyond Prompt Engineering?
Using the above prompt engineering techniques, LLMs can generate coherent and accurate completions for a wide variety of tasks. But sometimes, LLMs hallucinate while generating a completion, or fail to produce good-quality completions even after multiple examples are provided via few-shot prompting.
In this scenario, task-specific fine-tuning of the large language model is a better way to handle uncertainty in text generation. In this approach, the pre-trained model is fine-tuned on a prompt-based dataset created for the specific task, which helps the model adapt to the prompt structure more easily than few-shot learning alone.
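As a sketch of what such a prompt-based task dataset could look like, the helper below converts labelled examples into prompt/completion records. The field names and prompt wording are assumptions for illustration; the exact record format depends on the fine-tuning framework being used.

```python
def to_finetune_records(examples):
    """Turn (review, sentiment) pairs into prompt/completion training records.

    Each record pairs an unanswered prompt with the target completion,
    mirroring the structure the model will see at inference time.
    """
    return [
        {
            "prompt": f"Classify this review: {review}\nSentiment:",
            "completion": f" {sentiment}",
        }
        for review, sentiment in examples
    ]

records = to_finetune_records([("I loved the cast and the story!", "Positive")])
```

Because the training prompts share the inference-time layout, the fine-tuned model no longer needs in-context examples to know what format the answer should take.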
Conclusion
Large language models with billions of parameters, trained on petabytes of data, have excellent natural language processing and understanding capabilities, and can process and generate text on almost any topic. However, this vast store of knowledge and information can only be utilized through effective querying techniques, collectively called prompt engineering. As LLMs continue to grow in size, it becomes increasingly important to use these prompt engineering techniques to query them better and mine their knowledge effectively.
Summary
To summarise,
Large language models with multiple billions of parameters consist of rich representation and understanding of knowledge and information.
Prompt Engineering is the process of designing prompts to input to LLMs to extract required and meaningful completions.
Some popular prompt engineering techniques are Zero-shot prompting, In-context learning, and Chain-of-thought prompting.
Zero-shot prompting states the task to be performed directly in the input prompt, with no examples, and relies on the model to generate the completion from its pre-trained knowledge alone.
In-context learning includes defining the context of the problem to be resolved in the input prompt. This can be done by providing one (one-shot learning) or more examples (few-shot learning) in the input prompt.
Chain-of-thought prompting improves completion quality by including intermediate reasoning steps along with the examples in the input prompt. This improves the model's reasoning capability and helps it generate a more coherent completion.
If the quality of the completion generated by the model doesn’t improve even after providing additional examples (i.e., few-shot prompting), model fine-tuning should be performed.
Thanks for reading!