
Utilizing and prompting large language models

Ben Lin

The advent of large language models (LLMs) and chat models that can both understand human language and reason over semantic directives has, almost overnight, opened a Pandora's box of possibilities for software engineers and data scientists alike. Tried and tested computer science techniques and engineering paradigms are no longer the only game in town – the limits of predictable rule-based systems now seem expandable with a few clever prompts to GPT-4. We now navigate a realm where machines understand, reason, and converse.

However, an experienced programmer will know that getting your code to run once is a very different thing from getting it to run predictably. The degree of randomness that makes generative AI most effective also necessitates new frameworks and paradigms for integrating LLMs into existing and new systems.

This article will address how software engineers can leverage the flexibility of written human language to enhance their LLM applications with advanced prompting techniques.

Prerequisites: programmatic prompt generation and flow composition

In computer science, a “string”, or single contiguous unit of text, is essentially a list of characters stored in the program’s memory. A string can be combined with other strings, split into pieces, and rearranged in a myriad of ways, and the characters that make up the string can “spell out” anything from data structures to entire programs.

When interfacing with LLMs, a prompt takes the form of a string, and so every other topic in this article will be dependent on this basic ability to programmatically format data and instructions into a string that will be used to prompt the model.

That’s the straightforward part: every input of data must be reduced to a string that can be passed to the model for generation; from there, the complexity opens up. The model will also return a string of text that can be manipulated and interpreted just like any other string. This fungibility of the inputs and outputs of LLMs is key, as it allows engineers to design ever more sophisticated systems that are both recursive and non-deterministic in exciting new ways.
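As a minimal sketch of that cycle – with a hypothetical `call_llm` helper standing in for whichever client actually sends the prompt string to a model – formatting data in and parsing text out looks like ordinary string work:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (e.g. an OpenAI or Azure client)."""
    return "website"  # canned answer so the sketch runs end to end


def build_prompt(feedback: str, categories: list[str]) -> str:
    # Format data and instructions into a single prompt string.
    category_lines = "\n".join(f"- {c}" for c in categories)
    return (
        "You are classifying customer feedback.\n"
        f"Feedback: {feedback}\n"
        "Possible categories:\n"
        f"{category_lines}\n"
        "Answer with exactly one category name."
    )


prompt = build_prompt("The claims portal kept timing out.", ["billing", "website", "coverage"])
response = call_llm(prompt)          # the model returns a plain string...
category = response.strip().lower()  # ...which can be manipulated like any other string
```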

Of course, this requires a framework to handle the flow of inputs and outputs, and any mutations or analysis of feedback that happen in between. The Experience Center at Humana has chosen LangChain, an open-source library for developing applications that utilize LLMs, because of its robust support for systematically managing programmatic prompts. In fact, the ability to direct the flow of prompts and follow-up prompts is a core prerequisite for the techniques discussed in this article.
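As an illustration – the prompt wording and model name here are purely illustrative, and this assumes LangChain's expression-language syntax with an OpenAI-backed chat model – composing a prompt, a model call, and a follow-up prompt into one flow can look like this:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4")  # illustrative model choice

# First step: summarize a piece of feedback.
summarize = (
    ChatPromptTemplate.from_template(
        "Summarize the following feedback in one sentence:\n{feedback}"
    )
    | model
    | StrOutputParser()
)

# Follow-up step: the first step's output string becomes part of the next prompt.
suggest_question = (
    ChatPromptTemplate.from_template(
        "Given this summary of customer feedback:\n{summary}\n"
        "Suggest one follow-up question to ask the customer."
    )
    | model
    | StrOutputParser()
)

summary = summarize.invoke({"feedback": "The claims portal kept timing out on me."})
question = suggest_question.invoke({"summary": summary})
```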

Logical analysis: crafting precision

At the heart of effective prompting is logical analysis. LLMs like GPT-4 can parse and respond to written human language. Doing so requires them to discern the intent, context, and nuance that structure the semantic meaning of that text, and as such, each of those aspects is a vector of approach for engineering a particular response from an LLM.

Context will be addressed later in the article; our first tool is the precise and logical structuring of prompts – breaking complex tasks down into smaller, simpler logical steps, and specifying what each step should produce, i.e., the intent.

For instance, when seeking to categorize text that contains feedback, instead of asking the model to "analyze" it, be specific: ask the model to assess how well the feedback fits each category – this is an example of where it is advantageous to orchestrate multiple prompts in parallel. Eliciting a predictable response from the model can be trickier if you need it to be structured, but simple “yes” or “no” questions are always a solid foundation.
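A sketch of that pattern, reusing the hypothetical `call_llm` helper from earlier (the category names are invented for illustration): each category gets its own narrow yes/no prompt, and the independent checks can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "yes"


CATEGORIES = ["billing", "website experience", "coverage questions"]  # illustrative


def fits_category(feedback: str, category: str) -> bool:
    # One narrow yes/no question per category, rather than a single open-ended "analyze".
    prompt = (
        f"Feedback: {feedback}\n"
        f"Does this feedback concern {category}? Answer only 'yes' or 'no'."
    )
    return call_llm(prompt).strip().lower().startswith("yes")


def categorize(feedback: str) -> list[str]:
    # The per-category questions are independent, so they can be asked in parallel.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda c: fits_category(feedback, c), CATEGORIES))
    return [c for c, fits in zip(CATEGORIES, answers) if fits]
```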

Manipulating data structures: beyond words

Consider data structures: lists, key-value pairs, or even more complex hierarchies. The shape and mechanics of each structure provide a framework for performing millions of complex operations at the speed of electricity. As such, data structures are usually thought of in the context of memory and performance optimization or enabling advanced algorithms.

Now consider bullet and numbered lists, blocks of text with titles, even whole dictionaries and thesauruses: any of these naturally occurring components of human language are examples of structured information that LLMs can understand, and presenting data in a structured format can significantly enhance the model’s performance. The meaning of a bullet list with a title is clearer and easier to understand than a long sentence of words separated by (Oxford) commas, and a markdown chart describes itself.
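As a small sketch (the visit records below are invented for illustration), the same facts are usually easier for a model to work with as a titled markdown table than as one long comma-separated sentence:

```python
def records_to_markdown(records: list[dict]) -> str:
    # Render rows as a small markdown table; the structure labels each value
    # instead of leaving the model to untangle a run-on sentence.
    headers = list(records[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in records:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


visits = [
    {"date": "2024-01-12", "reason": "annual physical", "follow_up": "none"},
    {"date": "2024-03-02", "reason": "persistent cough", "follow_up": "chest x-ray"},
]

prompt = (
    "Recent visits\n\n"
    + records_to_markdown(visits)
    + "\n\nSummarize this patient's recent visits for a care coordinator."
)
```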

Retrieval augmented generation: enhancing creativity with facts

LLMs like GPT-4 were trained over a finite period and on specific datasets of text – a vast swath of the public internet in the case of its predecessor, GPT-3. The data a generative AI model was trained on defines the limits of its generative ability – GPT-4 has a specific cutoff date for its training data, after which it knows nothing of real-world events and people.

This poses a unique challenge when prompting LLMs for information they were not trained on – like internal policies, new data, or changes to long-established theories. While it is also possible to fine-tune a model with additional data, the Experience Center has found success using Retrieval Augmented Generation (RAG), a technique where the model retrieves external documents to augment its responses. This method falls under engineering the “context” of your prompt and is invaluable when you need up-to-date or specialized information.
To illustrate further, consider implementing RAG for a medical diagnosis tool: the program could first pull in the patient’s charts, then retrieve the latest research papers about a given disease and splice them for relevance. Both pieces of context can then be used to prompt a model for a more accurate and up-to-date determination of whether the patient’s history and current symptoms are consistent with the disease.
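A rough sketch of that flow – the retrieval helpers and `call_llm` below are hypothetical placeholders; in practice they would be backed by an internal records store and an indexed research corpus, such as a vector database:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "..."


def retrieve_patient_chart(patient_id: str) -> str:
    """Hypothetical lookup against an internal records store."""
    return "Patient chart text would be fetched here."


def retrieve_research(disease: str, top_k: int = 3) -> list[str]:
    """Hypothetical similarity search over an indexed corpus of research papers."""
    return ["Relevant excerpts would be retrieved and spliced for relevance here."]


def assess_consistency(patient_id: str, disease: str) -> str:
    chart = retrieve_patient_chart(patient_id)
    papers = "\n\n".join(retrieve_research(disease))
    # Both pieces of retrieved context are placed into the prompt alongside the question.
    prompt = (
        "Using only the context below, state whether the patient's history and current "
        f"symptoms are consistent with {disease}, and cite the supporting evidence.\n\n"
        f"--- Patient chart ---\n{chart}\n\n"
        f"--- Recent research ---\n{papers}"
    )
    return call_llm(prompt)
```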

It's important to note that RAG places the responsibility for curating the sources of data for retrieval squarely on the team implementing it.

Conclusion

LLMs have revolutionized the possibilities for software engineers and data scientists with their ability to understand and reason over human language. However, achieving predictability and reliability in LLM applications requires its own approach: software engineers can leverage the flexibility of written human language to programmatically construct prompts, specify the logic that structures generation, present data in structures the model understands, and provide information previously unavailable to the LLM. With these techniques, software engineers can harness the power of LLMs to create sophisticated and dynamic systems that push the boundaries of traditional rule-based approaches.

