Explained: What Are Emergent Abilities of LLMs?

Everything You Need To Know About Emergent Abilities

Sabrina Ramonov
May 11, 2024

Emergent Abilities
In-Context Learning
Instruction Following
Step-by-Step Reasoning
Future Work
Conclusion

Large Language Models (LLMs) are a recent innovation.

Prior to LLMs, we had small language models, with far fewer than 60B parameters.

But it’s not just size that differentiates LLMs from small models…

LLMs demonstrate entirely new superpowers — not at all seen in small models.

Emergent Abilities

Emergent Abilities refers to:

Major step-function capabilities that LLMs have, but small models lack.

That’s why I call them “superpowers”.

As an analogy, think about kids learning to speak.

They develop language abilities in stages, not continuously.

One day they’re babbling words.

The next, they form a clear sentence.

But how?

Language skills seem to appear suddenly but actually result from gradual, behind-the-scenes changes (which we barely understand).

Parents notice "leaps" in ability when their child unexpectedly moves to a more advanced level of speaking.

This analogy illustrates:

Development in complex systems can occur in step-function leaps rather than smooth linear progress.

Formally, this is known as a “phase change” or “phase transition”, as in physics.

LLMs showcase 3 primary step-function Emergent Abilities:

In-Context Learning
Instruction Following
Step-by-Step Reasoning

Researchers have found many more emergent abilities than these 3, since the term covers any task an LLM can do that a smaller model cannot. Here’s a compilation of 100+ emergent abilities discovered so far.

For this post, I focus on the 3 emergent abilities above because they are foundational for many types of tasks.

In-Context Learning

This ability allows LLMs to understand and complete tasks based on a few examples provided in the prompt, without the need for further training.

For example, you want to automate scraping data from websites.

You provide ChatGPT with examples of what the end results should look like, as well as a few sample scrapers you’ve previously built.

Then you ask ChatGPT to write a similar script for a different website.

With in-context learning, ChatGPT uses the structure and logic of your examples to generate a new script tailored to your new requirements.

This is all done without any additional training specific to the new task.
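To make this concrete, here’s a minimal sketch of how such a prompt could be assembled. The function name `build_few_shot_prompt`, the prompt layout, and the site descriptions are all my own assumptions for illustration, not a required format. The sketch only builds the prompt text; you would paste it into ChatGPT or send it through whichever API you use.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task: str, examples: List[Tuple[str, str]], new_input: str
) -> str:
    """Assemble a few-shot prompt: task description, worked examples, new input."""
    parts = [task.strip(), ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {i} input:\n{example_input}")
        parts.append(f"Example {i} output:\n{example_output}")
        parts.append("")
    # The final block is left open so the model fills in the output.
    parts.append(f"New input:\n{new_input}")
    parts.append("New output:")
    return "\n".join(parts)


# Hypothetical examples: site descriptions paired with scrapers you built earlier.
examples = [
    ("Site A: product listing page, fields: name, price",
     "# scraper_a.py (paste your earlier script here)"),
    ("Site B: blog index page, fields: title, date",
     "# scraper_b.py (paste your earlier script here)"),
]

prompt = build_few_shot_prompt(
    task="Write a Python scraper that follows the same structure as the examples.",
    examples=examples,
    new_input="Site C: job board page, fields: role, company, location",
)
print(prompt)
```

Notice that nothing about the model changes. The examples live entirely in the prompt.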

GPT-3, particularly its larger version with 175 billion parameters, showcases impressive in-context learning across various tasks, though effectiveness can vary depending on the task.

For example, it performs well in arithmetic.

But it struggles with complex tasks like Persian language Q&A.

Instruction Following

LLMs can follow complex instructions across varied tasks they’ve never seen.

This is achieved through instruction tuning:

Models are trained with a variety of tasks described in natural language.

Here’s a diagram showing how OpenAI trains models to follow instructions:

https://openai.com/index/instruction-following/

Here’s a realistic example in an enterprise setting:

You want LLMs to handle customer service questions across different product lines without explicit training on each type of question.
Use instruction tuning to train on a diverse set of customer service interactions across various industries and product types. This includes complaints, processing returns, FAQs, troubleshooting, etc.
Once trained, LLMs can generalize to handle many types of customer questions and complaints, even if they have never seen many of them before.
For example, you launch a new product line. LLMs apply general knowledge to provide helpful customer service, while you simply feed in context on your new product line in the prompt.
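The last step, feeding new product context into the prompt, might look something like the sketch below. Everything here is hypothetical: the function `build_support_prompt`, the instruction wording, and the product details are placeholders I made up to show the shape of the idea.

```python
def build_support_prompt(product_context: str, customer_message: str) -> str:
    """Combine general instructions, new-product context, and the customer's message."""
    instructions = (
        "You are a customer service assistant. "
        "Answer using only the product information provided. "
        "If the answer is not in the product information, say so and offer to escalate."
    )
    return (
        f"{instructions}\n\n"
        f"Product information:\n{product_context.strip()}\n\n"
        f"Customer message:\n{customer_message.strip()}\n\n"
        "Reply:"
    )


# Hypothetical details for a newly launched product line.
new_product_context = """
Product: TrailLite Backpack (hypothetical)
Return window: 30 days
Warranty: 1 year
"""

prompt = build_support_prompt(
    new_product_context, "Can I return my backpack after 3 weeks?"
)
print(prompt)
```

The instructions stay the same across product lines. Only the product information block changes when you launch something new.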

Before LLMs, you’d need to train a separate ML model for each task.

After LLMs, a single model can handle many tasks surprisingly well — even better with a sprinkle of prompt engineering and finetuning!

Unsurprisingly, larger models (i.e. many billions of parameters) perform much better on unseen tasks compared to smaller models.

The LaMDA-PT model excels at new tasks once it reaches 68 billion parameters.

The PaLM model needs at least 62 billion parameters to perform well consistently across multiple benchmarks.

Step-by-Step Reasoning

Smaller models struggle to solve complex problems with multiple logical steps.

However, LLMs can use chain-of-thought (CoT) prompting to handle such tasks effectively.

CoT prompting guides LLMs through a sequence of reasoning steps to reach a conclusion. You can also add verification steps throughout the sequence to force LLMs to “check their work so far”, further boosting accuracy.

In the charts below, notice Model Size vs. Solve Rate.

Even when small models try to use CoT, they still perform poorly.

CoT prompting is substantially more effective at larger model sizes.

Here’s a simple way to apply CoT in your prompt:

Before CoT:

You are shopping for a party. You buy 3 pizzas at $7.50 each, 2 packs of soda for $3.00 each, and 5 packs of napkins at $2.00 each. How much did you spend in total?

After CoT:

You are shopping for a party. You buy 3 pizzas at $7.50 each, 2 packs of soda for $3.00 each, and 5 packs of napkins at $2.00 each. How much did you spend in total? Take a deep breath and explain your reasoning step by step.
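For reference, here’s the step-by-step arithmetic a good CoT answer should walk through, written as a short Python check. This is just my own sanity check of the numbers in the prompt, not model output.

```python
from decimal import Decimal

# (item, quantity, unit price) taken from the prompt above.
purchases = [
    ("pizzas", 3, Decimal("7.50")),
    ("packs of soda", 2, Decimal("3.00")),
    ("packs of napkins", 5, Decimal("2.00")),
]

total = Decimal("0")
for item, quantity, unit_price in purchases:
    subtotal = quantity * unit_price  # one reasoning step per item
    total += subtotal
    print(f"{quantity} {item} x ${unit_price} = ${subtotal}")

print(f"Total: ${total}")  # prints "Total: $38.50"
```

Each subtotal ($22.50, $6.00, $10.00) is one intermediate step, and the final sum is $38.50.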

One important distinction is zero-shot CoT versus few-shot CoT.

With zero-shot CoT, you provide zero examples.

With few-shot CoT, you provide one or more worked examples, each with its reasoning spelled out.

Here’s an example illustrating the difference in prompts:

https://arxiv.org/pdf/2205.11916
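In code, the difference between the two prompt styles can be sketched roughly like this. The trigger phrase “Let's think step by step” comes from the paper linked above. The function names and the worked example (the notebooks and pen) are hypothetical, written just for this sketch.

```python
from typing import List

QUESTION = (
    "You buy 3 pizzas at $7.50 each, 2 packs of soda for $3.00 each, "
    "and 5 packs of napkins at $2.00 each. How much did you spend in total?"
)

# Hypothetical worked example (exemplar) with its reasoning written out.
EXEMPLAR = (
    "Q: You buy 4 notebooks at $2.00 each and 1 pen for $1.50. "
    "How much did you spend in total?\n"
    "A: 4 notebooks cost 4 x $2.00 = $8.00. The pen costs $1.50. "
    "$8.00 + $1.50 = $9.50. The answer is $9.50."
)


def zero_shot_cot(question: str) -> str:
    """No examples: only a trigger phrase that asks for step-by-step reasoning."""
    return f"Q: {question}\nA: Let's think step by step."


def few_shot_cot(question: str, exemplars: List[str]) -> str:
    """One or more worked examples with reasoning, then the new question."""
    return "\n\n".join(list(exemplars) + [f"Q: {question}\nA:"])


print(zero_shot_cot(QUESTION))
print("-----")
print(few_shot_cot(QUESTION, [EXEMPLAR]))
```

Zero-shot CoT is cheaper to write. Few-shot CoT costs more tokens, but it lets you show the model the exact reasoning format you want back.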

On the whole, step-by-step reasoning is far more powerful in LLMs than in smaller models.

Thanks to techniques like CoT, LLMs demonstrate substantial performance improvements in tasks that require complex reasoning.

Future Work

Despite all this mind-blowing progress in a very short period of time…

LLMs still face non-trivial limitations and obstacles.

In my opinion, here are the top 10 challenges with LLMs in 2024:

Context
Hallucination
Transparency
Generalization
Multi-modality
Multi-languages
Bias and Fairness
New Architectures
Large-Scale Evaluation
Computational Efficiency

I dive into these challenges in more detail in a separate newsletter series. You can read part 1 here and part 2 here.

Regardless of the limitations, new research continues on multiple fronts.

Already, LLMs tackle everyday tasks like finding information more effectively, summarizing large volumes of text, generating content for brainstorming and writing, and powering personalized recommendation systems.

LLMs are getting even smarter by manipulating multiple modalities (images, audio, video, text). I’m personally very excited about this frontier!

This diagram summarizes LLM research directions and domains:

Source

LLMs are already being applied in domains such as healthcare, finance, scientific research, law, tech, and education. Here’s a recent Forbes piece detailing 20 successful real-world LLM applications today:

Assisting With Clinical Diagnoses
Mapping Cybersecurity Regulations To Policies And Controls
Expediting Claims Processing
Tracking And Analyzing Customer Feedback
Fielding Customer Service Questions
Enhancing Developer Productivity
Detecting Signs Of Malware
Providing Customers With Immediate Assistance
Troubleshooting Equipment Issues
Summarizing Application Details
Automating Customer Relationship Tasks
Answering Both Simple And Complex Questions
Scraping Online Data
Powering Market Research
Simplifying Analytics Review
Creating Semantic Maps Of Complex Topics
Curating Scientific Literature
Powering Internal Knowledge Bases
Brainstorming Marketing Content
Generating Test Data

Conclusion

Altogether, these primary emergent abilities (in-context learning, instruction following, step-by-step reasoning) showcase the evolving sophistication of LLMs, enabling them to handle diverse tasks with increasing complexity.

The most impressive part?

We got here pretty fast!

Thanks primarily to OpenAI’s relentless R&D efforts.

When I started building my real-time speech recognition and NLP startup, Qurious, back in 2015…

Most VCs I met had never heard of deep learning!

A few were very skeptical it would go anywhere.

Hard to believe, I know, but 100% true.

I won’t name names 🤣 for now!