Reducing cost and improving performance while using LLMs

Note

Prompt Adaptation

We don't need all of the previous context when asking a new question, so the prompt can be shortened to cut token cost.
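A minimal sketch of prompt adaptation under one simple assumption: keep only the most recent conversation turns instead of resending the whole history (real systems may instead select turns by relevance; the helper name is hypothetical).

```python
def adapt_prompt(history, new_question, max_turns=2):
    """Build a prompt from the last `max_turns` exchanges plus the new question."""
    recent = history[-max_turns:]  # drop older context to save tokens
    lines = []
    for question, answer in recent:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {new_question}")
    return "\n".join(lines)

history = [
    ("What is an LLM?", "A large language model."),
    ("Name one.", "GPT-4."),
    ("What does it cost?", "It depends on tokens used."),
]
# Only the single most recent turn is kept in the prompt
prompt = adapt_prompt(history, "How can I reduce that cost?", max_turns=1)
print(prompt)
```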

LLM Approximation

Completion cache: reuse an answer generated before instead of querying the model again for the same prompt.
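A sketch of a completion cache, assuming the model is a plain `prompt -> answer` callable (the class and interface are illustrative, not from the paper; real caches may also match semantically similar prompts):

```python
import hashlib

class CompletionCache:
    """Return a stored answer when the same prompt was seen before."""

    def __init__(self, llm):
        self.llm = llm    # assumed interface: callable taking a prompt string
        self.cache = {}

    def ask(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.cache:
            # only pay for a model call on a cache miss
            self.cache[key] = self.llm(prompt)
        return self.cache[key]

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cached = CompletionCache(fake_llm)
cached.ask("What is caching?")
cached.ask("What is caching?")  # second call served from the cache
print(len(calls))  # the underlying model was invoked once
```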

LLM Cascade

![[Pasted image 20230521082711.png]]
![[Pasted image 20230521082934.png]]
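The cascade idea can be sketched as: query models from cheapest to most expensive and stop as soon as a scorer judges the answer reliable. The model names, costs, and scorer below are illustrative stand-ins, not the paper's actual components.

```python
def cascade(prompt, models, scorer, threshold=0.8):
    """models: ordered list of (name, callable), cheapest first.
    Returns (model_name, answer) for the first answer the scorer accepts."""
    name, answer = None, None
    for name, model in models:
        answer = model(prompt)
        if scorer(prompt, answer) >= threshold:
            break  # cheap model was good enough; skip pricier ones
    return name, answer

def cheap_model(prompt):
    return "short guess"

def expensive_model(prompt):
    return "careful answer"

def scorer(prompt, answer):
    # stand-in for the learned reliability scorer described in the paper
    return 0.9 if "careful" in answer else 0.3

name, answer = cascade(
    "hard question",
    [("cheap", cheap_model), ("expensive", expensive_model)],
    scorer,
)
print(name, answer)  # falls through to the expensive model here
```

The cost saving comes from the easy queries that never reach the expensive model; the scorer's threshold trades cost against answer quality.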

Note

In this paper, the first two methods are only sketched as concepts; the focus is on the third idea, the LLM cascade.