This project aims to investigate and design a cost-efficient framework for implementing Large Language Models (LLMs), with a focus on optimizing both performance and resource usage. Specifically, we will study the following key aspects:
1. Cost-Accuracy Trade-off:
We will explore how to optimize query routing across a set of LLMs that vary in cost, latency, and response quality. The objective is to:
Develop routing strategies that minimize computational and financial costs.
Ensure that selected strategies meet predefined quality or accuracy thresholds.
Understand the trade-offs between using smaller, cheaper models and larger, more capable (but costlier) ones.
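As an illustration of the routing objective above, the following is a minimal sketch, not a proposed method. It assumes a quality predictor is available and picks the cheapest model whose predicted quality meets the threshold. The names Model, route, and toy_quality, as well as the cost and quality numbers, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_query: float  # hypothetical cost units, not real prices


def route(
    query: str,
    models: Sequence[Model],
    predict_quality: Callable[[str, Model], float],
    threshold: float,
) -> Model:
    """Return the cheapest model whose predicted quality meets the threshold.

    If no model qualifies, fall back to the model with the highest
    predicted quality (an assumption of this sketch).
    """
    eligible = [m for m in models if predict_quality(query, m) >= threshold]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_query)
    return max(models, key=lambda m: predict_quality(query, m))


def toy_quality(query: str, model: Model) -> float:
    """Toy stand-in for a learned quality predictor (assumption):
    longer queries are treated as harder, especially for the small model."""
    n_words = len(query.split())
    if model.name == "small":
        return 0.90 - 0.004 * n_words
    return 0.98 - 0.001 * n_words


MODELS = [Model("small", 1.0), Model("large", 10.0)]

if __name__ == "__main__":
    short_query = "What is 2+2?"
    long_query = " ".join(["word"] * 40)
    print(route(short_query, MODELS, toy_quality, threshold=0.8).name)
    print(route(long_query, MODELS, toy_quality, threshold=0.8).name)
```

In practice the predictor would be learned (for example from preference data, as in the related work listed below), and the threshold would encode the predefined quality requirement.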
2. Query Decomposition:
This component investigates the possibility of breaking down complex queries into smaller, simpler sub-queries that can be:
Solved more efficiently,
Routed to specialized or smaller LLMs (each of which may have only partial information),
And later recombined into a coherent final response.
We aim to define a general framework for query decomposition, including:
Query analysis and segmentation techniques,
Assignment policies for sub-queries,
Aggregation mechanisms for the final output.
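The three stages listed above (segmentation, assignment, aggregation) could be wired together roughly as follows. This is only a sketch under strong simplifying assumptions: segmentation is a naive split on semicolons, the assignment policy is a toy rule, and the solvers are stub functions standing in for LLM calls. All function names here are hypothetical.

```python
from typing import Callable, Dict, List


def decompose(query: str) -> List[str]:
    """Naive segmentation: split on ';'. A placeholder for real query analysis."""
    return [part.strip() for part in query.split(";") if part.strip()]


def assign(sub_query: str) -> str:
    """Toy assignment policy: sub-queries containing digits go to a 'math'
    solver, everything else to a 'general' solver."""
    return "math" if any(ch.isdigit() for ch in sub_query) else "general"


def aggregate(partial_answers: List[str]) -> str:
    """Simplest possible aggregation: concatenate partial answers in order."""
    return "\n".join(partial_answers)


def answer(query: str, solvers: Dict[str, Callable[[str], str]]) -> str:
    """Decompose the query, solve each sub-query with its assigned solver,
    and recombine the partial answers."""
    partials = [solvers[assign(sub)](sub) for sub in decompose(query)]
    return aggregate(partials)


# Stub solvers standing in for calls to specialized or smaller LLMs.
STUB_SOLVERS: Dict[str, Callable[[str], str]] = {
    "math": lambda q: f"[math] {q}",
    "general": lambda q: f"[general] {q}",
}

if __name__ == "__main__":
    print(answer("Compute 12 * 7; Name the capital of France", STUB_SOLVERS))
```

A realistic framework would replace each stage with a learned or LLM-based component, and the aggregation step would need to resolve dependencies between sub-queries rather than simply concatenate.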
3. Unified End-to-End Framework:
The final step is to design a comprehensive end-to-end framework that integrates both:
Cost-aware routing strategies (from part 1), and
Query decomposition techniques (from part 2).
This unified system should:
Dynamically route queries (or sub-queries) to the most suitable models,
Balance performance and cost.
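To make the combined behaviour concrete, the sketch below builds a per-sub-query routing plan and reports its total cost. It is a hedged illustration only: the function plan, the splitter, the quality predictor, and the cost table are all hypothetical, and the fallback rule assumes the most expensive model is also the most capable.

```python
from typing import Callable, Dict, List, Tuple


def plan(
    query: str,
    split: Callable[[str], List[str]],
    predict_quality: Callable[[str, str], float],
    costs: Dict[str, float],
    threshold: float,
) -> Tuple[List[Tuple[str, str]], float]:
    """Split the query, then send each sub-query to the cheapest model whose
    predicted quality meets the threshold. Returns the plan and its total cost."""
    by_cost = sorted(costs, key=lambda name: costs[name])
    steps: List[Tuple[str, str]] = []
    total = 0.0
    for sub in split(query):
        # Fallback (assumption): the most expensive model is the most capable.
        chosen = by_cost[-1]
        for name in by_cost:
            if predict_quality(sub, name) >= threshold:
                chosen = name
                break
        steps.append((sub, chosen))
        total += costs[chosen]
    return steps, total


# Hypothetical components used only to exercise the sketch.
COSTS = {"small": 1.0, "large": 10.0}


def split_on_semicolon(query: str) -> List[str]:
    return [p.strip() for p in query.split(";") if p.strip()]


def toy_predictor(sub_query: str, model_name: str) -> float:
    # Toy rule: the small model is only trusted on very short sub-queries.
    if model_name == "large" or len(sub_query.split()) <= 4:
        return 0.95
    return 0.60


if __name__ == "__main__":
    query = "Define entropy; Prove that the KL divergence between two distributions is non-negative"
    steps, total = plan(query, split_on_semicolon, toy_predictor, COSTS, threshold=0.9)
    baseline = COSTS["large"] * len(steps)  # cost of sending every sub-query to the large model
    print(steps, total, baseline)
```

Comparing the plan's total cost against the all-large baseline is one simple way to quantify the performance and cost balance for a given quality threshold.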
Research Objectives:
The main goals of the project include:
Literature Survey: Analyze and classify existing research related to cost-efficient LLMs, routing strategies, and query decomposition.
Comparative Analysis: Identify strengths and limitations of current approaches.
Framework Proposal: Outline a novel framework (a prototype) with clear design principles.
Related Work:
BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute, ICML 2025
RouteLLM: Learning to Route LLMs with Preference Data, ICLR 2025
Within the scope of this research, a collaboration with Imperial College London and the University of Birmingham is possible.
About Project Supervisors
Emre Özfatura
Çağlar Tunç