Have you ever wondered why the latest artificial intelligence models, like GPT-4 or Claude 3 Opus, are so enormous? We’re talking hundreds of billions or even trillions of parameters. These are digital monsters requiring massive amounts of energy and data-center-level infrastructure.
For years, AI followed a simple rule:
“Bigger means better.”
Want a smarter model? Add more layers, more data, more GPUs.
But — what if this is a dead end?
What if instead of one giant that tries to know everything, it’s better to build a team of experts, agile and specialized?
This is the vision introduced in the recent work from Purdue University:
“Experts are all you need: A Composable Framework for Large Language Model Inference” (arXiv:2511.22955).
The proposed architecture, Comp-LLM, allows combining small and mid-sized models into an intelligent, composable system that:
- is faster,
- is cheaper,
- runs in parallel,
- and rivals giant models in quality.
Sounds like a revolution?
This article walks through the idea, from simple analogies to the mathematical and architectural details.
Comp-LLM “in simple terms”
The Problem: One-Man Band vs. Professional Crew
Imagine you’re doing a major home renovation. Two options:
Option 1: Monolithic Model (GPT-4, Llama-70B)
You hire one person — Mr. Jack-of-all-Trades.
He knows everything: plumbing, electricity, painting, poetry, quantum physics.
Sounds great, but:
- he’s slow,
- he’s gigantic,
- his “brain” is full of knowledge irrelevant to the current task,
- to fix a faucet, he must search his entire “universe of knowledge.”
This is your classic LLM.
Option 2: Agent Systems (AutoGen, ReAct)
You hire a manager who summons specialists one by one:
- Plumber arrives → works → leaves.
- Electrician arrives → works → leaves.
Quality is good, but it takes forever.
Everyone waits for the previous one.
These are classic agent systems — sequential.
Option 3: Comp-LLM — A New Paradigm
Here comes the innovation.
You’ve got a Super-Manager (Sub-query Generator).
You give a task:
“Fix the faucet and paint the living room.”
How does it work?
- Splits the task into two subproblems: plumbing + painting.
- Checks dependencies — they’re independent.
- Runs both in parallel.
- Collects two results.
- Merges them into a final answer.
Sounds simple?
That’s Comp-LLM: parallelism + specialization + intelligent routing.
Architecture & Technical Details
Comp-LLM consists of three pillars:
1. Sub-query Generator
Responsible for:
decomposing the original query $Q$ into sub-queries:
$$ Q \rightarrow \{\, q_1, q_2, \dots, q_n \,\} $$
building a dependency graph (DAG),
routing queries to the right experts.
Routing is zero-shot (no training required), based on embedding similarity; a minimal sketch follows below:
$$ \text{Expert}(q_i) = \arg\max_{E_j} \frac{v_{q_i} \cdot v_{E_j}}{\lVert v_{q_i} \rVert \, \lVert v_{E_j} \rVert} $$
The similarity threshold is 0.7.
This means:
- you can add any model as an expert,
- no system-wide retraining is needed,
- experts may come from different sources (Meta, Google, HF).
This is true composability.
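To make the routing step concrete, here is a minimal sketch in Python. The paper only specifies zero-shot cosine-similarity routing with a 0.7 threshold; the `cosine` and `route` helpers, the expert names, and the fallback behavior below threshold are illustrative assumptions:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def route(v_query: np.ndarray,
          expert_embeddings: dict[str, np.ndarray],
          threshold: float = 0.7) -> str | None:
    """Zero-shot routing: pick the expert whose profile embedding is most
    similar to the sub-query embedding, if it clears the threshold."""
    best = max(expert_embeddings, key=lambda n: cosine(v_query, expert_embeddings[n]))
    # Below the threshold no expert matches; what happens then
    # (e.g. a generalist fallback) is our assumption, not the paper's.
    return best if cosine(v_query, expert_embeddings[best]) >= threshold else None

# Toy usage with hand-made 3-d "embeddings" (a real system would use a
# sentence-embedding model to produce these vectors).
experts = {
    "plumbing": np.array([1.0, 0.1, 0.0]),
    "painting": np.array([0.0, 1.0, 0.1]),
}
print(route(np.array([0.9, 0.2, 0.0]), experts))  # -> "plumbing"
```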
2. Query Executor (Parallel Engine)
Executes sub-queries in parallel while obeying the DAG (see the sketch after this list):
- finds nodes with in-degree $0$,
- dispatches them to experts,
- frees dependent nodes once their results arrive.
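A minimal sketch of that dispatch loop, assuming each expert call is a blocking function we can run on a thread pool (the execution backend and function names are our assumptions, not the paper's):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def execute_dag(nodes, parents, run_expert):
    """Run sub-queries in parallel while respecting DAG dependencies.

    nodes:      iterable of sub-query ids
    parents:    dict mapping node id -> set of prerequisite node ids
    run_expert: callable(node_id, parent_results) -> expert's answer
    """
    results, pending = {}, {}
    remaining = set(nodes)
    with ThreadPoolExecutor() as pool:
        while remaining or pending:
            # Dispatch every node whose prerequisites are all completed
            # (i.e. in-degree 0 in the remaining graph).
            ready = [n for n in remaining if parents[n].issubset(results)]
            for n in ready:
                remaining.discard(n)
                deps = {p: results[p] for p in parents[n]}
                pending[pool.submit(run_expert, n, deps)] = n
            if not pending:  # nothing running and nothing ready: cycle
                raise ValueError("dependency graph contains a cycle")
            # Block until at least one expert finishes, freeing its children.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
    return results
```

Here `run_expert` would wrap the actual LLM call for the routed expert; thread-based parallelism fits because each call is network- or I/O-bound rather than CPU-bound.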
Speedup observed in the paper:
1.1× – 1.7× faster than sequential agent systems.
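That range is consistent with the standard critical-path bound for DAG execution (a general observation, not a formula from the paper): with per-sub-query latencies $t_i$,
$$ \text{speedup} \;\le\; \frac{\sum_i t_i}{\max_{P \in \text{paths}(G)} \sum_{i \in P} t_i} $$
so queries whose sub-queries are mostly independent benefit the most, while a fully sequential dependency chain gains nothing.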
3. Response Aggregator
Combines:
- the original query $Q$,
- expert outputs,
- dependency graph context,
into a single coherent answer.
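As a rough illustration, aggregation can be as simple as assembling one final prompt for a generalist model. The template below is our assumption, not the paper's exact format:

```python
def build_aggregation_prompt(query: str, expert_outputs: dict[str, str],
                             edges: list[tuple[str, str]]) -> str:
    """Assemble the aggregator's input: original query, per-expert answers,
    and the dependency edges as lightweight graph context."""
    lines = [f"Original question: {query}", "", "Expert findings:"]
    for sub_query, answer in expert_outputs.items():
        lines.append(f"- [{sub_query}] {answer}")
    if edges:
        lines.append("")
        lines.append("Dependencies (answered in this order):")
        lines += [f"- {src} -> {dst}" for src, dst in edges]
    lines.append("")
    lines.append("Combine the findings above into one coherent answer.")
    return "\n".join(lines)
```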
Why does this work?
Because it separates concerns:
- Router → general reasoning
- Experts → specialized domain knowledge
- Aggregator → final logic
Monolithic models must do all three at the same time — and often fail.
Results: The Numbers Speak
Benchmark: MultiExpertQA-P
| Model | Parameters | F1 Score |
|---|---|---|
| Llama-2 7B | 7B | 0.56 |
| Llama-2 13B | 13B | 0.67 |
| Llama-2 34B | 34B | 0.75 |
| Llama-2 70B | 70B | 0.85 |
| Comp-LLM | ~35B | 0.83 |
Conclusions:
- A composite system totaling ~35B parameters nearly matches the 70B monolith (0.83 vs. 0.85 F1),
- and it clearly outperforms the monolithic 34B model (0.83 vs. 0.75),
- with a resource reduction of 1.67×–3.56× at comparable quality.
Real-World Applications
1. Business (On-Premise AI)
You can deploy:
- an HR expert,
- a finance expert,
- an IT expert,
running in parallel on affordable hardware.
2. Medicine
Experts in:
- cardiology,
- endocrinology,
- neurology,
can analyze a patient simultaneously.
Because each sub-answer comes from a named expert, the system is explainable (XAI) by construction.
3. Education
Question:
“How did the Industrial Revolution influence Victorian literature?”
The system calls:
- a history expert,
- a literature expert.
The result is interdisciplinary and accurate.
4. Edge AI — Your Phone as ‘AI with Swap-In Brains’
A smartphone loads on demand:
- a cooking expert,
- a navigation expert,
- a music expert.
Comp-LLM provides the architecture for such modularity.
Summary: Why Comp-LLM Matters
Because it demonstrates that:
- Architecture > Parameters
- Synergy > Monoliths
- Modules > Giants
This is the future of AI:
- modular,
- composable,
- energy-efficient,
- open.
Appendix: Mathematical Analysis (for experts)
1. Generating the DAG
The Sub-query Generator maps:
$$ f_{SG}: Q \rightarrow G(V, E) $$
where:
- $V = \{ s_1, \dots, s_k \}$ — sub-queries,
- $E = \{ (s_i, s_j) \mid s_i \text{ is required for } s_j \}$.
Classic CoT produces a chain:
$$ s_1 \rightarrow s_2 \rightarrow \dots \rightarrow s_k $$
Comp-LLM produces a true DAG, enabling parallel execution.
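For the renovation example from earlier, CoT would force an arbitrary order on the two jobs, while Comp-LLM's graph has no edge between them:
$$ V = \{\, s_{\text{plumbing}},\; s_{\text{painting}} \,\}, \qquad E = \varnothing $$
Both nodes are immediately ready, so both experts run at the same time.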
2. Scheduling
The set of ready tasks at time $t$:
$$ C_t = \{\, s \in V \setminus \text{Completed} \;\mid\; \forall p \in \text{Parents}(s):\; p \in \text{Completed} \,\} $$
GPU memory constraint on the scheduled subset $S_t \subseteq C_t$:
$$ \sum_{s \in S_t} M(\text{Expert}(s)) \le R_{\text{total}} $$
Solution: a greedy heuristic (a variant of the Resource-Constrained Project Scheduling Problem, RCPSP).
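A minimal sketch of one such greedy step, assuming known per-expert memory footprints; the particular ordering (smallest footprint first) is our assumption, not the paper's:

```python
def greedy_schedule(ready, memory_of, budget):
    """Greedily pack ready sub-queries onto the GPU: take tasks in order of
    increasing expert memory footprint until the budget R_total is exhausted.

    ready:     ready sub-query ids (the set C_t)
    memory_of: dict mapping sub-query id -> M(Expert(s)) in GB
    budget:    total GPU memory R_total in GB
    """
    scheduled, used = [], 0.0
    for task in sorted(ready, key=lambda s: memory_of[s]):
        if used + memory_of[task] <= budget:
            scheduled.append(task)
            used += memory_of[task]
    return scheduled  # the subset S_t actually dispatched this round

# Example: three ready sub-queries, 24 GB of GPU memory.
print(greedy_schedule(["q1", "q2", "q3"],
                      {"q1": 14.0, "q2": 8.0, "q3": 14.0}, 24.0))
# -> ['q2', 'q1']  (q3 waits for the next round)
```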
Links
- 📄 “Experts are all you need: A Composable Framework for Large Language Model Inference”, arXiv:2511.22955 (PDF)