By Appropri8 Team

Function-Centric Agent Design: Moving Beyond Tool Use into Capability Composition

Tags: ai, ai-agents, function-composition, capability-graphs, agent-architecture, python, observability, self-optimization, dynamic-discovery, embeddings

Most AI agents work with tools. You define a tool, the agent calls it, and you get a result. It’s straightforward, but it’s also limiting.

Tools are static. They don’t adapt. They don’t learn from context. They don’t compose into larger capabilities. If you need something new, you write a new tool. If you need to combine tools, you hardcode the combination.

There’s a better approach. Instead of tools, think in functions. Functions that have metadata. Functions that can be discovered dynamically. Functions that can be chained together based on what the agent needs right now.

This is function-centric agent design. It’s how modern agents handle complex, evolving tasks. They don’t just call tools. They discover capabilities, compose them into graphs, and optimize execution based on feedback.

This article explains how function-centric systems work, why they matter, and how to build them.

Introduction: From Tools to Functions

Tool-using agents are everywhere. You tell the agent it can use a calculator tool, a search tool, a database tool. The agent picks a tool, calls it with parameters, and uses the result.

This works for simple tasks. But it breaks down when tasks get complex.

First, tools are opaque. The agent doesn’t know what a tool can do beyond its description. It can’t reason about tool capabilities. It can’t discover related tools. It just picks from a list.

Second, tools don’t compose well. If you want to combine tools, you write a new tool that calls the others. That new tool is static. It doesn’t adapt. It doesn’t learn.

Third, tools are hard to observe. You can log that a tool was called, but you can’t see how tools relate to each other. You can’t trace execution paths. You can’t optimize based on patterns.

Function-centric design fixes these problems.

In a function-centric system, capabilities are represented as callable functions with rich metadata. Each function has a signature, constraints, context requirements, and semantic descriptions. The agent can discover functions by similarity. It can compose functions into graphs. It can observe execution and optimize.

The key difference is dynamism. Tools are static. Functions are dynamic. The agent builds capability graphs on the fly based on what it needs.

Consider a research agent. Instead of having a “search papers” tool and a “summarize text” tool, you have functions. A search_academic_papers function. An extract_key_points function. A generate_summary function.

The agent can discover these functions. It can see that extract_key_points might be useful after search_academic_papers. It can chain them together. It can try different combinations and learn which work best.

This is capability composition. The agent composes capabilities into workflows that solve problems. The workflows aren’t hardcoded. They’re discovered and optimized.
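
As a rough sketch of what that chaining could look like (the registry and executor here stand in for components built later in this article, and the function names and their parameters are illustrative):

# Hypothetical sketch: chain discovered capabilities into a small workflow.
# `registry` and `executor` are placeholders for the discovery and execution
# pieces described later in this article.
goal = "Summarize recent research on graph neural networks"

candidates = registry.discover(goal, top_k=3)
papers = executor.execute("search_academic_papers", {"query": goal})["result"]
points = executor.execute("extract_key_points", {"papers": papers})["result"]
summary = executor.execute("generate_summary", {"key_points": points})["result"]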

Function-centric systems also enable better observability. You can trace which functions were called, in what order, with what results. You can see execution graphs. You can measure performance and optimize.

They also enable self-optimization. The agent can try different function combinations, measure results, and learn which combinations work best for different tasks.

This is the future of agent design. Not static tools, but dynamic function composition.

The Function Graph Concept

In a function-centric system, capabilities are functions. But not just any functions. They’re functions with metadata that makes them discoverable and composable.

Function Representation

A function in this system has several parts:

Signature: Input and output types. What parameters does it take? What does it return?

Capability description: What can this function do? This is usually a natural language description that can be embedded for similarity search.

Constraints: What are the limits? Rate limits? Resource requirements? Context requirements?

Context awareness: What context does this function need? Does it need user preferences? Does it need conversation history? Does it need results from other functions?

Execution metadata: How long does it typically take? What’s the success rate? What are common failure modes?

Here’s what a function definition might look like:

@capability(
    description="Searches academic papers using semantic similarity",
    input_schema={
        "query": {"type": "string", "description": "Search query"},
        "limit": {"type": "integer", "default": 10}
    },
    output_schema={
        "papers": {"type": "array", "items": {"type": "object"}}
    },
    constraints={
        "rate_limit": "10/minute",
        "requires_auth": True
    },
    context_requirements=["user_preferences", "search_history"]
)
def search_academic_papers(query: str, limit: int = 10) -> dict:
    # Implementation
    pass

The decorator captures all the metadata. The function itself is just Python code.
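
The decorator itself can be very small. Here’s a minimal sketch, assuming all you want is to attach the metadata to the function object and collect decorated functions in a module-level dict (an illustration, not a particular library’s API):

# Minimal sketch of a @capability decorator. It attaches metadata to the
# function and records it in a simple in-process registry.
CAPABILITIES = {}

def capability(**metadata):
    def wrapper(func):
        func.capability_metadata = {
            "function_id": func.__name__,
            "name": func.__name__,
            **metadata,
        }
        CAPABILITIES[func.__name__] = func
        return func
    return wrapper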

Function Metadata Schema

For a system to work with functions, it needs a standard way to represent metadata. JSON Schema works well:

{
  "function_id": "search_academic_papers",
  "name": "search_academic_papers",
  "description": "Searches academic papers using semantic similarity",
  "capabilities": ["search", "academic", "semantic-search"],
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "limit": {
        "type": "integer",
        "default": 10,
        "minimum": 1,
        "maximum": 100
      }
    },
    "required": ["query"]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "papers": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "authors": {"type": "array"},
            "abstract": {"type": "string"},
            "similarity_score": {"type": "number"}
          }
        }
      }
    }
  },
  "constraints": {
    "rate_limit": "10/minute",
    "requires_auth": true,
    "max_query_length": 500
  },
  "context_requirements": ["user_preferences", "search_history"],
  "execution_metadata": {
    "avg_duration_ms": 1200,
    "success_rate": 0.95,
    "common_failures": ["rate_limit_exceeded", "invalid_query"]
  }
}

This schema captures everything the agent needs to discover and use the function.

Function vs Plugin Comparison

You might wonder: how is this different from plugins?

Plugins are collections of tools. A plugin might have multiple tools, but each tool is still static. The agent picks a tool from the plugin, calls it, and moves on.

Functions are more granular. Each function is a capability. Functions can be discovered individually. They can be composed across different sources. They’re not tied to a plugin structure.

Plugins also tend to be monolithic. You install a plugin, and you get all its tools. Functions can be registered individually. You can have functions from different sources in the same registry.

Plugins are also harder to observe. You can see that a plugin was used, but not how individual capabilities within the plugin relate to each other. With functions, you can see the exact execution graph.

The main difference is composability. Plugins compose at the tool level. Functions compose at the capability level. That makes functions more flexible.

Serialization and Caching

For a function-centric system to work, you need to serialize function schemas. The agent needs to know what functions are available without loading all the code.

This is where the JSON schema comes in. You can store function schemas in a database, a file, or a registry service. The agent loads schemas, not implementations.
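
A minimal sketch of that split, assuming the schemas live in a JSON file (the path and file layout here are assumptions):

import json
from typing import Dict, List

def load_function_schemas(path: str) -> Dict[str, dict]:
    # Load schemas only; implementations are imported or bound separately.
    with open(path, "r") as f:
        schemas: List[dict] = json.load(f)
    return {schema["function_id"]: schema for schema in schemas}

# The agent reasons over schemas without loading any implementation code.
schemas = load_function_schemas("./registry/functions.json")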

Caching is important too. Function discovery involves embedding similarity search. That’s expensive. You want to cache embeddings and search results.

You also want to cache execution results when appropriate. If a function is pure (no side effects), you can cache results based on inputs. If a function has side effects but the results are still valid, you can cache with TTL.

Here’s a simple caching layer:

import hashlib
import json
import time
from typing import Any

class FunctionCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.cache = {}
        self.ttl = ttl_seconds
    
    def _make_key(self, function_id: str, inputs: dict) -> str:
        # Create a hash key from function ID and inputs
        key_data = json.dumps({
            "function_id": function_id,
            "inputs": inputs
        }, sort_keys=True)
        return hashlib.md5(key_data.encode()).hexdigest()
    
    def get(self, function_id: str, inputs: dict) -> Any:
        key = self._make_key(function_id, inputs)
        if key in self.cache:
            entry = self.cache[key]
            # Check if entry is still valid
            if entry["expires_at"] > time.time():
                return entry["result"]
            else:
                del self.cache[key]
        return None
    
    def set(self, function_id: str, inputs: dict, result: Any):
        key = self._make_key(function_id, inputs)
        self.cache[key] = {
            "result": result,
            "expires_at": time.time() + self.ttl
        }

This is basic, but it shows the idea. You cache function results to avoid redundant computation.
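
A quick usage sketch, wrapping a call in the cache (execute_function here is a placeholder for however you actually invoke the implementation):

cache = FunctionCache(ttl_seconds=600)

def cached_execute(function_id: str, inputs: dict):
    # Check the cache first; only call the real function on a miss.
    cached = cache.get(function_id, inputs)
    if cached is not None:
        return cached
    result = execute_function(function_id, inputs)  # placeholder for real invocation
    cache.set(function_id, inputs, result)
    return result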

Orchestrating Capability Composition

Discovering functions is one thing. Composing them into useful workflows is another. The agent needs to figure out which functions to call, in what order, with what parameters.

Dynamic Function Discovery

The first step is discovery. The agent has a goal. It needs to find functions that might help.

Embedding similarity works well here. You embed the goal description and function descriptions into the same vector space. Then you find functions with similar embeddings.

from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Dict

class FunctionRegistry:
    def __init__(self):
        self.functions = {}
        self.embeddings = {}
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
    
    def register(self, function_schema: dict):
        function_id = function_schema["function_id"]
        self.functions[function_id] = function_schema
        
        # Create embedding for discovery
        description = function_schema["description"]
        capabilities = " ".join(function_schema.get("capabilities", []))
        text = f"{description} {capabilities}"
        self.embeddings[function_id] = self.model.encode(text)
    
    def discover(self, goal: str, top_k: int = 5) -> List[Dict]:
        goal_embedding = self.model.encode(goal)
        
        # Calculate similarities
        similarities = {}
        for function_id, embedding in self.embeddings.items():
            similarity = np.dot(goal_embedding, embedding) / (
                np.linalg.norm(goal_embedding) * np.linalg.norm(embedding)
            )
            similarities[function_id] = similarity
        
        # Get top K functions
        sorted_functions = sorted(
            similarities.items(),
            key=lambda x: x[1],
            reverse=True
        )[:top_k]
        
        return [
            {
                "function_id": func_id,
                "similarity": sim,
                "schema": self.functions[func_id]
            }
            for func_id, sim in sorted_functions
        ]

This is basic, but it works. The agent embeds the goal, finds similar functions, and gets a ranked list.

Capability Router

Discovery gives you candidate functions. But you still need to decide which ones to use and in what order. That’s where a capability router comes in.

The router looks at the goal, the available functions, and the current context. It builds an execution plan.

from typing import List, Dict, Any, Optional
from dataclasses import dataclass

@dataclass
class ExecutionStep:
    function_id: str
    inputs: Dict[str, Any]
    depends_on: List[str]  # IDs of previous steps

class CapabilityRouter:
    def __init__(self, registry: FunctionRegistry):
        self.registry = registry
    
    def plan(self, goal: str, context: Dict[str, Any]) -> List[ExecutionStep]:
        # Discover relevant functions
        candidates = self.registry.discover(goal, top_k=10)
        
        # Build execution plan
        plan = []
        used_functions = set()
        
        # Start with functions that don't need other function outputs
        for candidate in candidates:
            if self._can_execute_now(candidate, context, used_functions):
                step = ExecutionStep(
                    function_id=candidate["function_id"],
                    inputs=self._prepare_inputs(candidate, context),
                    depends_on=[]
                )
                plan.append(step)
                used_functions.add(candidate["function_id"])
        
        # Then add functions that depend on previous results
        remaining = [c for c in candidates if c["function_id"] not in used_functions]
        max_iterations = 10
        iteration = 0
        
        while remaining and iteration < max_iterations:
            iteration += 1
            new_steps = []
            
            for candidate in remaining:
                dependencies = self._find_dependencies(candidate, plan)
                if dependencies is not None:
                    step = ExecutionStep(
                        function_id=candidate["function_id"],
                        inputs=self._prepare_inputs(candidate, context, dependencies),
                        depends_on=[d.function_id for d in dependencies]
                    )
                    new_steps.append(step)
                    used_functions.add(candidate["function_id"])
            
            plan.extend(new_steps)
            remaining = [c for c in remaining if c["function_id"] not in used_functions]
        
        return plan
    
    def _can_execute_now(self, candidate: Dict, context: Dict, used_functions: set) -> bool:
        schema = candidate["schema"]
        # Check if function needs outputs from other functions
        # This is simplified - real implementation would analyze input schema
        return True
    
    def _find_dependencies(self, candidate: Dict, plan: List[ExecutionStep]) -> Optional[List[ExecutionStep]]:
        # Check if this function's inputs can be satisfied by previous steps
        # This is simplified
        if not plan:
            return None
        # Return last step as dependency (simplified)
        return [plan[-1]] if plan else None
    
    def _prepare_inputs(self, candidate: Dict, context: Dict, dependencies: Optional[List] = None) -> Dict:
        schema = candidate["schema"]
        inputs = {}
        
        # Extract inputs from context
        input_schema = schema.get("input_schema", {}).get("properties", {})
        for param_name, param_schema in input_schema.items():
            if param_name in context:
                inputs[param_name] = context[param_name]
            elif "default" in param_schema:
                inputs[param_name] = param_schema["default"]
        
        # Extract inputs from dependencies if needed
        if dependencies:
            # Simplified: use output from last dependency
            # Real implementation would match output schema to input schema
            pass
        
        return inputs

This is simplified, but it shows the idea. The router discovers functions, figures out dependencies, and builds an execution plan.

System Prompts for Composition

LLMs are good at reasoning about function composition. You can use a system prompt to help the agent decide which functions to use.

import json
from typing import Dict, List

def build_composition_prompt(goal: str, available_functions: List[Dict], context: Dict) -> str:
    function_descriptions = "\n".join([
        f"- {f['function_id']}: {f['schema']['description']}"
        for f in available_functions
    ])
    
    prompt = f"""You are an AI agent that composes functions to achieve goals.

Goal: {goal}

Available functions:
{function_descriptions}

Current context: {json.dumps(context, indent=2)}

Your task:
1. Identify which functions are needed to achieve the goal
2. Determine the execution order (which functions depend on others)
3. Specify inputs for each function

Respond with a JSON array of execution steps, each with:
- function_id: The function to call
- inputs: The inputs for this function
- depends_on: List of function IDs that must complete first (empty if none)

Example:
[
  {{
    "function_id": "search_academic_papers",
    "inputs": {{"query": "machine learning"}},
    "depends_on": []
  }},
  {{
    "function_id": "extract_key_points",
    "inputs": {{"text": "{{result_from_search_academic_papers}}"}},
    "depends_on": ["search_academic_papers"]
  }}
]
"""
    return prompt

The LLM reasons about function composition and returns a plan. You execute the plan, and if something fails, you can ask the LLM to revise.
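
A sketch of the loop around that prompt. Here call_llm is a placeholder for whatever LLM client you use, and the plan is assumed to come back in the JSON format the prompt asks for:

import json

def plan_and_execute(goal, registry, executor, context, call_llm):
    # Ask the LLM for a plan, then execute it step by step.
    candidates = registry.discover(goal, top_k=10)
    prompt = build_composition_prompt(goal, candidates, context)
    plan = json.loads(call_llm(prompt))  # expects the JSON array described in the prompt

    results = {}
    for step in plan:
        inputs = dict(step["inputs"])
        # Substitute placeholders like "{result_from_<function_id>}" with earlier outputs.
        for key, value in inputs.items():
            if isinstance(value, str) and value.startswith("{result_from_"):
                dep_id = value[len("{result_from_"):-1]
                inputs[key] = results.get(dep_id)
        outcome = executor.execute(step["function_id"], inputs)
        results[step["function_id"]] = outcome["result"]
    return results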

Interpretability and Auditability

One advantage of function-centric systems is interpretability. You can see exactly which functions were called, in what order, with what results.

import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ExecutionLog:
    step_id: str
    function_id: str
    inputs: Dict[str, Any]
    outputs: Any
    duration_ms: float
    success: bool
    error: Optional[str]
    timestamp: float
    depends_on: List[str] = field(default_factory=list)

class ExecutionTracker:
    def __init__(self):
        self.logs: List[ExecutionLog] = []
    
    def log_execution(self, step: ExecutionStep, result: Any, duration_ms: float, success: bool, error: Optional[str] = None):
        # ExecutionStep comes from the capability router above
        log = ExecutionLog(
            step_id=str(uuid.uuid4()),
            function_id=step.function_id,
            inputs=step.inputs,
            outputs=result,
            duration_ms=duration_ms,
            success=success,
            error=error,
            timestamp=time.time(),
            depends_on=list(step.depends_on)
        )
        self.logs.append(log)
    
    def get_execution_graph(self) -> Dict:
        # Build a graph representation of execution: nodes are calls, edges are dependencies
        nodes = []
        edges = []
        latest_step_for_function = {}  # function_id -> step_id of its most recent call
        
        for log in self.logs:
            nodes.append({
                "id": log.step_id,
                "function_id": log.function_id,
                "success": log.success,
                "duration_ms": log.duration_ms
            })
            
            # Link each call to the most recent call of each function it depends on
            for dep_function_id in log.depends_on:
                dep_step_id = latest_step_for_function.get(dep_function_id)
                if dep_step_id:
                    edges.append({
                        "from": dep_step_id,
                        "to": log.step_id
                    })
            latest_step_for_function[log.function_id] = log.step_id
        
        return {"nodes": nodes, "edges": edges}

This gives you a complete audit trail. You can see what happened, why it happened, and how long it took.

Implementation Example

Let’s build a simple capability registry and show how an agent uses it.

Function Registry Implementation

import json
import time
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
from sentence_transformers import SentenceTransformer
import numpy as np

@dataclass
class FunctionSchema:
    function_id: str
    name: str
    description: str
    capabilities: List[str]
    input_schema: Dict
    output_schema: Dict
    constraints: Dict
    context_requirements: List[str]
    execution_metadata: Dict

class CapabilityRegistry:
    def __init__(self):
        self.functions: Dict[str, FunctionSchema] = {}
        self.embeddings: Dict[str, np.ndarray] = {}
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self._build_embeddings()
    
    def register(self, schema: FunctionSchema):
        """Register a function with its schema."""
        self.functions[schema.function_id] = schema
        self._update_embedding(schema)
    
    def _update_embedding(self, schema: FunctionSchema):
        """Create or update embedding for a function."""
        text = f"{schema.description} {' '.join(schema.capabilities)}"
        self.embeddings[schema.function_id] = self.model.encode(text)
    
    def _build_embeddings(self):
        """Rebuild all embeddings (call after bulk registration)."""
        # This would be called after registering multiple functions
        pass
    
    def discover(self, goal: str, top_k: int = 5) -> List[Dict[str, Any]]:
        """Discover functions relevant to a goal using semantic similarity."""
        if not self.functions:
            return []
        
        goal_embedding = self.model.encode(goal)
        similarities = {}
        
        for function_id, embedding in self.embeddings.items():
            # Cosine similarity
            similarity = np.dot(goal_embedding, embedding) / (
                np.linalg.norm(goal_embedding) * np.linalg.norm(embedding)
            )
            similarities[function_id] = similarity
        
        # Sort by similarity and get top K
        sorted_functions = sorted(
            similarities.items(),
            key=lambda x: x[1],
            reverse=True
        )[:top_k]
        
        return [
            {
                "function_id": func_id,
                "similarity": float(sim),
                "schema": asdict(self.functions[func_id])
            }
            for func_id, sim in sorted_functions
        ]
    
    def get_schema(self, function_id: str) -> Optional[FunctionSchema]:
        """Get schema for a function by ID."""
        return self.functions.get(function_id)
    
    def list_all(self) -> List[str]:
        """List all registered function IDs."""
        return list(self.functions.keys())

Dynamic Function Invocation

Now let’s see how an agent uses the registry to discover and invoke functions:

import time
import uuid
from typing import Any, Callable, Dict, List

class FunctionExecutor:
    def __init__(self, registry: CapabilityRegistry):
        self.registry = registry
        self.function_implementations: Dict[str, Callable] = {}
        self.execution_log: List[Dict] = []
    
    def register_implementation(self, function_id: str, implementation: Callable):
        """Register the actual Python function implementation."""
        self.function_implementations[function_id] = implementation
    
    def execute(self, function_id: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a function with given inputs."""
        schema = self.registry.get_schema(function_id)
        if not schema:
            raise ValueError(f"Function {function_id} not found")
        
        implementation = self.function_implementations.get(function_id)
        if not implementation:
            raise ValueError(f"Implementation for {function_id} not found")
        
        # Validate inputs against schema
        self._validate_inputs(inputs, schema.input_schema)
        
        # Execute
        start_time = time.time()
        try:
            result = implementation(**inputs)
            duration_ms = (time.time() - start_time) * 1000
            success = True
            error = None
        except Exception as e:
            duration_ms = (time.time() - start_time) * 1000
            result = None
            success = False
            error = str(e)
        
        # Log execution
        log_entry = {
            "execution_id": str(uuid.uuid4()),
            "function_id": function_id,
            "inputs": inputs,
            "outputs": result,
            "duration_ms": duration_ms,
            "success": success,
            "error": error,
            "timestamp": time.time()
        }
        self.execution_log.append(log_entry)
        
        return {
            "success": success,
            "result": result,
            "error": error,
            "duration_ms": duration_ms
        }
    
    def _validate_inputs(self, inputs: Dict, schema: Dict):
        """Validate inputs against JSON schema (simplified)."""
        properties = schema.get("properties", {})
        required = schema.get("required", [])
        
        for field in required:
            if field not in inputs:
                raise ValueError(f"Required field '{field}' missing")
        
        for field, value in inputs.items():
            if field in properties:
                expected_type = properties[field].get("type")
                # Basic type checking (simplified)
                if expected_type == "string" and not isinstance(value, str):
                    raise ValueError(f"Field '{field}' must be string")
                elif expected_type == "integer" and not isinstance(value, int):
                    raise ValueError(f"Field '{field}' must be integer")

Example Usage

Here’s how you’d use this system:

# Create registry
registry = CapabilityRegistry()

# Register a function schema
search_schema = FunctionSchema(
    function_id="search_papers",
    name="search_papers",
    description="Search academic papers by topic",
    capabilities=["search", "academic", "research"],
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    },
    output_schema={
        "type": "object",
        "properties": {
            "papers": {
                "type": "array",
                "items": {"type": "object"}
            }
        }
    },
    constraints={"rate_limit": "10/minute"},
    context_requirements=[],
    execution_metadata={"avg_duration_ms": 1200}
)

registry.register(search_schema)

# Register implementation
def search_papers_impl(query: str, limit: int = 10) -> dict:
    # Mock implementation
    return {
        "papers": [
            {"title": f"Paper about {query}", "authors": ["Author 1"]}
            for _ in range(limit)
        ]
    }

executor = FunctionExecutor(registry)
executor.register_implementation("search_papers", search_papers_impl)

# Discover functions for a goal
goal = "Find research papers about machine learning"
candidates = registry.discover(goal, top_k=3)

print("Discovered functions:")
for candidate in candidates:
    print(f"  - {candidate['function_id']}: {candidate['similarity']:.3f}")

# Execute a function
result = executor.execute("search_papers", {"query": "machine learning", "limit": 5})
print(f"\nExecution result: {result['success']}")
print(f"Duration: {result['duration_ms']:.2f}ms")

This shows the basic flow: register functions, discover relevant ones, and execute them.

Observability and Optimization

Function-centric systems generate a lot of data. Every function call produces metadata: inputs, outputs, duration, success or failure. This data is valuable for observability and optimization.

Measuring Function Performance

The first step is collecting metrics. You want to know how functions perform over time.

from collections import defaultdict
from typing import Dict, List
import statistics

class PerformanceMetrics:
    def __init__(self):
        self.metrics: Dict[str, List[float]] = defaultdict(list)
        self.success_counts: Dict[str, int] = defaultdict(int)
        self.failure_counts: Dict[str, int] = defaultdict(int)
    
    def record_execution(self, function_id: str, duration_ms: float, success: bool):
        """Record execution metrics."""
        self.metrics[function_id].append(duration_ms)
        if success:
            self.success_counts[function_id] += 1
        else:
            self.failure_counts[function_id] += 1
    
    def get_stats(self, function_id: str) -> Dict[str, float]:
        """Get performance statistics for a function."""
        if function_id not in self.metrics:
            return {}
        
        durations = self.metrics[function_id]
        total = self.success_counts[function_id] + self.failure_counts[function_id]
        success_rate = self.success_counts[function_id] / total if total > 0 else 0
        
        return {
            "avg_duration_ms": statistics.mean(durations),
            "median_duration_ms": statistics.median(durations),
            "p95_duration_ms": self._percentile(durations, 95),
            "p99_duration_ms": self._percentile(durations, 99),
            "min_duration_ms": min(durations),
            "max_duration_ms": max(durations),
            "success_rate": success_rate,
            "total_executions": total
        }
    
    def _percentile(self, data: List[float], percentile: int) -> float:
        """Calculate percentile."""
        sorted_data = sorted(data)
        index = int(len(sorted_data) * percentile / 100)
        return sorted_data[min(index, len(sorted_data) - 1)]

This gives you performance metrics per function. You can see which functions are slow, which fail often, and which are reliable.
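
Feeding the executor’s results into this is straightforward. A short usage sketch, reusing the executor and function from the earlier example:

metrics = PerformanceMetrics()

# After each call, record its outcome.
outcome = executor.execute("search_papers", {"query": "machine learning"})
metrics.record_execution("search_papers", outcome["duration_ms"], outcome["success"])

print(metrics.get_stats("search_papers"))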

Auto-Tuning Execution Chains

With performance data, you can optimize execution chains. If a function is slow, maybe there’s a faster alternative. If a chain fails often, maybe there’s a better sequence.

import time
from collections import defaultdict
from typing import Dict, List

class ChainOptimizer:
    def __init__(self, metrics: PerformanceMetrics, registry: CapabilityRegistry):
        self.metrics = metrics
        self.registry = registry
        self.chain_history: Dict[str, List[Dict]] = defaultdict(list)
    
    def record_chain(self, chain_id: str, steps: List[str], success: bool, total_duration_ms: float):
        """Record execution of a function chain."""
        self.chain_history[chain_id].append({
            "steps": steps,
            "success": success,
            "duration_ms": total_duration_ms,
            "timestamp": time.time()
        })
    
    def optimize_chain(self, goal: str) -> List[str]:
        """Suggest an optimized chain for a goal."""
        # Discover candidate functions
        candidates = self.registry.discover(goal, top_k=10)
        
        # Score candidates based on performance
        scored = []
        for candidate in candidates:
            function_id = candidate["function_id"]
            stats = self.metrics.get_stats(function_id)
            
            # Score based on success rate and speed
            score = 0
            if stats:
                success_weight = 0.6
                speed_weight = 0.4
                
                success_score = stats["success_rate"]
                speed_score = 1.0 / (1.0 + stats["avg_duration_ms"] / 1000.0)  # Normalize
                
                score = success_weight * success_score + speed_weight * speed_score
            
            scored.append({
                "function_id": function_id,
                "similarity": candidate["similarity"],
                "performance_score": score,
                "combined_score": 0.7 * candidate["similarity"] + 0.3 * score
            })
        
        # Sort by combined score
        scored.sort(key=lambda x: x["combined_score"], reverse=True)
        
        # Build chain (simplified - just return top functions)
        return [s["function_id"] for s in scored[:3]]

This is simplified, but it shows the idea. You use performance data to optimize function selection and chaining.

Logging and Graph Persistence

For debugging and analysis, you want to persist execution logs and graphs.

import json
import os
from datetime import datetime
from typing import Dict, Optional

class ExecutionLogger:
    def __init__(self, log_dir: str = "./logs"):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)
    
    def log_execution_graph(self, graph: Dict, execution_id: str):
        """Persist an execution graph to disk."""
        filename = f"{self.log_dir}/execution_{execution_id}_{datetime.now().isoformat()}.json"
        with open(filename, 'w') as f:
            json.dump(graph, f, indent=2)
    
    def load_execution_graph(self, execution_id: str) -> Optional[Dict]:
        """Load an execution graph from disk."""
        # Find file matching execution_id
        for filename in os.listdir(self.log_dir):
            if execution_id in filename:
                with open(os.path.join(self.log_dir, filename), 'r') as f:
                    return json.load(f)
        return None

This lets you save and load execution graphs for analysis.
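
A short usage sketch tying the logger to the tracker from earlier:

import uuid

logger = ExecutionLogger(log_dir="./logs")
tracker = ExecutionTracker()

# ... run steps, calling tracker.log_execution(...) after each one ...

execution_id = str(uuid.uuid4())
logger.log_execution_graph(tracker.get_execution_graph(), execution_id)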

Best Practices

A few things to keep in mind:

Log everything: Inputs, outputs, durations, errors. You never know what you’ll need to debug.

Use structured logging: JSON logs are easier to query and analyze than text logs.

Track context: Log the context that was available when functions were called. That helps understand why certain functions were chosen.

Monitor in real-time: Don’t just log. Set up alerts for slow functions, high failure rates, or unusual patterns.

Analyze patterns: Look for common execution patterns. Maybe certain function combinations always fail. Maybe certain sequences are always slow.

Version function schemas: When you update a function, version the schema. That way you can track how changes affect performance.

Conclusion

Function-centric agent design is the next step beyond tool-using agents. Instead of static tools, you have dynamic functions that can be discovered, composed, and optimized.

The key advantages are composability, observability, and self-optimization. Functions compose into capability graphs. Execution is observable and auditable. Performance data enables automatic optimization.

This approach fits modern agent systems well. Agents need to handle complex, evolving tasks. They need to adapt. They need to learn. Function-centric design supports all of that.

The implementation isn’t trivial. You need function registries, discovery mechanisms, execution engines, and observability systems. But the building blocks are straightforward. Function schemas, embedding similarity, execution tracking, performance metrics.

As agent systems get more complex, function-centric design will become standard. It’s already happening in research and early production systems. The tools and patterns are emerging.

Multi-agent environments will benefit too. Agents can share function registries. They can discover capabilities from other agents. They can compose workflows across agent boundaries.

The future is composable capabilities, not static tools. Function-centric design gets us there.
