Agentic framework and structure

Last updated: 2026-05-08

Figure 1: The structure of the service1 agent. The diagram is generated automatically from the graph definition in api/agents/service1/graph.py.

Implementation details

The main agent logic is implemented using langgraph and is located in the api/agents/service1/ directory. The structure is modular, breaking down the agent’s functionality into distinct components:

  • graph.py: Defines the langgraph StateGraph, connecting all the nodes and edges that constitute the agent’s logic. It also includes code to automatically generate a Mermaid diagram of the graph’s structure.
  • core/: Contains the core components of the agent.
    • state.py: Defines the Service1State TypedDict, which tracks the agent’s state throughout the conversation, including messages, context, and actions.
    • llm_client.py: Manages the lazy-loaded LLM client (ChatGoogleGenerativeAI) and other services such as RAG and Analytics.
    • registry.py: Holds the tool registry, mapping tool names (used in YAML manifests) to their callable implementations.
  • nodes/: Each file in this directory corresponds to a specific node in the graph, encapsulating a particular piece of logic (e.g., giving advice, collecting context).
    • nodes/manifests/: YAML manifest files, one per node, declaring the prompt structure, tools, and output schema for that node (see NodeEnv below).
  • memory/: Contains the MemoryLake component for asynchronous, persistent agent memory (see Persistent Memory Lake below).
  • routers/: Contains the conditional routing logic that directs the flow of the conversation between different nodes based on the current state.
  • tools/: LangGraph tool definitions (research_tools.py, contact_tools.py) that nodes can invoke via the research_tools tool node.
  • utils/: Provides helper functions, the NodeEnv compiler (node_env.py), context extraction helpers (context.py), and output formatters (formatters.py).

The agent is designed as a state machine where each node transition is determined by the output of the previous node and the conditional logic in the routers.
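
To make the state-machine idea concrete, the following is a minimal sketch in plain Python. It deliberately does not use langgraph: nodes are functions that return partial state updates, and routers are functions that pick the next node from the updated state. The run helper, the stub nodes, and every state key other than messages and context are assumptions made for illustration, not the contents of graph.py or state.py.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class Service1State(TypedDict, total=False):
    # Only messages, context and actions are named in this document;
    # context_complete is mentioned as a flag, the rest is illustrative.
    messages: List[str]
    context: Dict[str, str]
    action: str
    context_complete: bool


Node = Callable[[Service1State], Service1State]
Router = Callable[[Service1State], Optional[str]]


def run(state: Service1State, nodes: Dict[str, Node],
        routers: Dict[str, Router], start: str, max_steps: int = 10) -> List[str]:
    """Run nodes until a router returns None; return the visited node names."""
    visited: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        visited.append(current)
        state.update(nodes[current](state))   # nodes return partial updates
        current = routers[current](state)     # the router picks the next node
    return visited


def collect_context(state: Service1State) -> Service1State:
    # Stub: pretend every context question has been answered.
    return {"context": {"age": "15"}, "context_complete": True}


def give_advice(state: Service1State) -> Service1State:
    # Stub: append a placeholder advice message.
    return {"messages": state.get("messages", []) + ["advice"]}


nodes = {"collect_context": collect_context, "give_advice": give_advice}
routers = {
    "collect_context": lambda s: "give_advice" if s.get("context_complete") else None,
    "give_advice": lambda s: None,  # terminal in this sketch
}

state: Service1State = {"messages": ["hello"]}
path = run(state, nodes, routers, start="collect_context")
```

In the real agent the same roles are played by StateGraph nodes and conditional edges; the sketch only shows why each transition depends on the previous node's output plus the router logic.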

Node descriptions

  • agent1

    A central router node that uses the LLM to determine the next high-level action (e.g., give_advice, collect_context) based on the current conversation history. Its prompt and output schema are declared in nodes/manifests/agent1.yaml.

  • collect_context

    This node manages the initial phase of the conversation, guiding the user through a series of questions defined in the i18n configuration (see Dynamic Context Collection). It keeps asking questions until all required context has been collected, as signalled by the context_complete flag.

  • ask_for_context

    A supplementary node to collect_context. The conversation is routed here if agent1 determines that the user’s situation requires further clarification outside the standard initial questions.

  • give_advice

    The core node responsible for generating a helpful, empathetic response to the user’s situation. It can trigger the research_tools tool node to enrich its answer with information from ingested RAG documents.

  • research_tools

    A LangGraph ToolNode that executes tool calls emitted by give_advice. It currently integrates research_educational_strategies for RAG-backed document search. The node generates multiple diverse RAG queries, performs parallelised vector searches, and filters results by an LLM-based relevance score before returning them to give_advice.

  • ongoing_support

    After the initial advice is given, this node handles the continuing conversation. It provides follow-up support, answers additional questions, and maintains context from the conversation summary stored in the state.

  • summarize_conversation

    Triggered at the end of a conversational loop, this node generates a concise summary of the interaction. The summary is persisted asynchronously via the MemoryLake (see below), allowing context to be retained across sessions.

  • user_feedback

    A terminal node in the main advice-giving flow. It allows the conversation to end gracefully, awaiting further input from the user. If the user continues the conversation, the flow restarts through the appropriate router.

Routings

Routing is managed by conditional edges that evaluate the agent’s state (Service1State) to determine the next node.

  • should_collect_context

    The main entry-point router. Routes to agent1 if context is already complete or if the session is in ongoing-support mode; otherwise defaults to collect_context.

  • give_advice_after_context_collection

    Fired after collect_context completes. Routes directly to give_advice when all context questions have been answered, bypassing agent1 for speed.

  • router

    The primary action router after agent1. Takes the action field set by agent1 and routes to the corresponding node (give_advice, ongoing_support, user_feedback, or summarize_conversation). Also handles the fast-path to ongoing_support when the session is already in support mode and the last message is from the human.

  • advice_router

    Manages the flow after give_advice. Uses tools_condition to detect pending tool calls and route to research_tools; proceeds to summarize_conversation when summarisation is flagged (should_summarize); otherwise returns the action from state.
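
The routers described above are pure functions of the state, so two of them can be sketched in a few lines. This is an illustration under assumptions, not the code in routers/: the state is a plain dict, the ongoing_support_mode key is a hypothetical name for the "ongoing-support mode" flag, messages are modelled as dicts with an optional tool_calls entry, and has_pending_tool_calls is a stand-in for the tools_condition check.

```python
from typing import Any, Dict

State = Dict[str, Any]


def should_collect_context(state: State) -> str:
    """Entry-point router: skip collection if context is complete or in support mode."""
    if state.get("context_complete") or state.get("ongoing_support_mode"):
        return "agent1"
    return "collect_context"


def has_pending_tool_calls(state: State) -> bool:
    """Stand-in for tools_condition: does the last message request a tool call?"""
    messages = state.get("messages", [])
    return bool(messages) and bool(messages[-1].get("tool_calls"))


def advice_router(state: State) -> str:
    """Router after give_advice: tools first, then summarisation, then the stored action."""
    if has_pending_tool_calls(state):
        return "research_tools"
    if state.get("should_summarize"):
        return "summarize_conversation"
    return state["action"]
```

The order of the checks in advice_router follows the description above: pending tool calls take priority, then the should_summarize flag, and only then the action already stored in the state.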

Advanced Concepts

NodeEnv — Declarative Prompt System

Nodes no longer hard-code their prompts or tool bindings in Python. Instead, each node declares its requirements in a YAML manifest stored in nodes/manifests/:

# Example: nodes/manifests/give_advice.yaml
node: give_advice
prompts:
  base: give_advice_system_prompt
  snippets:
    - first_advice_extension
tools:
  catalog:
    - tool: research_educational_strategies
      condition: first_advice
response_schema: GiveAdviceAnswer

The NodeEnv class (utils/node_env.py) reads the manifest at runtime and:

  1. Resolves the base system prompt from the i18n layer for the current language and user type.
  2. Appends any declared supplementary snippets (e.g., first_advice_extension).
  3. Filters the tools catalog against optional state conditions (e.g., only bind the research tool for the very first advice turn).
  4. Binds the resolved tool list or response schema to the LLM using bind_tools / with_structured_output.

This design means prompt content can be adjusted in the YAML files or i18n layer without any Python changes. A refresh_manifest_schema.py utility keeps the JSON Schema (node_manifest.schema.json) in sync with the manifest format for IDE validation.
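
Steps 1 to 3 can be sketched as follows. This is a simplified illustration, not the NodeEnv class itself: the manifest is given as an already-parsed dict mirroring the give_advice.yaml example, and the PROMPTS table, the CONDITIONS predicates, and the advice_given state key are hypothetical stand-ins for the i18n layer and the real condition logic. Step 4 (binding to the LLM) is omitted because it depends on the LLM client.

```python
from typing import Any, Callable, Dict, List

# The give_advice.yaml example, as it might look after YAML parsing.
MANIFEST: Dict[str, Any] = {
    "node": "give_advice",
    "prompts": {"base": "give_advice_system_prompt",
                "snippets": ["first_advice_extension"]},
    "tools": {"catalog": [{"tool": "research_educational_strategies",
                           "condition": "first_advice"}]},
    "response_schema": "GiveAdviceAnswer",
}

# Hypothetical i18n content, keyed by (language, user_type).
PROMPTS: Dict[tuple, Dict[str, str]] = {
    ("FR", "teenager"): {
        "give_advice_system_prompt": "BASE PROMPT",
        "first_advice_extension": "FIRST ADVICE EXTENSION",
    }
}

# Hypothetical predicates backing the 'condition' names used in manifests.
CONDITIONS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "first_advice": lambda state: not state.get("advice_given", False),
}


def resolve_prompt(manifest: Dict[str, Any], state: Dict[str, Any]) -> str:
    """Steps 1 and 2: base prompt plus declared snippets for this language and user type."""
    table = PROMPTS[(state["language"], state["user_type"])]
    parts = [table[manifest["prompts"]["base"]]]
    parts += [table[name] for name in manifest["prompts"].get("snippets", [])]
    return "\n\n".join(parts)


def resolve_tools(manifest: Dict[str, Any], state: Dict[str, Any]) -> List[str]:
    """Step 3: keep catalog entries with no condition or a condition that holds."""
    selected = []
    for entry in manifest.get("tools", {}).get("catalog", []):
        condition = entry.get("condition")
        if condition is None or CONDITIONS[condition](state):
            selected.append(entry["tool"])
    return selected


first_turn = {"language": "FR", "user_type": "teenager"}
later_turn = {"language": "FR", "user_type": "teenager", "advice_given": True}
```

With these definitions the research tool is selected for first_turn and dropped for later_turn, which is the behaviour the first_advice condition is meant to express.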

Persistent Memory Lake

Conversation summaries are now persisted asynchronously via the MemoryLake (memory/lake.py), decoupling the agent’s hot path from database I/O:

  1. After summarize_conversation generates a summary, it drops a QueuedSummary into the lake and returns immediately — the agent does not wait for the write.
  2. The lake maintains a two-layer worker pool (3 + 3 async workers). The first layer deduplicates pending writes per (user_id, session_id) key, keeping only the most recent summary. The second layer flushes these to the MemoryService (which writes to PostgreSQL).
  3. On failure the key is re-queued, providing at-least-once delivery semantics without blocking the conversation.

The MemoryLake is initialised once on application startup and accessed as a singleton via get_memory_lake().
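
The queueing behaviour described above can be sketched with asyncio. This is an assumption-laden illustration rather than the contents of memory/lake.py: the class name MemoryLakeSketch, the put/start/drain methods, and the injected write coroutine (standing in for the MemoryService) are all hypothetical. Only the overall shape comes from the description: a deduplicating layer keyed by (user_id, session_id), a flushing layer, and re-queueing on failure.

```python
import asyncio
from dataclasses import dataclass
from typing import Tuple

Key = Tuple[str, str]  # (user_id, session_id)


@dataclass
class QueuedSummary:
    user_id: str
    session_id: str
    text: str


class MemoryLakeSketch:
    """Illustrative only; must be constructed inside a running event loop."""

    def __init__(self, write, workers_per_layer: int = 3):
        self._write = write                 # async callable doing the real write
        self._incoming = asyncio.Queue()    # layer 1 input: raw summaries
        self._ready = asyncio.Queue()       # layer 2 input: keys ready to flush
        self._pending = {}                  # latest summary per key
        self._workers_per_layer = workers_per_layer
        self._tasks = []

    def put(self, item: QueuedSummary) -> None:
        """Called from the hot path: enqueue and return immediately."""
        self._incoming.put_nowait(item)

    async def start(self) -> None:
        for _ in range(self._workers_per_layer):
            self._tasks.append(asyncio.create_task(self._dedupe()))
            self._tasks.append(asyncio.create_task(self._flush()))

    async def _dedupe(self) -> None:
        while True:
            item = await self._incoming.get()
            key = (item.user_id, item.session_id)
            is_new = key not in self._pending
            self._pending[key] = item       # the most recent summary wins
            if is_new:
                self._ready.put_nowait(key)
            self._incoming.task_done()

    async def _flush(self) -> None:
        while True:
            key = await self._ready.get()
            item = self._pending.pop(key, None)
            try:
                if item is not None:
                    await self._write(item)
            except Exception:
                # Re-queue unless a newer summary for this key already arrived.
                if key not in self._pending:
                    self._pending[key] = item
                    self._ready.put_nowait(key)
            finally:
                self._ready.task_done()

    async def drain(self) -> None:
        """Wait for all queued work, then stop the workers (for tests/shutdown)."""
        await self._incoming.join()
        await self._ready.join()
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def demo():
    written, attempts = [], {"n": 0}

    async def flaky_write(item: QueuedSummary) -> None:
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("simulated database outage")
        written.append((item.session_id, item.text))

    lake = MemoryLakeSketch(flaky_write)
    lake.put(QueuedSummary("u1", "s1", "first draft"))
    lake.put(QueuedSummary("u1", "s1", "final summary"))
    await lake.start()
    await lake.drain()
    return written, attempts["n"]
```

Running asyncio.run(demo()) exercises both properties at once: the two summaries for the same session collapse into one write, and the simulated failure causes a retry instead of a lost summary. Because delivery is at-least-once, the real write path should tolerate receiving the same summary twice.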

Internationalization (i18n)

The agent is multi-lingual from the ground up. The api/i18n/ directory and i18n_manager.py are central to this capability.

  • YAML-Only Content: All user-facing strings — system prompts, UI messages, and context-collection questions — are now stored in YAML files (e.g., context_questions/FR.yaml, prompts/teenager/FR/). The older JSON format has been fully retired.
  • Multi-Tenant Prompt Structure: Prompts are organised by user_type (teenager / parent) and language, allowing each audience to receive appropriately tailored language and framing.
  • Dynamic Database Overrides: The system supports dynamic overrides via the LocalizedContent database table. This allows updating prompts, UI messages, and snippets in production without requiring a code redeployment. Database content always takes precedence over filesystem YAML content.
  • Two-Layer Caching: The i18nManager employs a file-level cache layered on top of a memory-based database override cache, with additional LRU caching on public getter methods to minimise repeated database or file reads.
  • Synchronization Tool: A sync_to_db.py CLI script is provided to synchronize YAML content into the database, facilitating bulk updates and CI/CD integration.
  • State-Driven Language: The language field in Service1State drives all translation lookups, making it straightforward to add new languages without changing the agent’s Python code.
  • XML-Style Tagging: Dynamic content injected into prompts (collected context, previous conversations, privacy guidelines) is wrapped in XML tags (e.g., <collected_context>, <previous_conversations>) so the LLM can reliably distinguish information types.
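
Two of the points above, database precedence and XML-style tagging, are easy to illustrate. The sketch below uses in-memory dicts in place of the YAML files and the LocalizedContent table; the get_text and wrap_in_tag helpers, the keys, and the sample strings are hypothetical and do not reflect the actual i18n_manager.py API. Caching is left out to keep the precedence rule visible.

```python
from typing import Dict, Tuple

ContentKey = Tuple[str, str]  # (language, content key)

# Stand-in for content loaded from YAML files on disk.
FILE_CONTENT: Dict[ContentKey, str] = {
    ("FR", "welcome_message"): "welcome text from YAML",
    ("FR", "privacy_notice"): "privacy text from YAML",
}

# Stand-in for rows in the LocalizedContent table.
DB_OVERRIDES: Dict[ContentKey, str] = {
    ("FR", "welcome_message"): "welcome text from database",
}


def get_text(language: str, key: str) -> str:
    """Return the database override if one exists, else the YAML content."""
    override = DB_OVERRIDES.get((language, key))
    if override is not None:
        return override
    return FILE_CONTENT[(language, key)]


def wrap_in_tag(tag: str, content: str) -> str:
    """Wrap dynamic prompt content in an XML-style tag."""
    return f"<{tag}>\n{content}\n</{tag}>"


prompt_fragment = wrap_in_tag("collected_context", "age: 15")
```

Because the lookup is keyed by language, supporting a new language in this model only means adding entries for it, which matches the state-driven design described above.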

Dynamic Context Collection

The collect_context node extracts structured information from the user’s initial message by combining the LLM’s structured output with a schema generated at runtime from the context questions.

  • Structured Output: It uses the LLM’s structured output capability (with_structured_output).
  • Dynamic Pydantic Models: The utils/context.py file contains a QAFactory function that dynamically creates a Pydantic BaseModel from the list of context questions for the user’s language. For multiple-choice questions, Literal types constrain the output to valid values.
  • Confident Extraction: Each answer is accompanied by the model’s reasoning and confidence level, so only explicitly stated information is used — reducing hallucinations.
  • AI-Driven Dynamic Questions (feature-flagged): When the enable_dynamic_question_generation feature flag is enabled, the node can ask the LLM to generate additional follow-up questions beyond the static set, collecting richer situational context. This is controlled at runtime without code changes.
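
The idea behind QAFactory can be sketched without dependencies. Note the substitution: the real implementation builds a Pydantic BaseModel, whereas this sketch uses dataclasses.make_dataclass so that it runs on the standard library alone. The question format, the qa_factory, allowed_values, and accept helpers, and the "high" confidence label are all assumptions for illustration.

```python
from dataclasses import field, fields, make_dataclass
from typing import Literal, Optional, get_args, get_origin

# Hypothetical question list; 'options' is None for free-text questions.
QUESTIONS = [
    {"id": "age", "options": None},
    {"id": "relationship", "options": ["parent", "teacher", "friend"]},
]


def qa_factory(questions):
    """Build a record type with one optional field per context question."""
    specs = []
    for question in questions:
        if question["options"]:
            # Multiple choice: constrain the answer to the listed values.
            answer_type = Literal[tuple(question["options"])]
        else:
            answer_type = str
        specs.append((question["id"], Optional[answer_type], field(default=None)))
    return make_dataclass("ContextAnswers", specs)


def allowed_values(model, name):
    """Return the permitted values for a field, or None if it is free text."""
    annotation = {f.name: f.type for f in fields(model)}[name]
    for arg in get_args(annotation):        # unwrap Optional[...]
        if get_origin(arg) is Literal:
            return get_args(arg)
    return None


def accept(model, name, value, confidence):
    """Keep a value only if it was stated with high confidence and is permitted."""
    if confidence != "high":
        return None
    allowed = allowed_values(model, name)
    if allowed is not None and value not in allowed:
        return None
    return value


ContextAnswers = qa_factory(QUESTIONS)
```

Unlike Pydantic, a dataclass does not validate on construction, which is why the sketch checks values explicitly in accept. With a Pydantic model, the Literal constraint would be enforced when the LLM output is parsed.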

Observability with Langfuse

The agent is integrated with Langfuse for tracing and monitoring, configured in api/agents/service1/utils/observability.py.

  • Callback Handler: A custom AsyncCallbackHandler, ErrorFlagger, is attached to the LLM client.
  • Error Flagging: The on_llm_end method inspects the LLM response metadata. If it finds a block_reason (response blocked by safety filters), it updates the corresponding trace in Langfuse.
  • Debugging: Flagging blocked responses on the trace gives immediate visibility into when and why the LLM refuses to respond, which matters given the sensitive domain.

Service and LLM Client Configuration

The api/agents/service1/core/llm_client.py file centralises LLM and service initialisation.

  • Lazy Loading: get_llm(), get_rag_service(), and get_analytics_service() are lazy-loaded — initialised on first call rather than at startup.
  • LLM Safety Settings: The ChatGoogleGenerativeAI client deliberately disables all default safety filters (HarmBlockThreshold.BLOCK_NONE) to handle sensitive topics. Content safety is managed through prompt design and the ErrorFlagger observability layer.
  • User-Type-Aware RAG Service: get_rag_service initialises a RAGService aware of the user_type (teenager / parent), routing it to the correct vector-database collection.
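
One common way to get lazy, once-only initialisation in Python is functools.lru_cache; the sketch below uses it to illustrate the pattern. Whether llm_client.py uses this mechanism is not stated here, so treat it as one possible approach. FakeLLM, the RAGService stub, the collection naming scheme, and the INIT_COUNT counter are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache

INIT_COUNT = {"llm": 0, "rag": 0}  # counts real initialisations, for demonstration


class FakeLLM:
    """Stand-in for an expensive-to-construct LLM client."""


@dataclass(frozen=True)
class RAGService:
    """Stand-in for a RAG service bound to one vector-database collection."""
    collection: str


@lru_cache(maxsize=None)
def get_llm() -> FakeLLM:
    # Runs on the first call only; later calls return the cached instance.
    INIT_COUNT["llm"] += 1
    return FakeLLM()


@lru_cache(maxsize=None)
def get_rag_service(user_type: str) -> RAGService:
    # One cached service per user_type; the collection name is an assumption.
    INIT_COUNT["rag"] += 1
    return RAGService(collection=f"{user_type}_documents")
```

Nothing is constructed at import time, so application startup stays cheap, and because the cache is keyed by argument, get_rag_service yields separate instances for teenager and parent.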