SelfAi-NPU-AGENT CLAUDE.md
AI NPU Agent Project - Architecture & Documentation
Project Overview
This is a sophisticated AI-powered terminal chatbot with multi-backend inference support designed for Windows on ARM with Snapdragon X Elite NPU acceleration. The project implements a three-phase intelligent pipeline (SelfAI) with fallback mechanisms, memory management, and agent-based task execution.
Key Purpose: Enable efficient local AI inference with automatic fallback from NPU hardware acceleration to CPU execution, all managed through a configuration-driven system with optional planning and merge phases.
Architecture Overview
High-Level System Design
The system implements a three-phase pipeline:
┌─────────────────────────────────────────────────────────────────┐
│ SelfAI Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. PLANNING PHASE (Ollama-based) │
│ ├─ Accepts user goal/request │
│ ├─ Generates DPPM plan (Distributed Planning Problem Model) │
│ └─ Creates subtasks with dependencies & merge strategy │
│ │
│ 2. EXECUTION PHASE (Multi-backend LLM inference) │
│ ├─ Executes subtasks sequentially/parallel (per plan) │
│ ├─ Uses AgentManager to route to specialized agents │
│ ├─ Falls back between backends: AnythingLLM → QNN → CPU │
│ └─ Saves results and tracks status │
│ │
│ 3. MERGE PHASE (Result synthesis) │
│ ├─ Collects all subtask outputs │
│ ├─ Synthesizes into coherent final answer │
│ └─ Falls back gracefully with internal summary │
│ │
└─────────────────────────────────────────────────────────────────┘
Multi-Backend Inference Strategy
The system supports three execution backends in priority order:
1. AnythingLLM (NPU) - Primary: Hardware-accelerated inference via Snapdragon X NPU
   - Communicates via HTTP API to the AnythingLLM server
   - Configured in config.yaml under npu_provider
   - Supports streaming output

2. QNN (Qualcomm Neural Network) - Secondary: Direct NPU model execution
   - Automatically discovered from the models/ directory
   - Uses QAI Hub models (e.g., Phi-3.5-Mini-Instruct)
   - Optimized for on-device inference

3. CPU Fallback - Tertiary: Local CPU inference via llama-cpp-python
   - Uses GGUF quantized models (e.g., Phi-3-mini-4k-instruct.Q4_K_M.gguf)
   - Pure CPU execution, no GPU/NPU required
   - Guarantees functionality even without specialized hardware
Automatic Failover: If AnythingLLM fails, system automatically tries QNN, then CPU.
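The failover can be pictured as a loop over the configured backends in priority order. A minimal sketch, assuming backend entries shaped like the interface/label/name dictionaries that selfai.py builds; the helper name is hypothetical, and the real logic lives in selfai.py and execution_dispatcher.py:

# Hypothetical helper; illustrates the priority-ordered fallback only.
def generate_with_fallback(backends, prompt):
    """Try each backend in priority order until one succeeds."""
    errors = []
    for backend in backends:  # e.g. AnythingLLM, then QNN, then CPU
        try:
            return backend["name"], backend["interface"].generate_response(prompt)
        except Exception as exc:  # connection error, missing model, timeout, ...
            errors.append((backend["name"], str(exc)))
    raise RuntimeError(f"All backends failed: {errors}")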
Directory Structure
AI_NPU_AGENT_Projekt/
├── CLAUDE.md # This file - architecture documentation
├── README.md # User-facing project overview
├── UI_GUIDE.md # Terminal UI features & customization
├── config.yaml.template # Configuration template
├── config_extended.yaml # Extended configuration example
├── .env.example # Environment variables template
├── requirements.txt # Main dependencies
├── requirements-core.txt # Core CPU dependencies
├── requirements-npu.txt # NPU-specific dependencies
│
├── config_loader.py # Configuration loading & validation
├── main.py # Entry point: Agent initialization
├── llm_chat.py # QNN-based chat interface
│
├── selfai/ # Main SelfAI package
│ ├── __init__.py
│ ├── selfai.py # Main CLI loop with full pipeline
│ ├── core/
│ │ ├── agent.py # Basic agent with tool-calling loop
│ │ ├── agent_manager.py # AgentManager: manages multiple agents
│ │ ├── model_interface.py # Base interface for LLM models
│ │ ├── anythingllm_interface.py # AnythingLLM HTTP client
│ │ ├── npu_llm_interface.py # QNN/NPU model interface
│ │ ├── local_llm_interface.py # CPU fallback (llama-cpp-python)
│ │ ├── planner_ollama_interface.py# Ollama planner client
│ │ ├── merge_ollama_interface.py # Ollama merge provider
│ │ ├── execution_dispatcher.py # Subtask execution orchestrator
│ │ ├── memory_system.py # Conversation & plan storage
│ │ ├── context_filter.py # Smart context relevance filtering
│ │ ├── planner_validator.py # Plan schema validation
│ │ └── smolagents_runner.py # Smolagents integration
│ │
│ ├── tools/
│ │ ├── tool_registry.py # Tool catalog & management
│ │ ├── filesystem_tools.py # File/directory operations
│ │ └── shell_tools.py # Shell command execution
│ │
│ └── ui/
│ └── terminal_ui.py # Terminal UI with animations
│
├── models/ # Model storage directory
│ ├── Phi-3-mini-4k-instruct.Q4_K_M.gguf # CPU fallback model
│ └── [other GGUF/QNN models]
│
├── memory/ # Conversation & plan storage
│ ├── plans/ # Saved execution plans
│ └── [memory categories]/ # Memory organized by agent categories
│
├── agents/ # Agent configurations
│ ├── [agent_key]/
│ │ ├── system_prompt.md # Agent system prompt
│ │ ├── memory_categories.txt # Memory categories for this agent
│ │ ├── workspace_slug.txt # AnythingLLM workspace
│ │ └── description.txt # Agent description
│ └── [other agents]
│
├── data/ # Additional data/resources
├── docs/ # Extended documentation
├── scripts/ # Setup & utility scripts
├── archive/ # Old/archived code
└── Learings_aus_Problemen/ # Learning notes & problems
Configuration System
Configuration Loading Flow
┌─────────────────────────────────────────────────────────┐
│ config_loader.py::load_configuration() │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. Load .env file (secrets) │
│ 2. Load config.yaml (main settings) │
│ 3. Normalize config (support both formats) │
│ 4. Resolve environment variables (${VAR_NAME}) │
│ 5. Validate required fields │
│ 6. Create structured dataclasses │
│ │
└─────────────────────────────────────────────────────────┘
Configuration Structure
config.yaml contains:
- npu_provider - AnythingLLM backend

  npu_provider:
    base_url: "http://localhost:3001/api/v1"
    workspace_slug: "main"
    api_key: "loaded-from-.env"

- cpu_fallback - Local GGUF model

  cpu_fallback:
    model_path: "Phi-3-mini-4k-instruct.Q4_K_M.gguf"
    n_ctx: 4096          # Context window size
    n_gpu_layers: 0      # GPU offload layers

- system - General settings

  system:
    streaming_enabled: true    # Enable word-by-word output
    stream_timeout: 60.0       # Streaming timeout in seconds

- agent_config - Agent management

  agent_config:
    default_agent: "code_helfer"    # Default agent to load

- planner - Optional Ollama-based planning

  planner:
    enabled: false              # Enable/disable planning
    execution_timeout: 120.0    # Timeout per subtask
    providers:                  # Multiple planner backends
      - name: "local-ollama"
        type: "local_ollama"
        base_url: "http://localhost:11434"
        model: "gemma3:1b"
        timeout: 180.0
        max_tokens: 768

- merge - Optional result synthesis

  merge:
    enabled: false
    providers:                  # Multiple merge backends
      - name: "merge-ollama"
        type: "local_ollama"
        base_url: "http://localhost:11434"
        model: "gemma3:3b"
        timeout: 180.0
        max_tokens: 2048
Environment Variables
Required (in .env file):
- API_KEY: AnythingLLM API key (required if using AnythingLLM)
Optional:
- OLLAMA_CLOUD_API_KEY: For cloud-based Ollama providers
- Any variables referenced in config.yaml as ${VAR_NAME}
Key Components
1. Configuration Loader (config_loader.py)
Purpose: Centralized, validated configuration management
Key Classes:
- NPUConfig: AnythingLLM backend settings
- CPUConfig: Local model configuration
- SystemConfig: General system settings
- PlannerConfig: Planning phase configuration
- MergeConfig: Merge phase configuration
- AppConfig: Complete application config
Key Functions:
- load_configuration(): Load, validate, and structure the config
- _normalize_config(): Support both simple and extended formats
- _resolve_env_template(): Replace ${VAR_NAME} with environment values (see the sketch below)
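A minimal sketch of how the ${VAR_NAME} interpolation could work; this is illustrative only, and the actual _resolve_env_template in config_loader.py may differ:

# Illustrative sketch of ${VAR_NAME} resolution; not the actual implementation.
import os
import re

def resolve_env_template(value: str) -> str:
    """Replace ${VAR_NAME} placeholders with values from the environment."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise ValueError(f"Environment variable {name} is not set")
        return os.environ[name]
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", _sub, value)

# resolve_env_template("${API_KEY}") -> value of API_KEY loaded from .env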
2. Agent System (selfai/core/agent_manager.py)
Purpose: Manage multiple specialized AI agents with memory
Agent Properties:
- key: Unique identifier (e.g., "code_helfer")
- display_name: Human-readable name
- description: What this agent does
- system_prompt: Agent personality/instructions
- memory_categories: Conversation storage categories
- workspace_slug: AnythingLLM workspace
AgentManager Responsibilities:
- Load agents from disk
- Switch between agents at runtime
- Provide agent to execution/memory systems
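A minimal sketch of loading one agent from the directory layout shown earlier; names and error handling are illustrative and the real AgentManager may differ:

# Illustrative sketch; the real loader lives in selfai/core/agent_manager.py.
from pathlib import Path

def load_agent(agents_root: Path, key: str) -> dict:
    """Read one agent definition from agents/<key>/."""
    agent_dir = agents_root / key
    return {
        "key": key,
        "system_prompt": (agent_dir / "system_prompt.md").read_text(encoding="utf-8"),
        "memory_categories": (agent_dir / "memory_categories.txt").read_text(encoding="utf-8").splitlines(),
        "workspace_slug": (agent_dir / "workspace_slug.txt").read_text(encoding="utf-8").strip(),
        "description": (agent_dir / "description.txt").read_text(encoding="utf-8").strip(),
    }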
3. Model Interfaces (Base + Implementations)
Base: ModelInterface in model_interface.py
- Defines common interface for all LLM backends
- Methods: chat_completion(), generate_response(), stream_generate_response() (sketched below)
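A minimal sketch of that common interface; the exact signatures in model_interface.py may differ:

# Illustrative sketch of the shared backend interface.
from abc import ABC, abstractmethod
from typing import Iterator

class ModelInterface(ABC):
    @abstractmethod
    def chat_completion(self, messages: list[dict]) -> str:
        """Run a chat-style completion over role/content messages."""

    @abstractmethod
    def generate_response(self, prompt: str) -> str:
        """Return a full response for a single prompt (blocking)."""

    @abstractmethod
    def stream_generate_response(self, prompt: str) -> Iterator[str]:
        """Yield response chunks for streaming output."""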
Implementations:
- AnythingLLMInterface (anythingllm_interface.py)
  - HTTP client for the AnythingLLM API
  - Supports streaming via Server-Sent Events (SSE)
  - Handles workspace management
  - Configuration: base_url, workspace_slug, API key

- NpuLLMInterface (npu_llm_interface.py)
  - Direct QAI Hub models (Phi-3.5-Mini, etc.)
  - NPU inference via the QNN runtime
  - Auto-discovers .qnn files in the models directory
  - Optimized for on-device execution

- LocalLLMInterface (local_llm_interface.py)
  - Wraps llama-cpp-python for CPU inference
  - Loads GGUF quantized models
  - Fallback when the NPU is unavailable
  - Pure Python implementation
4. Planning System (planner_ollama_interface.py)
Purpose: Generate task decomposition plans (DPPM format)
PlannerOllamaInterface:
- Calls Ollama API with structured prompt
- Validates plan JSON schema
- Returns structured plan data
Plan Structure:
{
"subtasks": [
{
"id": "S1",
"title": "Task title",
"objective": "What to do",
"agent_key": "agent_name",
"engine": "anythingllm",
"parallel_group": 1,
"depends_on": []
}
],
"merge": {
"strategy": "How to combine results",
"steps": [...]
}
}
Planning Flow:
- User enters a goal with /plan <goal>
- Planner decomposes it into subtasks
- Validation checks the plan structure (see the sketch below)
- User confirms before execution
- Plan saved to memory/plans/
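A minimal sketch of the structural checks such validation could perform; this is illustrative, and planner_validator.py defines the authoritative rules:

# Illustrative plan-structure checks; not the actual validator.
REQUIRED_SUBTASK_KEYS = {"id", "title", "objective", "agent_key", "engine", "parallel_group", "depends_on"}

def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan looks structurally valid."""
    errors = []
    subtasks = plan.get("subtasks", [])
    if not subtasks:
        errors.append("Plan contains no subtasks")
    known_ids = {task.get("id") for task in subtasks}
    for task in subtasks:
        missing = REQUIRED_SUBTASK_KEYS - task.keys()
        if missing:
            errors.append(f"Subtask {task.get('id', '?')} missing keys: {sorted(missing)}")
        for dep in task.get("depends_on", []):
            if dep not in known_ids:
                errors.append(f"Subtask {task.get('id', '?')} depends on unknown id {dep}")
    if "merge" not in plan:
        errors.append("Plan is missing the merge section")
    return errors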
5. Execution System (execution_dispatcher.py)
Purpose: Execute planned subtasks with fault tolerance
ExecutionDispatcher:
- Loads plan from JSON
- Executes each subtask via LLM backends
- Manages retry logic (2 attempts by default)
- Tracks task status: pending → running → completed/failed
- Saves results to memory
Execution Pipeline:
For each subtask:
1. Try Backend 1 (AnythingLLM)
2. On failure → Try Backend 2 (QNN)
3. On failure → Try Backend 3 (CPU)
4. On all failure → Abort plan with error
5. Save result to memory
6. Update plan JSON with result path
Retry Strategy:
- retry_attempts: Number of retries (default 2)
- retry_delay: Wait between retries (default 5s)
- Exponential backoff for network errors (see the sketch below)
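A minimal sketch of retry handling under these settings; the dispatcher's real retry code may differ in structure and error classification:

# Illustrative retry loop; parameter names follow the config keys above.
import time

def run_with_retries(call, retry_attempts: int = 2, retry_delay: float = 5.0):
    """Run call() once, then retry up to retry_attempts times on failure."""
    last_error = None
    for attempt in range(1 + retry_attempts):
        try:
            return call()
        except ConnectionError as exc:      # network errors back off exponentially
            last_error = exc
            time.sleep(retry_delay * (2 ** attempt))
        except Exception as exc:            # other errors retry after a fixed delay
            last_error = exc
            time.sleep(retry_delay)
    raise RuntimeError(f"Subtask failed after {1 + retry_attempts} attempts") from last_error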
6. Memory System (memory_system.py)
Purpose: Persistent conversation and plan storage
Structure:
memory/
├── plans/ # Saved execution plans (JSON)
│ └── 20250101-120000_goal-name.json
├── code_helfer/ # Agent memory (categories)
│ ├── agent1_20250101-120000.txt
│ └── agent1_20250101-120001.txt
├── projektmanager/
│ └── ...
└── general/ # Default category
└── ...
File Format (text-based conversations):
---
Agent: Code Helper
AgentKey: code_helfer
Workspace: main
Timestamp: 2025-01-01 12:00:00
Tags: python, debugging
---
System Prompt:
[system instructions]
---
User:
[user question]
---
SelfAI:
[ai response]
Key Features:
- Automatic category assignment
- Tag extraction from content
- Context filtering (relevance-based)
- Plan serialization to JSON
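A minimal sketch of writing one entry in the text format shown above; this is an illustrative helper, and memory_system.py handles category assignment, tag extraction, and the system prompt section itself:

# Illustrative sketch; the agent dict shape and filename pattern are assumptions.
from datetime import datetime
from pathlib import Path

def save_conversation(memory_root: Path, agent: dict, category: str,
                      user_msg: str, ai_msg: str, tags: list[str]) -> Path:
    """Append one conversation turn as a text file under memory/<category>/."""
    timestamp = datetime.now()
    entry = (
        "---\n"
        f"Agent: {agent.get('display_name', agent['key'])}\n"
        f"AgentKey: {agent['key']}\n"
        f"Workspace: {agent['workspace_slug']}\n"
        f"Timestamp: {timestamp:%Y-%m-%d %H:%M:%S}\n"
        f"Tags: {', '.join(tags)}\n"
        "---\n"
        f"User:\n{user_msg}\n"
        "---\n"
        f"SelfAI:\n{ai_msg}\n"
    )
    target = memory_root / category / f"{agent['key']}_{timestamp:%Y%m%d-%H%M%S}.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(entry, encoding="utf-8")
    return target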
7. Context Filtering (context_filter.py)
Purpose: Smart retrieval of relevant conversation history
Algorithms:
- Task Classification: Categorize user input (coding, planning, etc.)
- Relevance Scoring: Calculate similarity to past conversations
- Context Selection: Retrieve top N most relevant past interactions
Integration: Used by load_relevant_context() to populate chat history
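A minimal sketch of relevance scoring via keyword overlap; the real context_filter.py may use a different scoring scheme and task classification:

# Illustrative relevance scoring; not the actual algorithm.
def relevance_score(query: str, past_entry: str) -> float:
    """Fraction of query terms that also appear in a past entry."""
    query_terms = set(query.lower().split())
    entry_terms = set(past_entry.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & entry_terms) / len(query_terms)

def select_context(query: str, entries: list[str], top_n: int = 3) -> list[str]:
    """Return the top N most relevant past interactions."""
    ranked = sorted(entries, key=lambda e: relevance_score(query, e), reverse=True)
    return ranked[:top_n]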
8. Terminal UI (ui/terminal_ui.py)
Purpose: Rich terminal interface with animations
Features:
- ASCII banner and spinners
- Progress bars for long operations
- Color-coded status messages (green=success, yellow=warning, red=error)
- Typing animation for AI responses
- Stream prefix labels (showing which backend)
- Plan visualization with tree structure
- Interactive menu selection
Status Levels:
"success"- Green ✓"info"- Blue ⓘ"warning"- Yellow ⚠"error"- Red ✗
Entry Points & Execution Flow
Main Entry Point: selfai/selfai.py (Recommended)
Complete 3-phase pipeline:
python /path/to/selfai/selfai.py
Flow:
- Initialize configuration, agents, memory, UI
- Load LLM backends in priority order (AnythingLLM → QNN → CPU)
- Load optional planner providers (Ollama)
- Load optional merge providers (Ollama)
- Enter interactive loop:
  - /plan <goal> → Planning phase
  - Normal message → Chat (execution phase)
  - /memory → Manage memory
  - /switch <agent> → Switch agents
  - quit → Exit
Key Commands:
- /plan <goal> - Create and execute a task decomposition plan
- /planner list - List available planner backends
- /planner use <name> - Switch planner provider
- /memory - List memory categories
- /memory clear <category> - Clear memory
- /switch <agent_name|number> - Switch active agent
- quit - Exit program
Alternative Entry Point: main.py
Simple agent initialization:
python main.py
Flow:
- Initialize Agent with local-ollama provider
- Simple chat loop without planning/merge
- Tool-based execution (via smolagents)
Use Case: Basic testing without complex infrastructure
Alternative Entry Point: llm_chat.py
Direct QNN/NPU chat:
python llm_chat.py
Features:
- Direct QAI Hub model loading
- Phi-3.5-Mini on Snapdragon X Elite NPU
- Simple interactive chat
- No configuration needed
Component Interaction Diagram
┌──────────────────────────────────────────────────────────────────┐
│ selfai.py (Main Loop) │
└────────────────────────┬─────────────────────────────────────────┘
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌─────────┐ ┌──────────┐
│Planning│ │Execution│ │ Merge │
│ Phase │ │ Phase │ │ Phase │
└────┬───┘ └────┬────┘ └────┬─────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│PlannerOllama │ │ExecutionDisp │ │MergeOllama │
│Interface │ │atcher │ │Interface │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────────────┼────────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│AnythingLLM │ │NpuLLM │ │LocalLLM │
│Interface │ │Interface │ │Interface │
└────────────┘ └────────────┘ └────────────┘
▲ ▲ ▲
│ │ │
┌───┴──────────────────┼──────────────────┴───┐
│ │ │
│ ┌────────────────────┘ │
│ │ Automatic Fallback in Priority Order │
│ │ │
▼ ▼ ▼
┌──────────┐ ┌────────────┐ ┌──────────────┐
│AnythingLLM│ │ QNN Models │ │ GGUF Models │
│Server │ │ (.qnn) │ │ (CPU) │
│(NPU) │ │ (NPU) │ │ │
└──────────┘ └────────────┘ └──────────────┘
┌─────────────────────────────────────────────────┐
│ Supporting Systems │
├─────────────────────────────────────────────────┤
│ AgentManager → Agent instances & switching │
│ MemorySystem → Conversation & plan persistence│
│ ConfigLoader → Centralized configuration │
│ TerminalUI → Rich terminal interface │
│ ContextFilter→ Smart context retrieval │
└─────────────────────────────────────────────────┘
Dependencies
Core Dependencies (requirements-core.txt)
PyYAML # Config file parsing
python-dotenv # Environment variable loading
openai # OpenAI API compatibility
llama-cpp-python # CPU model inference (GGUF)
numpy # Numerical computing
pyarrow # Data serialization
tabulate # Table formatting
smmap # Fast file mapping
psutil # System monitoring
qai-hub-models # Qualcomm AI Hub models
smolagents # Agent toolkit
NPU Dependencies (requirements-npu.txt)
httpx==0.28.1 # HTTP client for API calls
qai_hub_models # QNN model support
System Requirements
Hardware:
- Windows on ARM (Snapdragon X Elite or compatible)
- Minimum 8GB RAM for CPU inference
- 16GB+ recommended for NPU optimization
Software:
- Python 3.12 (ARM64 build)
- AnythingLLM Desktop (ARM64) - Optional for NPU backend
- Ollama - Optional for planning/merge phases
Setup Instructions
1. Initial Setup
# Clone repository
git clone <repository-url>
cd AI_NPU_AGENT_Projekt
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows CMD: .\.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
2. Configuration
# Copy and configure
cp config.yaml.template config.yaml
cp .env.example .env
# Edit config.yaml with your settings
# Edit .env with your AnythingLLM API key
3. Prepare Models
# Create models directory
mkdir -p models
# Download GGUF model for CPU fallback
# Place in models/ directory
# e.g., Phi-3-mini-4k-instruct.Q4_K_M.gguf
4. Set Up Agents
# Create agents directory
mkdir -p agents
# Create agent directories with:
# agents/agent_key/
# ├── system_prompt.md
# ├── memory_categories.txt
# ├── workspace_slug.txt
# └── description.txt
5. Optional: Start Ollama (for planning/merge)
# If using Ollama planner/merge
ollama serve
# In another terminal, pull models
ollama pull gemma3:1b
ollama pull gemma3:3b
6. Optional: Start AnythingLLM
# If using AnythingLLM for primary inference
# Launch AnythingLLM Desktop and configure workspace
Common Workflows
Workflow 1: Simple Chat
python selfai/selfai.py
> You: What is Python?
AI: [Response from available backend]
Workflow 2: Task Decomposition with Planning
python selfai/selfai.py
> You: /plan Create a Python web crawler for news sites
[Planner decomposes into subtasks]
[System executes each subtask]
[Merge synthesizes final solution]
Workflow 3: Agent Switching
python selfai/selfai.py
> You: /switch projektmanager
Switched to: Project Manager
> You: Analyze the project requirements
AI: [Response from project manager agent]
Workflow 4: Memory Management
python selfai/selfai.py
> You: /memory
Active memory categories:
- code_helfer
- projektmanager
> You: /memory clear code_helfer
Memory 'code_helfer' completely cleared (15 entries).
How to Extend
Adding a New Agent
1. Create the directory:

   agents/my_agent/
   ├── system_prompt.md        (Agent personality)
   ├── memory_categories.txt   (One category per line)
   ├── workspace_slug.txt      (AnythingLLM workspace)
   └── description.txt         (What the agent does)

2. Reference it in config.yaml:

   agent_config:
     default_agent: "my_agent"
Adding a New Tool
1. Create the tool in selfai/tools/:

   class MyTool:
       @property
       def name(self):
           return "my_tool"

       @property
       def description(self):
           return "Tool description"

       @property
       def inputs(self):
           return {
               "param1": {"description": "..."}
           }

       def run(self, param1: str) -> str:
           # Implementation
           return result

2. Register it in selfai/tools/tool_registry.py:

   from selfai.tools.my_tool import MyTool
   # Add to registry
Adding New LLM Backend
1. Create a new interface in selfai/core/:

   class MyLLMInterface:
       def generate_response(self, ...):
           ...

       def stream_generate_response(self, ...):
           ...

2. Instantiate it in selfai/selfai.py:

   interface, label = _load_my_llm(models_root, ui)
   execution_backends.append({
       "interface": interface,
       "label": label,
       "name": "my_backend"
   })
Troubleshooting
Issue: "API_KEY is not set"
Solution:
- Copy .env.example to .env
- Add your AnythingLLM API key
- Ensure config.yaml is created from the template
Issue: "AnythingLLM not available"
Solution:
- Verify the AnythingLLM server is running on the configured host:port
- Check npu_provider.base_url in config.yaml
- The system will automatically fall back to QNN or CPU
Issue: CPU inference very slow
Solution:
- Reduce max_output_tokens in the config
- Use quantized models (Q4_K_M)
- Reduce n_ctx (context window)
- Consider using AnythingLLM + NPU for acceleration
Issue: Memory growing unbounded
Solution:
- Use /memory clear <category> to manage it
- Or: /memory clear <category> 5 to keep only the last 5 entries
- Memory is organized by agent memory_categories
Issue: Planner not working
Solution:
- Ensure Ollama is running: ollama serve
- Set planner.enabled: true in config.yaml
- Verify the Ollama models are installed: ollama pull gemma3:1b
- Check that planner.providers[0].base_url points to Ollama
Performance Considerations
Streaming vs. Blocking
- Streaming (default): Better UX, lower perceived latency
  - Enable: system.streaming_enabled: true
  - Supported by: AnythingLLM, some Ollama versions

- Blocking: Simple, predictable latency
  - Use if streaming is unavailable
  - The system falls back automatically
Backend Selection
| Backend           | Speed     | Quality | Hardware           | Notes               |
|-------------------|-----------|---------|--------------------|---------------------|
| AnythingLLM (NPU) | Fast      | High    | Snapdragon X Elite | Recommended primary |
| QNN               | Very Fast | High    | Snapdragon X Elite | Direct NPU access   |
| CPU (GGUF)        | Slow      | Medium  | Any                | Fallback guarantee  |
Token Limits
- Planner max_tokens: 768 (plan generation)
- Merge max_tokens: 1536 (result synthesis)
- Chat max_output_tokens: 512 (regular response)
- Increase for longer responses, decrease for speed
Technical Details
DPPM (Distributed Planning Problem Model) Format
The planner generates plans in DPPM format:
{
"subtasks": [
{
"id": "S1",
"title": "Analyze Requirements",
"objective": "Understand what user needs",
"agent_key": "analyst",
"engine": "anythingllm",
"parallel_group": 1,
"depends_on": [],
"result_path": "memory/plans/results/S1.txt"
},
{
"id": "S2",
"title": "Design Solution",
"objective": "Create architecture",
"agent_key": "architect",
"engine": "anythingllm",
"parallel_group": 2,
"depends_on": ["S1"],
"result_path": "memory/plans/results/S2.txt"
}
],
"merge": {
"strategy": "Combine analysis and design",
"steps": [
{
"title": "Synthesis",
"description": "Unite results",
"depends_on": ["S2"]
}
]
},
"metadata": {
"planner_provider": "local-ollama",
"planner_model": "gemma3:1b",
"goal": "Create a web application",
"merge_agent": "projektmanager"
}
}
Streaming Protocol
AnythingLLM and Ollama use Server-Sent Events (SSE):
event: message
data: {"content": "Hello"}
event: message
data: {"content": " world"}
event: end
data: {"done": true}
System automatically decodes and displays streaming chunks.
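A minimal sketch of consuming such a stream with httpx (a listed dependency); the endpoint path, payload, and JSON field names are illustrative and are handled by anythingllm_interface.py and the Ollama interfaces in practice:

# Illustrative SSE consumer; URL, payload, and field names are assumptions.
import json
import httpx

def stream_chunks(url: str, payload: dict, api_key: str):
    """Yield content chunks from an SSE streaming endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    with httpx.stream("POST", url, json=payload, headers=headers, timeout=60.0) as response:
        for line in response.iter_lines():
            if not line.startswith("data:"):
                continue  # skip event: lines and keep-alives
            data = json.loads(line[len("data:"):].strip())
            if data.get("done"):
                break
            yield data.get("content", "")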
Configuration Validation
- Type Checking: Dataclass validation
- Required Fields: Missing keys raise ValueError
- URL Validation: Tested with health checks
- Model Existence: GGUF files checked before use
Development Notes
Code Organization Principles
- Separation of Concerns: Each module has a single responsibility
  - *_interface.py → Backend communication
  - *_system.py → Persistent state
  - execution_* → Task orchestration
  - ui/ → User interface

- Dependency Injection: Core business logic independent of I/O
  - Interfaces passed as parameters
  - Easy to mock for testing
  - Flexible backend switching

- Graceful Degradation: System continues with reduced capability
  - Missing optional features don't crash
  - Fallback mechanisms at each level
  - Clear status messages about limitations

- Configuration-Driven: Behavior changes without code modification
  - All settings in config.yaml
  - Environment variable interpolation
  - Validated at startup
Key Design Patterns
- Strategy Pattern: Multiple interchangeable LLM backends
- Chain of Responsibility: Backend fallback chain
- Observer Pattern: UI status callbacks
- Repository Pattern: Memory system abstraction
- Factory Pattern: Agent and tool instantiation
Security Considerations
Secrets Management
- Never commit the .env file or API keys
- Use .env.example as a template
- Load secrets via python-dotenv
- Validate all API keys at startup
Input Validation
- User prompts passed through context filters
- Plan validation before execution
- Tool arguments checked before execution
- Configuration values type-checked
File Operations
- All file I/O uses pathlib for safety
- Paths resolved relative to the project root
- No arbitrary shell execution (unless explicit)
- Memory files stored in secure locations
Future Enhancements
Potential improvements:
- Parallel Subtask Execution: Execute independent tasks concurrently
- Custom Tools: User-defined tool integration
- Multi-Model Ensemble: Combine outputs from multiple backends
- RAG Integration: Vector database for document retrieval
- Web UI: Browser-based interface instead of CLI
- Distributed Execution: Execute subtasks on remote machines
- Audio I/O: Voice input/output support
- Plugin System: Dynamic backend/tool loading
References
Related Files
- Configuration: config.yaml.template, config_extended.yaml
- Models: Download from Hugging Face, Ollama, QAI Hub
- Documentation: README.md, UI_GUIDE.md
- Examples: Check the Learings_aus_Problemen/ directory
External Resources
Version History
- Current: Multi-backend inference with planning/merge phases
- Previous: Simple chatbot with NPU/CPU fallback
Last Updated: January 2025
Maintained By: AI NPU Agent Project Team
License: [Check LICENSE file]