DataFab Studio

Version: 6.0
Last Updated: February 2026

Component Overview

The Studio (Helix Studio) serves as the development environment for the DataFab platform, enabling users to build Data-Driven Agents (DDAs), design workflows, and orchestrate multi-agent solutions for data intelligence use cases. The platform supports five operational modes that enable organizations to balance automation with human oversight, tailored to their policies and use cases.

Core Capabilities:

Capability	Description
Business Domain Discovery	Extract schemas from uploaded documents
Domain Management	Organize schemas, datasets, DDAs, and chains within business domains
DDA Creation	Build Data-Driven Agents with natural language definition and testing
Widget Types	Define visual interface components (system and output widgets)
Dataset Management	Define and manage structured data collections with file uploads
Utilities	Build reusable components combining external APIs and DDAs
Chain of Agents	Orchestrate multiple DDAs with human-in-the-loop support
Graph of Agents	(Planned) Non-linear graph orchestration with branching and cycles
MCP Integrations	Connect to external tools via MCP protocol with credential management
External APIs	Configure and manage external API endpoint connections
Asset Search	Semantic search across all Studio asset types
AI Hybrid Planning	Create DDAs from natural language descriptions automatically
AI Workflow Execution	Execute DDA workflows with query-based invocation
Operational Modes	Five modes from traditional platform (Mode 0) to fully automated (Mode 4)

Asset Types:

Asset Type	API Identifier	Description
DDA	`DDA`	Data-Driven Agent with schema-bound processing
Chain of Agents	`CHAIN_OF_DDA`	Multi-DDA orchestration workflow
Dataset	`DATASET`	Structured data collection with file uploads
Widget	`WIDGET`	Visual interface component
Utility	`UTILITY`	Reusable component combining APIs and DDAs
Schema	`SCHEMA`	Business domain schema definition
MCP Integration	`MCP_INTEGRATION`	External tool connection via MCP protocol
Media Asset	`MEDIA_ASSET`	Media files with content URLs and policies
Model Asset	`MODEL_ASSET`	Machine learning model files

Lifecycle Stages:

Stage	Description
DRAFT	Asset under development; editable, not yet published
PUBLISHED	Asset published and available for use

Asset States:

State	Description
ACTIVE	Asset is operational
INACTIVE	Asset is disabled
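
The identifiers in the tables above can be mirrored on the client side as enumerations. The following is a minimal sketch, not Studio code: the enum values come from the tables, while the `AssetRef` record and the `is_usable` rule (published and active) are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class AssetType(str, Enum):
    DDA = "DDA"
    CHAIN_OF_DDA = "CHAIN_OF_DDA"
    DATASET = "DATASET"
    WIDGET = "WIDGET"
    UTILITY = "UTILITY"
    SCHEMA = "SCHEMA"
    MCP_INTEGRATION = "MCP_INTEGRATION"
    MEDIA_ASSET = "MEDIA_ASSET"
    MODEL_ASSET = "MODEL_ASSET"


class Stage(str, Enum):
    DRAFT = "DRAFT"
    PUBLISHED = "PUBLISHED"


class State(str, Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


@dataclass
class AssetRef:
    """Hypothetical client-side record; not a documented Studio type."""
    name: str
    asset_type: AssetType
    stage: Stage = Stage.DRAFT
    state: State = State.ACTIVE

    def is_usable(self) -> bool:
        # Assumption: an asset can be used only when PUBLISHED and ACTIVE.
        return self.stage is Stage.PUBLISHED and self.state is State.ACTIVE
```

Keeping stage and state as separate fields reflects the tables: an asset can be published yet disabled, or active yet still in draft.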

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                           API GATEWAY LAYER                             │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  Authentication │ Rate Limiting │ Request Routing │ Health Check│    │
│  └─────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          STUDIO CORE SERVICES                           │
│                                                                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │  DDA Service    │  │Domain & Schema  │  │  Widget Type    │          │
│  │  (Agents)       │  │  Services       │  │  Service        │          │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘          │
│                                                                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │  Dataset        │  │  Utility        │  │  Chain of Agents│          │
│  │  Service        │  │  Service        │  │  Service        │          │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘          │
│                                                                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │  MCP Integration│  │  External API   │  │  Script         │          │
│  │  Service        │  │  Service        │  │  Service        │          │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘          │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                    EXECUTION ENGINE                             │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │    │
│  │  │  Sandbox    │  │  Resource   │  │ Credential  │  │  State  │ │    │
│  │  │  Runtime    │  │  Manager    │  │   Vault     │  │ Manager │ │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                  ASSET MANAGEMENT LAYER                         │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │    │
│  │  │  Unified    │  │  Asset      │  │  Media &    │  │Sources  │ │    │
│  │  │  Assets     │  │  Search     │  │  Model Mgmt │  │ Router  │ │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                  AI PLANNING & ORCHESTRATION                    │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │    │
│  │  │  AI Hybrid  │  │  Workflow   │  │   Error     │  │Recovery │ │    │
│  │  │  Planner    │  │  Executor   │  │  Handler    │  │ Manager │ │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │    │
│  └─────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         PLATFORM INTEGRATION                            │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │ Knowledge Fabric│  │   AI & LLM      │  │    Audit        │          │
│  │  (Data Sources) │  │   Services      │  │   Services      │          │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘          │
└─────────────────────────────────────────────────────────────────────────┘

Business Domains

Business Domains serve as the organizational structure for grouping related schemas, datasets, DDAs, and Chains of DDAs. Each domain represents a business area and contains the entity definitions that control data processing.

Domain Management

Operation	Description
Create	Define a new business domain with name and description
List	View all accessible domains with related entity counts
Update	Modify domain name, description, or metadata
Delete	Remove a domain and its associations

Domain Relationships:

Related Entity	Description
Schemas	Entity definitions assigned to the domain
Datasets	Data collections associated with the domain
DDAs	Data-Driven Agents operating within the domain
Chains of DDAs	Multi-DDA workflows within the domain

Domain Discovery

The Domain Discovery feature enables users to define their business domain by uploading documents, from which the system extracts relevant schemas automatically.

Extraction Process:

Stage	Description
Document Upload	User provides business documents via file upload
Content Analysis	AI extracts text, structure, and tables from documents
Concept Extraction	Entities, attributes, and relationships are discovered
Schema Generation	Structured schemas created from extracted concepts
User Review	Interactive refinement of generated schemas
Domain Assignment	Schemas assigned to target business domain

Supported Document Types:

Document Type	Extraction Focus
Data Specifications	Data models, field definitions
Policies	Workflows, rules, roles
Data Dictionaries	Field definitions, types
Forms & Templates	Input fields, validations
Business Documents	Domain concepts, relationships

Schemas

Schemas define the entity structures used by DDAs, datasets, and other Studio assets for data processing and validation.

Schema Management

Operation	Description
Create	Define a new schema with name, JSON definition, domain
List	View all accessible schemas, filter by domain
Update	Modify schema definition, name, or thumbnail
Delete	Remove a schema

Schema Structure:

Field	Description
Name	Schema identifier
Definition	JSON schema defining entity structure and rules
Thumbnail	Visual preview of the schema
Domain	Associated business domain

Schema-Driven Processing

DDAs use bound schemas to control their data processing behavior.

Binding Type	Purpose
Input Schema	Validates incoming data structure
Output Schema	Ensures output conforms to expected format
Internal Schema	Controls intermediate data transformations
Validation Schema	Enforces business rules on processed data

Schema Enforcement:

Control	Implementation
Type Validation	Data types checked against schema definitions
Constraint Enforcement	Required fields, ranges, formats validated
Reference Validation	Entity references verified against schema
Schema Version Binding	DDAs pinned to specific schema versions
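
To make type validation and constraint enforcement concrete, here is a minimal sketch of a record check. The schema format used here (`type`, `required`, `min`) is a simplified, hypothetical stand-in; the Studio's actual JSON definition format is not shown in this document.

```python
from typing import Any

# Hypothetical, simplified schema format for illustration only.
INVOICE_SCHEMA = {
    "invoice_id": {"type": str, "required": True},
    "amount": {"type": float, "required": True, "min": 0.0},
    "currency": {"type": str, "required": False},
}


def validate(record: dict[str, Any], schema: dict[str, dict]) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, rule in schema.items():
        if field not in record:
            if rule.get("required", False):
                errors.append(f"{field}: required field is missing")
            continue
        value = record[field]
        # Type validation: the value must match the declared type.
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        # Constraint enforcement: a simple lower bound.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
    return errors
```

Collecting all violations instead of stopping at the first one lets a caller report every problem in a record at once.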

For detailed schema management capabilities, see 09-Schema-Management.

Data-Driven Agents (DDAs)

Data-Driven Agents (DDAs) are the fundamental execution units in Studio. Each DDA combines a language model with a structured query plan to perform schema-bound data processing tasks.

DDA Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    DDA STRUCTURE                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  DEFINITION                                             │    │
│  │  Name │ Description │ Instructions │ Model │ Prompt     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                          │                                      │
│                          ▼                                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  QUERY PLAN                                             │    │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │    │
│  │  │ DATASET  │ │   MCP    │ │   DDA    │ │  SCRIPT  │    │    │
│  │  │  Items   │ │  Items   │ │  Items   │ │  Items   │    │    │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                          │                                      │
│                          ▼                                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  PLACEHOLDERS & RUNTIME CONFIG                          │    │
│  │  Configurable component slots with per-user mappings    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                          │                                      │
│                          ▼                                      │
│  ┌───────────────┐  ┌────────────┐  ┌────────────────────┐      │
│  │ Draft/Publish │  │  Execute   │  │  Execution History │      │
│  │ Lifecycle     │  │  with Files│  │  Tracking          │      │
│  └───────────────┘  └────────────┘  └────────────────────┘      │
└─────────────────────────────────────────────────────────────────┘

DDA Definition

Field	Description
Name	Agent identifier
Description	Purpose and capabilities
Instructions	Natural language behavioral guidelines
Model	LLM model configuration for inference
Prompt	System prompt template for the agent
Domain	Associated business domain
Thumbnail	Visual preview
Stage	Lifecycle stage (DRAFT or PUBLISHED)
State	Operational state (ACTIVE or INACTIVE)
Type	DDA type indicator (DDA or CHAIN)

Query Plan

The Query Plan defines the data sources and processing steps a DDA uses during execution. Each plan consists of ordered items that specify where data comes from and how it is processed.

Query Plan Item Types:

Item Type	Description
DATASET	Reference to a Studio dataset as a data source
MCP	Reference to an MCP integration tool
DDA	Reference to another DDA for sub-agent execution
SCRIPT	Reference to a code script for custom processing
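
An ordered query plan can be pictured as a list of typed items sorted by position. The sketch below assumes hypothetical field names (`order`, `item_type`, `ref`) and sample asset names; only the four item types come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class ItemType(str, Enum):
    DATASET = "DATASET"
    MCP = "MCP"
    DDA = "DDA"
    SCRIPT = "SCRIPT"


@dataclass(frozen=True)
class QueryPlanItem:
    order: int           # position in the plan (assumed field name)
    item_type: ItemType
    ref: str             # identifier of the referenced asset (assumed)


def execution_order(items: list[QueryPlanItem]) -> list[QueryPlanItem]:
    """Return items sorted by position, rejecting duplicate positions."""
    positions = [item.order for item in items]
    if len(positions) != len(set(positions)):
        raise ValueError("duplicate positions in query plan")
    return sorted(items, key=lambda item: item.order)


plan = [
    QueryPlanItem(2, ItemType.SCRIPT, "normalize-amounts"),
    QueryPlanItem(1, ItemType.DATASET, "invoices-2025"),
    QueryPlanItem(3, ItemType.DDA, "anomaly-detector"),
]
print([f"{item.item_type.value}:{item.ref}" for item in execution_order(plan)])
```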

Placeholders

Placeholders define configurable component slots within a DDA that can be mapped to specific resources at runtime.

Placeholder Type	Description
MCP_INTEGRATION	Slot for an MCP tool connection
DATASET	Slot for a dataset reference
DDA	Slot for a sub-agent reference

Placeholder Configuration:

Field	Description
Name	Placeholder identifier
Type	Component type (MCP_INTEGRATION, DATASET, DDA)
Description	Purpose of the placeholder
Required	Whether the placeholder must be mapped
Default	Default resource mapping (if any)

Runtime Configuration

Runtime Config provides per-user customization of DDA behavior by mapping placeholders to specific resources.

Field	Description
Placeholder Mappings	Map each placeholder to a specific resource
State	Configuration state (ACTIVE or INACTIVE)
User Scope	Configuration applies per authenticated user
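
Placeholder resolution can be sketched as a lookup that combines a user's runtime mappings with placeholder defaults. The precedence shown (user mapping first, then default) and the error raised for an unmapped required placeholder are assumptions, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placeholder:
    name: str
    type: str                      # "MCP_INTEGRATION", "DATASET", or "DDA"
    required: bool = True
    default: Optional[str] = None  # default resource mapping, if any


def resolve(placeholders: list[Placeholder],
            user_mappings: dict[str, str]) -> dict[str, str]:
    """Map each placeholder to a resource: user mapping first, then default."""
    resolved = {}
    for p in placeholders:
        resource = user_mappings.get(p.name, p.default)
        if resource is None:
            if p.required:
                raise ValueError(f"placeholder '{p.name}' must be mapped")
            continue  # optional and unmapped: leave the slot empty
        resolved[p.name] = resource
    return resolved
```

For example, a DDA with a required `DATASET` slot and an optional `MCP_INTEGRATION` slot would resolve successfully for a user who maps only the dataset.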

DDA Lifecycle

Operation	Description
Create	Define DDA with name, description, model, prompt
Draft	Save DDA configuration as draft for iteration
Apply Draft	Apply draft changes to the DDA definition
Update	Modify DDA properties, plan, or placeholders
Publish	Set stage from DRAFT to PUBLISHED
Execute	Run DDA with optional file inputs
Executions	View execution history with status tracking
Delete	Remove DDA

Execution Status:

Status	Description
SUCCESS	Execution completed successfully
ERROR	Execution failed with error
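
The draft, apply, and publish operations can be modeled as a small state holder. This is an illustrative sketch only: the class name and the rule that a pending draft blocks publishing are assumptions.

```python
class LifecycleError(Exception):
    """Raised when an operation is not valid in the current lifecycle state."""


class DraftableAsset:
    """Hypothetical model of the draft/apply/publish flow described above."""

    def __init__(self, definition: dict):
        self.definition = dict(definition)  # the applied definition
        self.draft = None                   # pending, unapplied changes
        self.stage = "DRAFT"

    def save_draft(self, changes: dict) -> None:
        # Drafts accumulate without touching the applied definition.
        self.draft = {**(self.draft or {}), **changes}

    def apply_draft(self) -> None:
        if self.draft is None:
            raise LifecycleError("no draft to apply")
        self.definition.update(self.draft)
        self.draft = None

    def publish(self) -> None:
        # Assumption: pending draft changes must be applied before publishing.
        if self.draft is not None:
            raise LifecycleError("apply the pending draft first")
        self.stage = "PUBLISHED"
```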

DDA Creation Flow

┌─────────────────────────────────────────────────────────────────────────┐
│                       DDA CREATION WORKFLOW                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────┐                                                    │
│  │  1. User        │  User initiates DDA creation                       │
│  │  Request        │  "Create new Data-Driven Agent"                    │
│  └────────┬────────┘                                                    │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  2. Domain Selection                                        │        │
│  │  ┌─────────────────────┐   ┌─────────────────────────────┐  │        │
│  │  │  Select Existing    │   │  Create New Domain          │  │        │
│  │  │  Business Domain    │   │  via Domain Discovery       │  │        │
│  │  │  (entity schemas)   │   │  (upload documents)         │  │        │
│  │  └─────────────────────┘   └─────────────────────────────┘  │        │
│  └────────────────────────────────┬────────────────────────────┘        │
│                                   │                                     │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  3. Schema Review & Modification                            │        │
│  │  • View generated/selected domain schemas                   │        │
│  │  • Add, modify, or remove entity definitions                │        │
│  │  • Configure attribute types and constraints                │        │
│  │  • Define relationships between entities                    │        │
│  └────────────────────────────────┬────────────────────────────┘        │
│                                   │                                     │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  4. DDA Definition                                          │        │
│  │  • Name: Agent identifier                                   │        │
│  │  • Description: Purpose and capabilities                    │        │
│  │  • Instructions: Behavioral guidelines in plain language    │        │
│  │  • Model: LLM model selection                               │        │
│  │  • Prompt: System prompt template                           │        │
│  │  • Query Plan: Data sources (datasets, MCPs, DDAs, scripts) │        │
│  │  • Placeholders: Configurable component slots               │        │
│  └────────────────────────────────┬────────────────────────────┘        │
│                                   │                                     │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  5. Draft & Testing                                         │        │
│  │  • Save as DRAFT stage                                      │        │
│  │  • Interactive testing interface                            │        │
│  │  • Execute with sample files                                │        │
│  │  • Output validation against schema                         │        │
│  │  • Apply draft when satisfied                               │        │
│  └────────────────────────────────┬────────────────────────────┘        │
│                                   │                                     │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────┐        │
│  │  6. Publication                                             │        │
│  │  • Set stage to PUBLISHED                                   │        │
│  │  • Configure runtime placeholders                           │        │
│  │  • Set access permissions                                   │        │
│  │  • Make available to other platform users                   │        │
│  └─────────────────────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────────────┘

DDA Execution Security

┌─────────────────────────────────────────────────────────────────┐
│                    DDA EXECUTION SECURITY                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User Request ──▶ [Authentication] ──▶ [Authorization]          │
│                          │                   │                  │
│                          ▼                   ▼                  │
│                   (Identity verified)  (DDA access check)       │
│                              │                                  │
│                              ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              EXECUTION SANDBOX                          │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │    │
│  │  │  Process    │  │  Network    │  │  Resource   │      │    │
│  │  │  Isolation  │  │  Isolation  │  │  Limits     │      │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘      │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │    │
│  │  │  Filesystem │  │  Capability │  │  Time       │      │    │
│  │  │  Isolation  │  │  Restrict   │  │  Limits     │      │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘      │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              │                                  │
│                              ▼                                  │
│  Request ──▶ [Tool Authorization] ──▶ [Parameter Validation]    │
│                          │                       │              │
│                          ▼                       ▼              │
│                         (Checks)        (Input sanitization)    │
│                              │                                  │
│                              ▼                                  │
│                       [Audit Logging]                           │
└─────────────────────────────────────────────────────────────────┘

Execution Sandbox

All DDA execution occurs within isolated sandboxes.

Sandbox Features:

Feature	Implementation	Purpose
Process Isolation	Container with gVisor	Prevent host access
Filesystem Isolation	Overlay FS, read-only root	Prevent persistence
Time Limits	Process timeout	Prevent runaway execution

Resource Constraints:

Constraint	Limit	Purpose
Execution Time	Configurable (default 60s)	Prevent resource exhaustion
Memory	Configurable (default 512 MB)	Prevent memory exhaustion
Network	Allowlisted endpoints only	Prevent data exfiltration
File System	No persistent access	Prevent local attacks
Tool Calls	Configurable per execution	Prevent infinite loops
LLM Calls	Configurable per execution	Cost control
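
The per-execution limits on time, tool calls, and LLM calls can be sketched as a budget object that the runtime charges before each call. The 60-second default comes from the table; the call limits are hypothetical values, and memory and network limits are left out because they would be enforced by the sandbox itself, not by in-process counters.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when an execution crosses one of its configured limits."""


@dataclass
class ExecutionBudget:
    max_seconds: float = 60.0   # default execution time from the table
    max_tool_calls: int = 20    # hypothetical; the table gives no default
    max_llm_calls: int = 10     # hypothetical; the table gives no default
    tool_calls: int = 0
    llm_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def check_time(self) -> None:
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("execution time limit exceeded")

    def charge_tool_call(self) -> None:
        self.check_time()
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool call limit exceeded")
        self.tool_calls += 1

    def charge_llm_call(self) -> None:
        self.check_time()
        if self.llm_calls >= self.max_llm_calls:
            raise BudgetExceeded("LLM call limit exceeded")
        self.llm_calls += 1
```

Checking the limit before incrementing means a rejected call is never counted, so the counters always reflect calls that actually ran.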

Widget Types

Widget Types define visual interface components that enable users to interact with DDA capabilities through graphical displays and interactive controls.

Widget Type Categories

Type	API Value	Description
System	`SYSTEM`	Platform-provided built-in widget types
Output	`OUTPUT`	User-defined widgets for DDA output display

View Types

View Type	Description
dialog	Widget renders in a dialog/modal overlay
canvas	Widget renders on the main canvas area
both	Widget can render in either dialog or canvas context
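
The view type rules reduce to a small compatibility check. The function name and the idea of a "render context" argument are illustrative assumptions; only the three view type values come from the table.

```python
VIEW_TYPES = ("dialog", "canvas", "both")
RENDER_CONTEXTS = ("dialog", "canvas")


def can_render(view_type: str, context: str) -> bool:
    """Return True if a widget with this view type may render in the context."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type!r}")
    if context not in RENDER_CONTEXTS:
        raise ValueError(f"unknown render context: {context!r}")
    return view_type == "both" or view_type == context
```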

Widget Type Structure

Field	Description
Name	Widget type identifier
Type	SYSTEM or OUTPUT
View Type	dialog, canvas, or both
Data Schemas	Schema definitions for widget data binding
Thumbnail	Visual preview of the widget type

Widget Type Management

Operation	Description
Create	Define a new widget type with schemas and view type
List	View all available widget types
Update	Modify widget type configuration or schemas
Delete	Remove a widget type

Widget Security Controls

Control	Implementation
Schema Validation	Widget data validated against bound data schemas
Access Control	Widget access governed by DDA permissions
Output Filtering	Widget output limited to authorized data
Audit Logging	All widget interactions logged

Datasets

Datasets enable users to define, manage, and share structured data collections that DDAs and widgets can access for processing and analysis. Datasets support file uploads and follow a draft/publish lifecycle.

Dataset Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        DATASET ARCHITECTURE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │                    DATASET DEFINITION                   │            │
│  │  ┌───────────────┐  ┌───────────────┐  ┌──────────────┐ │            │
│  │  │  Schema       │  │  File         │  │  Runtime     │ │            │
│  │  │  Binding      │  │  Mapping      │  │  Config      │ │            │
│  │  └───────────────┘  └───────────────┘  └──────────────┘ │            │
│  └─────────────────────────────────────────────────────────┘            │
│                                │                                        │
│                                ▼                                        │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │                    DATA SOURCES                         │            │
│  │  ┌─────────┐         ┌─────────┐        ┌────────────┐  │            │
│  │  │Knowledge│         │ Uploaded│        │ MCP        │  │            │
│  │  │ Fabric  │         │  Files  │        │ Sources    │  │            │
│  │  └─────────┘         └─────────┘        └────────────┘  │            │
│  └─────────────────────────────────────────────────────────┘            │
│                                │                                        │
│                                ▼                                        │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │                    CONSUMERS                            │            │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌────────────┐  │            │
│  │  │  DDAs   │  │ Widgets │  │Utilities│  │  Chains    │  │            │
│  │  └─────────┘  └─────────┘  └─────────┘  └────────────┘  │            │
│  └─────────────────────────────────────────────────────────┘            │
└─────────────────────────────────────────────────────────────────────────┘

Dataset Structure

Field	Description
Name	Dataset identifier
Description	Purpose and contents description
Files	Uploaded data files
File Mapping	Column/field mappings from files to schema
Schemas	Bound schemas for data validation
Domain	Associated business domain
Stage	Lifecycle stage (DRAFT or PUBLISHED)
State	Operational state (ACTIVE or INACTIVE)
Runtime Config	Per-user configuration with placeholder mappings
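
A file mapping can be thought of as a dictionary from file columns to schema fields. The sketch below applies such a mapping to CSV text; the function name, the sample columns, and the choice to drop unmapped columns are all assumptions made for illustration.

```python
import csv
import io


def map_rows(csv_text: str, field_mapping: dict[str, str]) -> list[dict[str, str]]:
    """Rename file columns to schema fields; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(field_mapping) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns not found in file: {sorted(missing)}")
    return [
        {schema_field: row[column] for column, schema_field in field_mapping.items()}
        for row in reader
    ]


sample = "Inv No,Total,Notes\nA-1,10.50,first\nA-2,7.00,second\n"
mapping = {"Inv No": "invoice_id", "Total": "amount"}
print(map_rows(sample, mapping))
```

Values stay as strings here; converting them to schema types would be a separate validation step against the bound schema.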

Dataset Lifecycle

Operation	Description
Create	Define dataset with name, description, domain
Upload Files	Add data files to the dataset
Map Fields	Configure file-to-schema field mappings
Draft	Save dataset configuration as draft
Apply Draft	Apply draft changes to the dataset definition
Bind Schema	Associate schemas for data validation
Configure	Set runtime configuration and placeholder mappings
Publish	Set stage from DRAFT to PUBLISHED
Delete	Remove dataset

Dataset Security Controls

Control	Implementation
Schema Enforcement	All data validated against defined schema
Access Control	Role-based and user-based dataset permissions
Source Authentication	Secure credentials for external data sources
File Validation	Uploaded files scanned and validated
Audit Logging	All dataset access and modifications logged
Version Control	Dataset definition changes tracked

Dataset Access Patterns

Pattern	Description	Security
Direct Query	DDAs query dataset directly	Permission check per query
Subscription	Consumers read a continuously updated (real-time) view of the dataset	Subscription authorization
Snapshot	Point-in-time copy of dataset	Immutable, audited access

Utilities

Utilities are reusable components that combine external API endpoints and DDAs into configurable processing units. They provide a way to compose complex operations from existing building blocks.

Utility Structure

Field	Description
Name	Utility identifier
Description	Purpose and capabilities
API Mapping	Reference to an external API configuration
DDA References	References to DDAs used by the utility
Placeholders	Configurable component slots (API or DDA type)
Runtime Config	Per-user configuration with placeholder mappings
Stage	Lifecycle stage (DRAFT or PUBLISHED)
State	Operational state (ACTIVE or INACTIVE)

Utility Placeholder Types

Type	Description
API	Slot for an external API endpoint configuration
DDA	Slot for a Data-Driven Agent reference

Utility Lifecycle

Operation	Description
Create	Define utility with name, API mapping, DDA references
Draft	Save utility configuration as draft
Apply Draft	Apply draft changes to the utility definition
Configure	Set runtime configuration and placeholder mappings
Execute	Run the utility with provided parameters
Delete	Remove utility

Utility Security Controls

Control	Implementation
API Authorization	External API calls authorized per user permissions
DDA Authorization	Referenced DDAs authorized per user permissions
Input Validation	All parameters validated against schemas
Audit Logging	All utility executions logged

Chain of Agents

The Chain of Agents feature enables composition of complex data workflows from multiple coordinated Data-Driven Agents with optional human-in-the-loop review steps.

Chain Query Plan

Chains use a specialized query plan that supports DDA execution steps and human review gates.

Chain Query Plan Item Types:

Item Type	Description
DDA	Execute a Data-Driven Agent within the chain
HUMAN_IN_THE_LOOP	Insert a human review/approval gate in the workflow
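
The interplay of DDA steps and human review gates can be sketched as a runner that executes items in order and pauses when it reaches a gate. The class names, the status strings other than `SUCCESS`, and the callable used as a stand-in for DDA execution are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ChainItem:
    item_type: str   # "DDA" or "HUMAN_IN_THE_LOOP"
    name: str
    run: Optional[Callable[[dict], dict]] = None  # stand-in for DDA execution


@dataclass
class ChainRun:
    items: list[ChainItem]
    data: dict
    position: int = 0
    status: str = "RUNNING"  # hypothetical status name

    def advance(self) -> None:
        """Run DDA items in order until the chain ends or a gate is reached."""
        while self.position < len(self.items):
            item = self.items[self.position]
            if item.item_type == "HUMAN_IN_THE_LOOP":
                self.status = "WAITING_FOR_APPROVAL"
                return
            self.data = item.run(self.data)
            self.position += 1
        self.status = "SUCCESS"

    def approve(self) -> None:
        """Release the pending gate and continue with the next item."""
        if self.status != "WAITING_FOR_APPROVAL":
            raise RuntimeError("no gate is pending")
        self.position += 1
        self.status = "RUNNING"
        self.advance()
```

Because the run object keeps its position and data, it could be persisted while waiting for approval and resumed later; how the Studio actually stores paused chains is not covered by this sketch.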

Chain Structure

Field	Description
Name	Chain identifier
Description	Purpose and workflow description
Query Plan	Ordered list of DDA and HUMAN_IN_THE_LOOP items
Domain	Associated business domain
Runtime Config	Per-user configuration with placeholder mappings
Stage	Lifecycle stage (DRAFT or PUBLISHED)
State	Operational state (ACTIVE or INACTIVE)

Chain Lifecycle

Operation	Description
Create	Define chain with name, description, query plan
Draft	Save chain configuration as draft
Apply Draft	Apply draft changes to the chain definition
Configure	Set runtime configuration and placeholder mappings
Execute	Run chain with optional file inputs
Executions	View execution history with status tracking
Delete	Remove chain

Communication Patterns

┌─────────────────────────────────────────────────────────────────┐
│                    CHAIN EXECUTION PATTERNS                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SEQUENTIAL WITH HUMAN GATE                                      │
│  ┌─────┐ ┌─────┐ ┌──────────┐ ┌─────┐ ┌─────┐                   │
│  │DDA A│→│DDA B│→│  HUMAN   │→│DDA C│→│DDA D│                   │
│  └─────┘ └─────┘ │  REVIEW  │ └─────┘ └─────┘                   │
│                  └──────────┘                                   │
│                                                                 │
│  BRANCHING                     HIERARCHICAL                      │
│  ┌─────┐                       ┌─────┐                           │
│  │DDA A│                       │DDA A│                           │
│  └──┬──┘                       └──┬──┘                           │
│     │                        ┌────┴────┐                         │
│  ┌──┼──────┐                 ▼         ▼                         │
│  ▼  ▼      ▼              ┌─────┐  ┌─────┐                       │
│ ┌──┐┌──┐┌──────────┐      │DDA B│  │DDA C│                       │
│ │B ││C ││  HUMAN   │      └──┬──┘  └──┬──┘                       │
│ └┬─┘└┬─┘│  REVIEW  │         ▼         ▼                         │
│  │   │  └──────────┘      ┌─────┐  ┌─────┐                       │
│  └───┼──────┘             │DDA D│  │DDA E│                       │
│      ▼                    └─────┘  └─────┘                       │
│   ┌─────┐                                                        │
│   │DDA E│                                                        │
│   └─────┘                                                        │
└─────────────────────────────────────────────────────────────────┘

Multi-Agent Security Controls

Control	Implementation
Message Authentication	All inter-DDA messages signed
Context Isolation	Each DDA has isolated execution context
Permission Propagation	Downstream DDAs cannot exceed upstream permissions
Human Gate Enforcement	HUMAN_IN_THE_LOOP items block until approved
Audit Trail	Complete chain of execution logged
Message TTL	Messages expire to prevent replay attacks
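
Message authentication combined with a TTL can be illustrated with an HMAC-signed envelope that carries an expiry time. This document does not state which signing scheme the platform uses; HMAC-SHA256 with a shared key, the envelope layout, and the function names are assumptions chosen for the sketch.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_message(payload: dict, key: bytes, ttl_seconds: int = 30,
                 now: Optional[float] = None) -> dict:
    """Wrap a payload with an expiry time and an HMAC-SHA256 signature."""
    issued = time.time() if now is None else now
    envelope = {"payload": payload, "expires_at": issued + ttl_seconds}
    body = json.dumps(envelope, sort_keys=True).encode()
    envelope["signature"] = hmac.new(key, body, hashlib.sha256).hexdigest()
    return envelope


def verify_message(envelope: dict, key: bytes,
                   now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the message is fresh."""
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    body = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        raise ValueError("invalid signature")
    current = time.time() if now is None else now
    if current > envelope["expires_at"]:
        raise ValueError("message expired")
    return envelope["payload"]
```

The expiry is covered by the signature, so it cannot be extended without the key. A TTL on its own only narrows the replay window; rejecting replays inside that window would additionally require tracking a per-message identifier.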

Orchestration Security

Pattern	Security Control
Sequential	Each step authorized independently
Parallel	Concurrent executions isolated
Hierarchical	Parent DDA controls child permissions
Human-in-the-Loop	Execution pauses pending human approval
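
The rule that downstream DDAs cannot exceed upstream permissions can be expressed as a running set intersection. The permission strings and the function name below are hypothetical; the sketch only shows the narrowing behavior.

```python
def propagate(initial: set[str], requested_per_step: list[set[str]]) -> list[set[str]]:
    """Grant each step at most what the step before it was granted."""
    granted = []
    ceiling = set(initial)
    for requested in requested_per_step:
        # A step receives only the permissions it requests that are
        # also held upstream, so the ceiling can shrink but never grow.
        ceiling = ceiling & requested
        granted.append(ceiling)
    return granted


user = {"read:sales", "write:reports", "read:hr"}
steps = [{"read:sales", "write:reports"}, {"read:sales", "read:hr"}]
print(propagate(user, steps))
```

In this example the second step requests `read:hr`, which the user holds but the first step did not, so it is not granted downstream.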

State Management Security

State Type	Storage	Security Controls
Execution State	In-memory	Encrypted, execution-scoped
Checkpoint State	Persistent	Encrypted, tamper-evident
Workflow Context	Distributed	Encrypted, access-controlled

Error Handling and Recovery

Error Classification:

Error Type	Example	Strategy
Transient	Network timeout, rate limit	Retry with backoff
Permanent	Invalid input, missing resource	Fail fast, notify
Partial	Some items failed in batch	Continue with failures logged
Resource	Memory exceeded, timeout	Scale up or abort
Dependency	Upstream DDA failed	Circuit breaker, fallback

Recovery Controls:

Control	Implementation
Retry Policy	Configurable max attempts, backoff
Circuit Breaker	Automatic failover on repeated failures
Fallback DDAs	Alternative execution paths
Notifications	Configurable alerting on failures
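
As a rough illustration of the "retry with backoff" strategy for transient errors, the sketch below retries an operation with exponentially growing delays and lets every other exception fail fast. `TransientError`, `retry_with_backoff`, and the default values are hypothetical; the source only states that maximum attempts and backoff are configurable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (network timeout, rate limit)."""


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Retries exhausted: surface the last transient error.
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Permanent errors (for example, invalid input) are not caught here, so they propagate immediately, which matches the "fail fast" row in the classification table.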

Graph of Agents

Status: Planned — Graph of Agents is the next-generation orchestration capability, extending the linear Chain of Agents model into a full directed graph topology.

The Graph of Agents feature will enable complex, non-linear workflows in which DDAs are connected in arbitrary directed-graph patterns, including conditional branching, parallel fan-out/fan-in, cycles with exit conditions, and dynamic routing based on intermediate results.

Graph vs. Chain Comparison

Capability	Chain of Agents	Graph of Agents
Topology	Linear sequence	Arbitrary directed graph
Branching	Limited (sequential only)	Full conditional branching
Parallelism	Sequential with human gates	Native parallel fan-out/fan-in
Cycles	Not supported	Supported with exit conditions
Dynamic Routing	Fixed execution order	Runtime-determined paths
Human-in-the-Loop	Gate between steps	Gate at any node in the graph
Convergence	Single output	Multiple convergence points

Planned Graph Structure

┌─────────────────────────────────────────────────────────────────┐
│                    GRAPH OF AGENTS TOPOLOGY                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────┐       ┌─────┐                                          │
│  │DDA A│──────▶│DDA B│──────────┐                               │
│  └─────┘       └──┬──┘          │                               │
│                   │             ▼                               │
│                   │          ┌──────────┐     ┌─────┐           │
│                   │          │  HUMAN   │────▶│DDA E│──┐        │
│                   │          │  REVIEW  │     └─────┘  │        │
│                   │          └──────────┘              │        │
│                   │                                    ▼        │
│                   ▼                                 ┌─────┐     │
│                ┌─────┐     ┌─────┐                  │DDA G│     │
│                │DDA C│────▶│DDA D│─────────────────▶│merge│     │
│                └─────┘     └──┬──┘                  └─────┘     │
│                               │                       ▲         │
│                               │    ┌─────┐            │         │
│                               └───▶│DDA F│────────────┘         │
│                                    └─────┘                      │
│                                    (cycle with exit condition)  │
└─────────────────────────────────────────────────────────────────┘

Planned Node Types

Node Type	Description
DDA	Execute a Data-Driven Agent
HUMAN_IN_THE_LOOP	Human review/approval gate
CONDITIONAL	Route to different branches based on runtime evaluation
FAN_OUT	Split execution into parallel branches
FAN_IN	Merge results from parallel branches
LOOP	Repeated execution with configurable exit condition

Planned Security Controls

Control	Implementation
Graph Validation	Cycle detection with mandatory exit conditions
Permission Propagation	Path-aware permission checking across all graph branches
Parallel Isolation	Concurrent branch executions fully isolated
Convergence Validation	Merged results validated against output schemas
Dynamic Route Audit	All routing decisions logged with decision rationale
Resource Budgeting	Per-graph execution resource limits
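
Since Graph of Agents is still planned, the following is only a sketch of how "cycle detection with mandatory exit conditions" could work. It assumes a graph given as an adjacency mapping and a set of nodes that carry an exit condition (for example, LOOP nodes); both representations and the function name are hypothetical. A cycle is acceptable only if it passes through at least one such node, so the check ignores guarded nodes and looks for any remaining cycle with a depth-first search.

```python
from typing import Dict, List, Set


def has_unguarded_cycle(edges: Dict[str, List[str]], guarded: Set[str]) -> bool:
    """Return True if some cycle passes through no node in `guarded`."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    colour = {n: WHITE for n in nodes if n not in guarded}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in edges.get(node, []):
            if nxt in guarded:
                continue  # Any cycle through this node has an exit condition.
            if colour[nxt] == GREY:
                return True  # Back edge: cycle made only of unguarded nodes.
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(colour))
```

A validator built this way would reject a graph for which `has_unguarded_cycle` returns `True`.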

MCP Integrations

MCP (Model Context Protocol) Integrations provide connectivity to external tools and services. The Studio manages MCP integration types, instances, and credentials as separate entities.

MCP Integration Architecture

Entity	Description
Types	Definitions of available MCP tools with credential schemas
Instances	Configured connections to specific MCP tool deployments
Credentials	Stored authentication credentials for MCP connections

MCP Integration Types

Field	Description
Name	Integration type name
Description	Tool capabilities description
Command	MCP server command to execute
Args	Command-line arguments for the MCP server
Tools	List of available tools exposed by the server
Credentials Schema	JSON schema defining required authentication fields
Thumbnail	Visual preview of the integration
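
To make the relationship between an integration type and its credentials schema more concrete, the sketch below shows a hypothetical type definition and a minimal check of the JSON Schema `required` keyword. The key names, the example values, and `missing_credential_fields` are assumptions for illustration; this is not full JSON Schema validation and does not reflect the Studio's actual payload format.

```python
from typing import List

# Hypothetical integration type, mirroring the fields in the table above.
EXAMPLE_TYPE = {
    "name": "example-tool",
    "description": "Hypothetical integration type used only for illustration",
    "command": "example-mcp-server",
    "args": ["--stdio"],
    "tools": ["search_items"],
    "credentials_schema": {
        "type": "object",
        "properties": {
            "api_key": {"type": "string"},
            "workspace": {"type": "string"},
        },
        "required": ["api_key", "workspace"],
    },
}


def missing_credential_fields(integration_type: dict, credentials: dict) -> List[str]:
    """List required credential fields that the supplied credentials lack."""
    required = integration_type["credentials_schema"].get("required", [])
    return [field for field in required if field not in credentials]
```

An instance could be refused at creation time whenever this list is non-empty.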

MCP Instance Management

Operation	Description
Create	Configure a new MCP instance from a type definition
List	View all configured MCP instances
Update	Modify instance configuration
Delete	Remove an MCP instance

MCP Credential Management

Operation	Description
Create	Store credentials for an MCP integration
List	View available credential sets (metadata only)
Update	Rotate or modify stored credentials
Delete	Remove stored credentials

MCP Security Controls

Control	Implementation
Schema Validation	Tool inputs validated against MCP schema
Credential Isolation	Credentials stored encrypted, never exposed in logs
Permission Enforcement	Tool actions authorized per user permissions
Rate Limiting	Per-tool request limits
Audit Logging	All tool invocations logged
Version Pinning	Explicit tool versions reduce exposure to supply chain attacks

External APIs

The Studio provides management for external API endpoint configurations that can be referenced by DDAs and Utilities.

API Configuration

Field	Description
Name	API endpoint identifier
Method	HTTP method (GET, POST, PUT, DELETE, etc.)
Endpoint URL	Target API endpoint URL
Parameters	Query and path parameter definitions
Body	Request body template
Auth Header	Authentication header configuration

API Management

Operation	Description
Create	Define a new external API endpoint
List	View all configured API endpoints
Update	Modify API endpoint configuration
Delete	Remove an API endpoint configuration

API Security Controls

Control	Implementation
Endpoint Validation	URLs validated and allowlisted
Auth Management	Authentication headers managed securely
Input Sanitization	Parameters and body content sanitized
Response Validation	API responses validated before processing
Audit Logging	All API calls logged with parameters and status
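
Endpoint validation against an allowlist might look roughly like the sketch below. The host set, the HTTPS-only rule, and the function name are assumptions; the source says only that URLs are validated and allowlisted. Comparing the parsed hostname, rather than searching the raw string, avoids being fooled by look-alike hosts or by the allowed name appearing in a query string.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def is_allowed_endpoint(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```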

Scripts

Scripts are code components that can be referenced in DDA query plans for custom data processing logic.

Script Management

Operation	Description
List	View all available scripts

Script Usage in Query Plans

Context	Description
DDA Plan Item	Referenced as SCRIPT type in DDA query plans
Source Router	Available as SCRIPT type through the sources API

Asset Management

The Studio provides unified asset management capabilities across all asset types, including semantic search, media and model asset management, and source routing.

Unified Assets

The Unified Assets API provides a consolidated view across all Studio asset types.

Asset Category	Included Types
DDAs	Data-Driven Agents
Chains	Chain of Agents workflows
Datasets	Structured data collections
Widgets	Visual interface components
Utilities	Reusable API/DDA components
Media	Media files and content assets
Models	Machine learning model assets

Access Filters:

Filter	Description
owned	Assets owned by the authenticated user
accessible	Assets the user has been granted access to
all	All assets the user can view

Asset Search

The semantic search capability enables discovery across all Studio asset types using natural language queries.

Parameter	Description
Query	Natural language search term
Distance Threshold	Maximum semantic distance for matching results
Top K	Maximum number of results to return
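
The interaction between Distance Threshold and Top K can be sketched as a filter followed by a sort and a cut-off. The function below assumes the search backend has already produced `(asset_id, distance)` pairs and that the threshold is inclusive; both are assumptions, as are the names.

```python
from typing import List, Tuple


def select_matches(scored: List[Tuple[str, float]],
                   distance_threshold: float,
                   top_k: int) -> List[Tuple[str, float]]:
    """Keep results within the distance threshold, closest first, at most top_k."""
    within = [item for item in scored if item[1] <= distance_threshold]
    return sorted(within, key=lambda item: item[1])[:top_k]
```

With this ordering, the threshold controls which results qualify at all, while Top K only caps how many of the closest qualifying results are returned.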

Search Modes:

Mode	Description
Standard	Search across all asset types by semantic similarity
Step-Based	Search for individual workflow steps and components

Media Assets

Field	Description
Name	Media asset identifier
Content URL	URL to the media file content
Policy Source URL	URL to associated usage policies
Thumbnail	Preview image

Model Assets

Field	Description
Name	Model asset identifier
Files	Uploaded model files
Thumbnail	Preview image

Sources Router

The Sources Router provides a unified view of all available data sources that can be referenced in DDA query plans.

Source Type	Description
DATASET	Studio datasets
MCP	MCP integration tool connections
DDA	Data-Driven Agents (for sub-agent execution)
SCRIPT	Code scripts for custom processing

Classification

Entity	Description
Industries	Industry categories for asset classification
Tags	User-defined tags for asset organization

AI Hybrid Planning

The AI Hybrid Planning feature enables automatic creation of DDAs from natural language descriptions. Users provide a name, description, and instructions, and the system generates a structured DDA with appropriate configuration.

Planning Parameters

Parameter	Description
Name	Desired DDA name
Description	Purpose description in natural language
Instructions	Behavioral guidelines for the DDA
Is Planned	Whether to generate a query plan automatically
Is Structured	Whether to enforce schema-structured output
Has Widget	Whether to generate an associated widget

Planning Flow

Stage	Description
Input Analysis	Parse natural language description and instructions
Plan Generation	Generate query plan with appropriate data sources
Schema Mapping	Map to relevant domain schemas
DDA Construction	Create DDA definition with model and prompt
Widget Generation	Optionally generate associated widget type
Draft Creation	Save as DRAFT stage for user review

Security Controls

Control	Implementation
Input Validation	Natural language inputs filtered and validated
Permission Scoping	Generated DDA limited to user’s authorized resources
Schema Binding	Generated plans bound to existing domain schemas
Draft Only	AI-generated DDAs always start as DRAFT
Audit Trail	All planning steps logged

AI Workflow Execution

The AI Workflow Execution endpoint enables direct execution of published DDAs via query-based invocation.

Execution Interface

Parameter	Description
Agent UUID	Unique identifier of the DDA to execute
Query	Natural language query for the DDA to process

Execution Security

Control	Implementation
Authentication	HTTPBearer (JWT) token required
Authorization	User must have execution permission on the DDA
Sandbox Isolation	Execution occurs in isolated sandbox
Audit Logging	All executions logged with query and results
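
A client call to the execution endpoint could be assembled as in the sketch below. The `/api/ai-workflow-execution` path and the Bearer scheme come from this document; the base URL, the JSON field names `agent_uuid` and `query`, and the helper name are assumptions, since the request schema is not given here. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_execution_request(base_url: str, token: str,
                            agent_uuid: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to execute a published DDA."""
    # Field names are hypothetical; check the actual API schema before use.
    body = json.dumps({"agent_uuid": agent_uuid, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ai-workflow-execution",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call.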

Text-to-Pipeline Generation

The Text-to-Pipeline feature enables users to create workflows using natural language descriptions. The system generates a Domain Specific Language (DSL) representation that can be visualized, refined, and executed.

Generation Flow

Stage	Description
User Input	Natural language description of desired workflow
Intent Analysis	Parse intent, identify data sources, operations, and conditions
DSL Generation	Generate structured pipeline definition
Visual Preview	Interactive pipeline display with flow diagram and step details
Refinement	Iterate with natural language commands to modify the pipeline
Execution	Run immediately, schedule, or save as reusable template

Pipeline DSL Structure

Component	Description	Purpose
Pipeline Metadata	Name, version, description, author	Identification, versioning
Inputs	Input parameters with types, validation	Data entry points
Steps	Ordered operations with tool mappings	Execution sequence
Connections	Data flow between steps	Dependency management
Outputs	Result definitions and transformations	Output specification
Constraints	Resource limits and execution policies	Security and governance

DSL Validation

Check	Description	Failure Action
Syntax Validation	DSL conforms to schema	Highlight errors, suggest fixes
Tool Availability	Referenced tools exist and are accessible	Show unavailable tools
Permission Check	User authorized for all tools and data	Flag unauthorized steps
Type Compatibility	Input/output types match across connections	Show type mismatches
Cycle Detection	No circular dependencies	Identify cycle location
Resource Estimation	Estimated resource usage within limits	Warn if limits exceeded
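
As one example of these checks, type compatibility across connections could be verified as sketched below. The DSL's concrete syntax is not shown in this document, so the representation used here (output and input type maps keyed by `step.port`, plus a list of connections) and the function name are purely hypothetical.

```python
from typing import Dict, List, Tuple


def find_type_mismatches(outputs: Dict[str, str],
                         inputs: Dict[str, str],
                         connections: List[Tuple[str, str]]) -> List[str]:
    """Report every connection whose source type differs from its target type."""
    mismatches = []
    for source, target in connections:
        if outputs[source] != inputs[target]:
            mismatches.append(
                f"{source} ({outputs[source]}) -> {target} ({inputs[target]})"
            )
    return mismatches
```

Returning descriptions instead of raising on the first problem fits the "show type mismatches" failure action, because the user sees every offending connection at once.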

Security Controls

Control	Implementation
Intent Validation	Generated DSL limited to user’s authorized capabilities
Tool Scoping	Only tools the user has access to can be included
Data Source Verification	Data sources validated against user permissions
Preview Sandbox	Pipeline preview runs in isolated environment
Audit Trail	All generation and refinement steps logged
Version Control	DSL changes tracked with full history

Tool Authorization

DDAs can only invoke tools explicitly authorized for the user and use case.

Tool Categories

Category	Authorization Level	Approval Required
Read-Only (Search, Query)	User permission	Automatic
Data Modification	Explicit grant	Automatic with logging
External API Calls	Per-API authorization	Risk-dependent
Administrative Actions	Privileged access	Multi-party approval

Tool Permission Model

Permission	Description
`tools.search.read`	Query knowledge base
`tools.data.write`	Modify user data
`tools.external.call`	Call external APIs
`tools.admin.manage`	Administrative operations
`tools.data.privileged`	Access privileged data

Credential Handling

Credential Vault

Control	Implementation
Encryption	AES-256-GCM with HSM-backed keys
Access Control	User-scoped and DDA-scoped credentials
Audit	All credential access logged
Rotation	Automatic rotation support

Credential Types

Type	Storage	Access Pattern
API Keys	Encrypted vault	DDA-scoped retrieval
OAuth Tokens	Encrypted vault with refresh	Automatic refresh
Database Credentials	Encrypted vault	Just-in-time retrieval
Certificates	Certificate store	Managed lifecycle
MCP Credentials	Encrypted vault	Per-integration retrieval

Credential Injection

Credentials are never exposed in DDA definitions. Instead, they are resolved at execution time:

1. The DDA references the credential by name.
2. At execution time, the vault retrieves and decrypts the credential.
3. The credential is injected into the sandbox as an environment variable.
4. After execution, the sandbox is destroyed along with all injected credentials.
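
The injection steps above can be approximated with a child process that receives the secret only through its environment. This is a simplified sketch: the in-memory `FAKE_VAULT` stands in for the encrypted vault, decryption is omitted, a plain child process is not a real sandbox, and all names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the vault: credential name -> decrypted secret.
FAKE_VAULT = {"crm_api_key": "s3cr3t-value"}

# Code run inside the child; it reports whether the credential was injected.
CHILD_CODE = (
    "import os; "
    "print('present' if os.environ.get('DDA_CREDENTIAL') else 'missing')"
)


def run_with_credential(credential_name: str, env_var: str, code: str) -> str:
    """Look up a credential by name and expose it only to a child process."""
    secret = FAKE_VAULT[credential_name]
    # The secret is added to a copy of the environment, never to the parent's.
    child_env = {**os.environ, env_var: secret}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    # When the child exits, its environment (and the secret in it) is gone.
    return result.stdout.strip()
```

Calling `run_with_credential("crm_api_key", "DDA_CREDENTIAL", CHILD_CODE)` lets the child see the variable while the parent process environment stays unchanged.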

Google OAuth Integration

The platform supports Google OAuth for connecting to Google services.

Stage	Description
Auth URL	Platform generates OAuth authorization URL
Callback	Platform receives OAuth callback with tokens
Storage	Tokens stored securely in credential vault
Refresh	Automatic token refresh on expiration

Operational Modes

The platform provides five operational modes that organizations can configure based on their policies, risk tolerance, and use case requirements. Users can adjust automation levels at any time and are never locked into a single approach.

Mode Overview

┌───────────────────────────────────────────────────────────────────────────┐
│                    OPERATIONAL MODE SPECTRUM                              │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  MODE 0          MODE 1          MODE 2          MODE 3          MODE 4   │
│  ────────────────────────────────────────────────────────────────────────►│
│  Traditional     AI-Assisted     Routine       Autonomous       Fully     │
│  Platform        Manual          Automation    with Escalation  Automated │
│  (No DDAs)                                                     with Audit │
│                                                                           │
│  ◄──────── Human Control ────────────── AI Automation ────────────────►   │
└───────────────────────────────────────────────────────────────────────────┘

Mode Definitions

Mode 0: Traditional Platform (No AI Agents)

Aspect	Description
Operation	System operates as traditional data platform
User Role	Users manually query data, build graphs, analyze results
AI Capabilities	Limited to basic search, entity extraction, document classification
Decisions	All analytical steps and conclusions are user-driven
Use Case	Organizations requiring full manual control or regulatory constraints

Mode 1: Fully Manual with AI Assistance

Aspect	Description
Operation	Users drive all decisions and processing steps
DDA Role	DDAs operate in “suggest-only” mode
AI Activities	Flagging patterns, gathering supporting data, highlighting anomalies
Decisions	All conclusions and next steps require user approval
Use Case	Building trust in AI capabilities, high-stakes operations

Mode 2: DDAs Handle Routine Tasks

Aspect	Description
Operation	DDAs autonomously perform routine operations
Automated Tasks	Entity resolution, document classification, data collection
Human Focus	Analysis, interpretation, high-value decisions
Validation Required	Significant findings require human review
Use Case	Balanced approach for standard operations

Mode 3: Autonomous with Escalation

Aspect	Description
Operation	DDAs conduct complete workflows independently
Workflow	Follow predefined workflows from start to completion
Escalation Triggers	Contradictions, low-confidence findings, high-impact discoveries
Human Role	Review completed work packages, validate conclusions
Use Case	High-volume processing with human oversight for exceptions

Mode 4: Fully Automated with Audit

Aspect	Description
Operation	DDAs run workflows end-to-end without interruption
Human Role	Post-workflow audits of conclusions and evidence
Provenance	System maintains complete audit trail
Review Timing	Periodic batch review or risk-triggered review
Use Case	Maximum efficiency for low-risk, high-volume operations

Granular Control Options

Control	Description
Selective DDA Activation	Disable specific DDAs while keeping others active
Confidence Thresholds	Set thresholds requiring human review when certainty drops
Case Type Policies	Apply different modes to different case types or stages
Override Capability	Override DDA recommendations with user judgment
Pause/Resume	Pause and resume automated workflows as needed

Mode Transition

Transition	Use Case
Higher → Lower Automation	Workflow identifies critical issue, switch to manual
Lower → Higher Automation	Initial review complete, transition to automated processing
Per-Stage Configuration	Different modes for different workflow phases
Emergency Override	Immediate switch to full manual control

Mode Security Controls

Control	Implementation
Mode Authorization	Only authorized users can change operational modes
Mode Audit	All mode changes logged with justification
Default Mode	Organization-wide default mode setting
Mode Inheritance	Child workflows inherit parent mode unless overridden
Compliance Lock	Certain modes can be locked for regulatory compliance
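
The default-mode and inheritance rules can be read as a simple resolution order: an explicit override wins, otherwise the parent workflow's mode applies, otherwise the organization-wide default. The sketch below encodes that reading; the function and parameter names are hypothetical, and compliance locks are deliberately left out because their exact semantics are not described here.

```python
from typing import Optional

VALID_MODES = range(5)  # Mode 0 through Mode 4


def effective_mode(org_default: int,
                   parent_mode: Optional[int] = None,
                   override: Optional[int] = None) -> int:
    """Resolve a workflow's mode: override, then parent mode, then org default."""
    chosen = org_default
    if parent_mode is not None:  # `is not None` matters: Mode 0 is falsy.
        chosen = parent_mode
    if override is not None:
        chosen = override
    if chosen not in VALID_MODES:
        raise ValueError(f"unknown operational mode: {chosen}")
    return chosen
```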

Human-in-the-Loop Controls

High-risk operations require human approval before execution. These controls operate in conjunction with operational modes and Chain of Agents HUMAN_IN_THE_LOOP items, providing safeguards even in higher-automation modes.

Risk Classification

Risk Level	Criteria	Approval Flow
Low	Read-only, no external effects	Automatic
Medium	Data modification, limited scope	User confirmation
High	External actions, bulk operations	Explicit approval + MFA
Critical	Administrative, irreversible	Multi-party approval

Approval Controls

Stage	Action	Timeout
Request	DDA requests approval	N/A
Presentation	User shown action details	N/A
Confirmation	User approves or rejects	5 minutes
Execution	Approved action executed	N/A
Audit	Decision and outcome logged	N/A
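
Combining the risk classification with the five-minute confirmation timeout, an approval decision might be resolved as in the sketch below. Treating a timed-out request as expired (and therefore not executed) is an assumption, since the source does not say what happens on timeout; MFA and multi-party approval are collapsed into a single `approved` flag for brevity, and all names are hypothetical.

```python
from typing import Optional

CONFIRMATION_TIMEOUT_SECONDS = 5 * 60  # Timeout from the Confirmation stage.

# Approval flow per risk level, as listed in the Risk Classification table.
APPROVAL_FLOW = {
    "low": "automatic",
    "medium": "user confirmation",
    "high": "explicit approval + MFA",
    "critical": "multi-party approval",
}


def resolve_request(risk_level: str,
                    approved: Optional[bool],
                    elapsed_seconds: float) -> str:
    """Return 'execute', 'rejected', or 'expired' for an approval request."""
    if APPROVAL_FLOW[risk_level] == "automatic":
        return "execute"  # Low-risk actions need no human decision.
    if approved is None or elapsed_seconds > CONFIRMATION_TIMEOUT_SECONDS:
        return "expired"  # Assumption: no timely decision means no execution.
    return "execute" if approved else "rejected"
```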

Testing Framework Security

Test Isolation

Control	Implementation
Environment Isolation	Tests run in separate environment
Data Isolation	Test data separated from production
Credential Isolation	Test credentials separate from production
Result Isolation	Test results not exposed to production

Test Types

Test Type	Purpose	Security Focus
Unit Tests	Individual step validation	Input validation
Integration Tests	End-to-end flow validation	Authorization flow
Security Tests	Vulnerability detection	Injection, bypass
Performance Tests	Latency and throughput	Resource exhaustion

Simulation Security

Capability	Security Control
Mock Data Sources	Generated data, no production access
Mock External Services	Simulated responses, no real calls
Load Simulation	Rate-limited, isolated environment
Failure Injection	Controlled, no production impact

Authentication

All Studio API endpoints require HTTPBearer (JWT) authentication. The authenticated user context determines asset access, execution permissions, and runtime configurations.

User Context

Field	Description
ID	Unique user identifier
Email	User email address

System API Access

System-level operations, such as attaching or detaching users from assets, require a separate system API key, supplied in the X-System-API-Key header.

Operation	Description
Attach User	Grant a user access to a specific asset
Detach User	Remove a user’s access to a specific asset

Audit Logging

Logged Events

Event Category	Logged Data	Retention
DDA Execution	DDA ID, user, inputs hash, duration, status	1 year
Tool Invocation	Tool ID, parameters, result status	1 year
Chain Execution	Chain ID, DDAs involved, flow path	1 year
Approval Decisions	Request, decision, approver, timestamp	2 years
Credential Access	Credential ID, accessor, purpose	2 years
Security Events	Violation type, details, action taken	2 years
Asset Changes	Asset ID, operation, user, before/after state	1 year
Search Queries	Query text, results count, user	1 year

Log Security

Control	Implementation
Integrity	Cryptographic hash chain makes tampering detectable
Confidentiality	Logs encrypted at rest
Access Control	Auditor role required; no delete capability
Retention	Configurable per regulation
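
A hash chain links each log entry to the hash of the previous one, so altering any earlier entry invalidates every hash that follows. The sketch below shows the general technique with SHA-256; the entry layout, the genesis value, and the function names are assumptions rather than the platform's actual log format.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # Hypothetical starting value for an empty log.


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Hash the previous hash together with a canonical form of the event.
    data = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that a chain like this detects tampering but does not by itself stop someone from rewriting the whole log, which is why it is paired with access control and the absence of a delete capability.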

API Reference Summary

Endpoint Group	Base Path	Operations
Health Check	`/api/health-check`	GET
Users	`/api/users`	GET (me)
Domain Discovery	`/api/domain_discover`	POST
Domains	`/api/domains`	CRUD
Schemas	`/api/schemas`	CRUD
Datasets	`/api/datasets`	CRUD + draft + files
DDAs	`/api/ddas`	CRUD + draft + execute
Chains of DDAs	`/api/chain-of-ddas`	CRUD + draft + execute
MCP Integration Types	`/api/mcp-integrations-types`	CRUD
MCP Instances	`/api/mcp-integrations`	CRUD
MCP Credentials	`/api/mcp-integrations-credentials`	CRUD
Widget Types	`/api/widget-types`	CRUD
Utilities	`/api/utilities`	CRUD + draft + execute
APIs	`/api/apis`	CRUD
Industries	`/api/industries`	CRUD
Tags	`/api/tags`	CRUD
Scripts	`/api/scripts`	LIST
Asset Search	`/api/asset_search`	POST
Media Assets	`/api/media-assets`	CRUD
Model Assets	`/api/model-assets`	CRUD
Sources	`/api/source-router`	LIST
Unified Assets	`/api/assets`	LIST
AI Workflow Execution	`/api/ai-workflow-execution`	POST
AI Hybrid Planning	`/api/ai-workflow-planning`	POST
Google OAuth	`/api/auth/google`	GET + callback
Asset Users (System)	`/api/asset-users`	POST + DELETE
