DataFab Platform - Documentation

Version: 3.2
Last Updated: February 2026


Executive Summary

DataFab is an AI-powered data intelligence platform built on a metadata-driven architecture that provides unified access to distributed data assets while maintaining strict security controls. The platform enables natural language interaction across all components, supports configurable automation levels from fully manual to fully automated, integrates with 200+ data sources, and provides a data asset marketplace (Exchange) with blockchain-backed transactions.

Platform Architecture Components:

Component | Purpose | Security Relevance
Knowledge Fabric | Active metadata, knowledge graph, entity resolution, 200+ MCP connectors | Data access control, credential management, source provenance
Studio | DDAs (Data-Driven Agents), widgets, datasets, utilities, Chain of Agents, MCP integrations, operational modes | Execution sandboxing, schema-driven processing, human-in-the-loop controls
Exchange | Data asset marketplace, catalog, wallet & blockchain (DAAC), metering, access control | Transaction security, permission enforcement, ledger integrity
AI & LLM Layer | LightLLM-based inference, provider agnostic, output consistency, model provenance | Prompt security, schema validation, A/B testing
DevOps Infrastructure | CI/CD pipelines, deployment automation | Code security, vulnerability management

Key Security Characteristics:

  • Defense-in-depth architecture with multiple security layers
  • Zero-trust access model with continuous verification
  • End-to-end encryption for data in transit and at rest
  • Comprehensive audit logging with tamper-evident storage
  • Privacy-by-design with data minimization principles
  • Compliance-ready controls for GDPR and SOC 2
  • Schema-bounded extraction that constrains outputs to user-defined schemas, limiting hallucination and supporting data quality
  • Five operational modes enabling configurable human oversight

Platform Capabilities

Knowledge Fabric

The Knowledge Fabric serves as the foundational data integration and intelligence layer, implementing a metadata-driven architecture that provides unified access while leaving source data in place.

Capability | Description
Persistent Knowledge Graph | Corporate memory with schema-bounded extraction and source provenance
Entity Resolution | Cross-source entity matching with golden record management
200+ MCP Connectors | Federated queries across databases, SaaS, and data systems
Two-Way Data Flow | Read from and write back to source systems
Search Sessions | Iterative exploration with accumulated context
Data Observability | Quality monitoring, freshness tracking, automated alerts
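
To make the entity resolution row concrete, the sketch below groups records from different sources by a normalized key and merges each group into a single golden record. The field names, the email-based match rule, and the "newest non-empty value wins" merge policy are assumptions chosen for illustration; they do not describe the platform's actual resolution logic.

```python
from collections import defaultdict

def normalize_key(record):
    """Hypothetical match key: trimmed, lower-cased email address."""
    return record["email"].strip().lower()

def build_golden_records(records):
    """Group records by match key and merge each group into one golden record."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_key(record)].append(record)

    golden = {}
    for key, group in groups.items():
        merged = {"sources": sorted({r["source"] for r in group})}
        # Apply older records first so newer non-empty values overwrite them.
        for record in sorted(group, key=lambda r: r["updated"]):
            for field, value in record.items():
                if field not in ("source", "updated") and value:
                    merged[field] = value
        golden[key] = merged
    return golden

records = [
    {"source": "crm", "email": "Ada@Example.com ", "name": "Ada L.",
     "phone": "", "updated": "2025-01-10"},
    {"source": "billing", "email": "ada@example.com", "name": "Ada Lovelace",
     "phone": "555-0100", "updated": "2025-06-02"},
]
golden = build_golden_records(records)
```

Both records collapse into one entry keyed by the normalized email, and the `sources` list preserves which systems contributed to it.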

Studio

The Studio (Helix Studio) provides the development environment for building Data-Driven Agents (DDAs), widgets, datasets, utilities, and multi-agent workflows.

Capability | Description
DDA Creation | Domain-driven flow with schema selection, natural language definition, query plan, and testing
Widget Types | Visual interface components (SYSTEM and OUTPUT types) with dialog/canvas views
Datasets | Structured data collections with file uploads, schema enforcement, and draft/publish lifecycle
Utilities | Reusable components combining external APIs and DDAs with configurable placeholders
Business Domain Discovery | Extract schemas from uploaded documents to define entity structures
Chain of Agents | Multi-DDA orchestration with human-in-the-loop review gates
Graph of Agents (Planned) | Non-linear graph orchestration with branching, parallelism, and cycles
MCP Integrations | Managed MCP tool connections with types, instances, and credentials
Asset Search | Semantic search across all Studio asset types
AI Hybrid Planning | Automatic DDA creation from natural language descriptions
Operational Modes | Five modes from traditional platform (Mode 0) to fully automated with audit (Mode 4)
Text-to-Pipeline | Natural language workflow generation with DSL output
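
The Chain of Agents row describes linear orchestration with human-in-the-loop review gates. A minimal sketch of that idea follows, assuming each DDA can be modeled as a function from payload to payload and that a reviewer is a callback returning approve or reject. `Step`, `run_chain`, and the reviewer policy are hypothetical names, not Studio APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]   # stand-in for invoking one DDA
    needs_review: bool = False    # True places a review gate after this step

def run_chain(steps, payload, approve):
    """Run steps in order; stop at the first review gate the reviewer rejects."""
    completed = []
    for step in steps:
        payload = step.run(payload)
        completed.append(step.name)
        if step.needs_review and not approve(step.name, payload):
            return {"status": "rejected", "at": step.name,
                    "completed": completed, "payload": payload}
    return {"status": "completed", "at": None,
            "completed": completed, "payload": payload}

steps = [
    Step("extract", lambda p: {**p, "entities": ["ACME Ltd"]}),
    Step("score", lambda p: {**p, "risk": 0.9}, needs_review=True),
    Step("report", lambda p: {**p, "report": "drafted"}),
]

# Hypothetical reviewer policy: reject anything scored above 0.8.
result = run_chain(steps, {"case": 42}, approve=lambda name, p: p["risk"] <= 0.8)
```

Here the chain halts at the `score` gate, so the `report` step never runs. A planned graph-style orchestrator would need a scheduler in place of the simple loop.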

Exchange

The Exchange component is the platform’s data asset marketplace, enabling organizations to publish, discover, acquire, and monetize data assets with blockchain-backed transactions.

Capability | Description
Asset Catalog | Publish, search, and manage eight asset types (agents, widgets, datasets, models, etc.)
User Profiles | Consumer, provider, and dual-role profiles with verification workflows
Wallet & Blockchain | DAAC token on Ethereum for purchases, deposits, withdrawals, and transfers
Metering & Billing | Usage tracking with configurable policies and automated billing
Pricing & Subscriptions | Tiered access plans, subscription plans, fractional ownership, bulk purchases
Access Control | Resource-level permission policies (READ, WRITE, DELETE, ADMIN)
API Gateway | Managed endpoints (REST, GraphQL, Webhook, Proxy) with logging and rate limiting
Ledger & Revenue Sharing | Double-entry accounting with revenue allocation, settlement, and reconciliation
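
The ledger row can be illustrated with a simplified double-entry posting in which every purchase is written as signed entries that sum to zero. The account naming scheme, the 10% platform share, and the use of integer token units are assumptions for this sketch only. It does not model the on-chain DAAC transfer or settlement.

```python
from collections import defaultdict

def post_sale(ledger, buyer, provider, amount, platform_share_bps=1000):
    """Record one purchase as balanced signed entries.

    amount is an integer in the smallest token unit to avoid rounding drift.
    platform_share_bps is a hypothetical fee in basis points (1000 = 10%).
    """
    platform_cut = amount * platform_share_bps // 10_000
    provider_cut = amount - platform_cut
    entries = [
        (f"wallet:{buyer}", -amount),          # buyer balance decreases
        (f"wallet:{provider}", provider_cut),  # provider receives its share
        ("revenue:platform", platform_cut),    # platform receives its share
    ]
    # Double-entry invariant: each transaction nets to zero.
    assert sum(delta for _, delta in entries) == 0
    ledger.extend(entries)

def balances(ledger):
    """Sum all entries per account."""
    totals = defaultdict(int)
    for account, delta in ledger:
        totals[account] += delta
    return dict(totals)

ledger = []
post_sale(ledger, buyer="alice", provider="bob", amount=2500)
totals = balances(ledger)
```

Because every posting is balanced, reconciliation can start from a single check: the sum of all account balances must always be zero.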

AI & LLM Layer

The AI layer provides secure, provider-agnostic LLM integration with comprehensive monitoring and quality controls.

Capability | Description
LightLLM Gateway | Provider-agnostic interface supporting multiple LLM providers
Output Consistency | Schema-validated extraction and ontology-based execution
Model Provenance | Tracking of model versions and configurations
Quality Assurance | Feedback loops, accuracy monitoring, reasoning chain transparency
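
Schema-validated extraction, as listed under Output Consistency, can be pictured as a filter between the model and downstream consumers. The sketch below assumes a schema expressed as a mapping of field names to Python types; `SCHEMA` and `validate_extraction` are hypothetical and stand in for whatever schema format the platform actually uses.

```python
# Hypothetical user-defined schema: field name -> expected Python type.
SCHEMA = {"company": str, "incorporated": int, "directors": list}

def validate_extraction(output, schema=SCHEMA):
    """Return (clean, errors); clean keeps only well-typed schema fields."""
    clean, errors = {}, []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
        else:
            clean[field] = output[field]
    # Anything the model produced outside the schema is reported, never passed on.
    for field in output:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return clean, errors

llm_output = {"company": "ACME Ltd", "incorporated": "1999", "ceo_net_worth": "unknown"}
clean, errors = validate_extraction(llm_output)
```

Only `company` survives in `clean`. The mistyped year, the missing `directors` field, and the out-of-schema field are all surfaced in `errors` for retry or human review.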

Document Structure

Document | Content
01-Introduction | Platform overview and security summary
02-Architecture | System architecture, deployment models, security boundaries
03-Knowledge-Fabric | Knowledge graph, entity resolution, MCP connectors, data integration
04-Studio | DDAs (Data-Driven Agents), widgets, datasets, utilities, Chain of Agents, MCP integrations, operational modes
05-AI-LLM | LLM security, output consistency, model provenance, quality assurance
06-CI-CD | CI/CD pipeline security
07-Security-Operations | SOC, monitoring, and incident response
08-Graph-Operations | Graph operations, rule engine, query processing
09-Schema-Management | Business domain discovery, schema registry
10-Compliance-Capabilities | Platform compliance features, retention management
11-API-Security | API gateway, authentication, rate limiting
12-Exchange | Data asset marketplace, catalog, wallet & blockchain, metering, access control
13-Graph-RAG | Graph-based retrieval augmented generation capabilities

Security Principles

DataFab implements the following core security principles:

Principle | Implementation
Defense in Depth | Multiple independent security layers across all components
Least Privilege | Minimum necessary access granted, with permission propagation controls
Zero Trust | All requests are authenticated regardless of source, with continuous verification
Data Minimization | Only essential data collected; metadata-driven architecture leaves source data in place
Encryption Everywhere | Data encrypted at rest with AES-256 and in transit with TLS 1.3
Audit Everything | Comprehensive logging with tamper-evident storage and full provenance
Schema-Driven Security | All data extraction and processing bounded by user-defined schemas
Human-in-the-Loop | Configurable automation levels with escalation and approval workflows
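
One common way to obtain tamper-evident audit storage is a hash chain, where each log entry commits to the hash of the previous one. The sketch below shows that general technique using SHA-256 from the Python standard library. It is an assumption about how such a property could be achieved, not a description of the platform's storage layer, and the event fields are made up.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry

def _digest(prev_hash, event):
    """Hash the previous entry's hash together with the canonical event JSON."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()

def append_event(log, event):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})

def verify(log):
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_event(log, {"actor": "alice", "action": "READ", "asset": "dataset-7"})
append_event(log, {"actor": "bob", "action": "WRITE", "asset": "dataset-7"})
intact = verify(log)

log[0]["event"]["actor"] = "mallory"  # simulate tampering with a stored entry
tampered_detected = not verify(log)
```

A chain like this detects modification but does not prevent it, and truncating the tail goes unnoticed unless the latest hash is also anchored somewhere the writer cannot alter.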

Operational Modes

The platform supports five operational modes enabling organizations to balance automation with human oversight:

Mode | Name | Description
0 | Traditional Platform | No AI agents; manual investigation and analysis
1 | AI-Assisted Manual | Agents in suggest-only mode; all decisions require human approval
2 | Routine Automation | Agents handle routine tasks; humans focus on analysis and decisions
3 | Autonomous with Escalation | Full investigation automation; escalation on exceptions
4 | Fully Automated with Audit | End-to-end automation; post-investigation human audits
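
The mode table can be read as a routing rule that decides who acts on a given task. The sketch below encodes one such reading. The mode numbers and names come from the table, while the `oversight` function, its `routine` and `exception` flags, and the returned labels are hypothetical simplifications.

```python
from enum import IntEnum

class Mode(IntEnum):
    TRADITIONAL = 0
    AI_ASSISTED_MANUAL = 1
    ROUTINE_AUTOMATION = 2
    AUTONOMOUS_WITH_ESCALATION = 3
    FULLY_AUTOMATED_WITH_AUDIT = 4

def oversight(mode, routine, exception=False):
    """Return who acts on a task under the given mode."""
    if mode == Mode.TRADITIONAL:
        return "human"                          # no agents involved
    if mode == Mode.AI_ASSISTED_MANUAL:
        return "agent-suggests-human-approves"  # every decision needs approval
    if mode == Mode.ROUTINE_AUTOMATION:
        return "agent" if routine else "human"
    if mode == Mode.AUTONOMOUS_WITH_ESCALATION:
        return "escalate-to-human" if exception else "agent"
    return "agent-then-audit"                   # Mode 4: humans audit afterwards

decision = oversight(Mode.AUTONOMOUS_WITH_ESCALATION, routine=False, exception=True)
```

Using an `IntEnum` keeps the numeric mode values from the table usable directly, for example `Mode(2)` resolves to `Mode.ROUTINE_AUTOMATION`.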

Regulatory Compliance

The platform is designed to support compliance with:

Regulation | Coverage
GDPR | Data subject rights, consent management, data minimization
CCPA/CPRA | California privacy requirements
SOC 2 | Security, availability, processing integrity
FCA Requirements | Financial services regulatory compliance
Enterprise Data Standards | Industry best practices for data governance

Contact

For security inquiries or to request additional documentation, please contact the DataFab security team through your account representative.