DataFab Platform - Documentation
Version: 3.2 | Last Updated: February 2026
Executive Summary
DataFab is an AI-powered data intelligence platform built on a metadata-driven architecture that provides unified access to distributed data assets while maintaining strict security controls. The platform enables natural language interaction across all components, supports configurable automation levels from fully manual to fully automated, integrates with 200+ data sources, and provides a data asset marketplace (Exchange) with blockchain-backed transactions.
Platform Architecture Components:
| Component | Purpose | Security Relevance |
|---|---|---|
| Knowledge Fabric | Active metadata, knowledge graph, entity resolution, 200+ MCP connectors | Data access control, credential management, source provenance |
| Studio | DDAs (Data-Driven Agents), widgets, datasets, utilities, Chain of Agents, MCP integrations, operational modes | Execution sandboxing, schema-driven processing, human-in-the-loop controls |
| Exchange | Data asset marketplace, catalog, wallet & blockchain (DAAC), metering, access control | Transaction security, permission enforcement, ledger integrity |
| AI & LLM Layer | LightLLM-based inference, provider agnostic, output consistency, model provenance | Prompt security, schema validation, A/B testing |
| DevOps Infrastructure | CI/CD pipelines, deployment automation | Code security, vulnerability management |
Key Security Characteristics:
- Defense-in-depth architecture with multiple security layers
- Zero-trust access model with continuous verification
- End-to-end encryption for data in transit and at rest
- Comprehensive audit logging with tamper-evident storage
- Privacy-by-design with data minimization principles
- Compliance-ready controls for GDPR and SOC 2
- Schema-bounded extraction preventing hallucination and ensuring data quality
- Five operational modes enabling configurable human oversight
Platform Capabilities
Knowledge Fabric
The Knowledge Fabric serves as the foundational data integration and intelligence layer, implementing a metadata-driven architecture that provides unified access while leaving source data in place.
| Capability | Description |
|---|---|
| Persistent Knowledge Graph | Corporate memory with schema-bounded extraction and source provenance |
| Entity Resolution | Cross-source entity matching with golden record management |
| 200+ MCP Connectors | Federated queries across databases, SaaS, and data systems |
| Two-Way Data Flow | Read from and write back to source systems |
| Search Sessions | Iterative exploration with accumulated context |
| Data Observability | Quality monitoring, freshness tracking, automated alerts |
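The metadata-driven federation described above can be sketched as follows. This is an illustrative model only, not the platform API: connector and class names are hypothetical, and each connector stands in for a live source system that answers queries in place, so the fabric aggregates results and provenance without copying source data.

```python
from dataclasses import dataclass, field

@dataclass
class Connector:
    """Stands in for a live source system reached via an MCP connector."""
    name: str
    tables: dict  # table name -> list of row dicts (simulates source data)

    def query(self, table: str, **filters):
        # The source evaluates the query itself; data stays in place.
        rows = self.tables.get(table, [])
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

@dataclass
class KnowledgeFabric:
    connectors: list = field(default_factory=list)

    def federated_query(self, table: str, **filters):
        # Fan the query out to every registered source and tag each result
        # with its provenance so downstream consumers can trace origin.
        results = []
        for c in self.connectors:
            for row in c.query(table, **filters):
                results.append({**row, "_source": c.name})
        return results

crm = Connector("crm", {"customers": [{"id": 1, "region": "EU"}]})
erp = Connector("erp", {"customers": [{"id": 1, "region": "EU"},
                                      {"id": 2, "region": "US"}]})
fabric = KnowledgeFabric([crm, erp])
eu_customers = fabric.federated_query("customers", region="EU")
```

Because each row carries a `_source` tag, provenance survives aggregation, which is what makes cross-source entity resolution and golden-record management possible downstream.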
Studio
The Studio (Helix Studio) provides the development environment for building Data-Driven Agents (DDAs), widgets, datasets, utilities, and multi-agent workflows.
| Capability | Description |
|---|---|
| DDA Creation | Domain-driven flow with schema selection, natural language definition, query plan, and testing |
| Widget Types | Visual interface components (SYSTEM and OUTPUT types) with dialog/canvas views |
| Datasets | Structured data collections with file uploads, schema enforcement, and draft/publish lifecycle |
| Utilities | Reusable components combining external APIs and DDAs with configurable placeholders |
| Business Domain Discovery | Extract schemas from uploaded documents to define entity structures |
| Chain of Agents | Multi-DDA orchestration with human-in-the-loop review gates |
| Graph of Agents | (Planned) Non-linear graph orchestration with branching, parallelism, and cycles |
| MCP Integrations | Managed MCP tool connections with types, instances, and credentials |
| Asset Search | Semantic search across all Studio asset types |
| AI Hybrid Planning | Automatic DDA creation from natural language descriptions |
| Operational Modes | Five modes from traditional platform (Mode 0) to fully automated with audit (Mode 4) |
| Text-to-Pipeline | Natural language workflow generation with DSL output |
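The Chain of Agents pattern with human-in-the-loop review gates can be sketched as below. All names and the control flow are hypothetical and simplified: each DDA is modeled as a plain function, and a gate after a flagged step halts the chain until a reviewer approves.

```python
# Two stand-in DDAs; real DDAs would run schema-bounded query plans.
def normalize_dda(payload):
    return payload.upper()

def summarize_dda(payload):
    return payload[:10]

def run_chain(steps, payload, approve):
    """steps: list of (dda, needs_review) pairs.
    approve: callable(step_name, payload) -> bool, standing in for a
    human reviewer at a review gate."""
    for dda, needs_review in steps:
        payload = dda(payload)
        if needs_review and not approve(dda.__name__, payload):
            # Halt here; a real system would persist state for later review.
            return {"status": "halted_for_review",
                    "at": dda.__name__,
                    "payload": payload}
    return {"status": "completed", "payload": payload}

chain = [(normalize_dda, False), (summarize_dda, True)]
approved = run_chain(chain, "hello chain of agents", approve=lambda n, p: True)
halted = run_chain(chain, "hello chain of agents", approve=lambda n, p: False)
```

The same executor supports fully automated runs (gates auto-approve) and supervised runs (gates block), which is how review gates compose with the operational modes described later in this document.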
Exchange
The Exchange component is the platform’s data asset marketplace, enabling organizations to publish, discover, acquire, and monetize data assets with blockchain-backed transactions.
| Capability | Description |
|---|---|
| Asset Catalog | Publish, search, and manage eight asset types (agents, widgets, datasets, models, etc.) |
| User Profiles | Consumer, provider, and dual-role profiles with verification workflows |
| Wallet & Blockchain | DAAC token on Ethereum for purchases, deposits, withdrawals, and transfers |
| Metering & Billing | Usage tracking with configurable policies and automated billing |
| Pricing & Subscriptions | Tiered access plans, subscription plans, fractional ownership, bulk purchases |
| Access Control | Resource-level permission policies (READ, WRITE, DELETE, ADMIN) |
| API Gateway | Managed endpoints (REST, GraphQL, Webhook, Proxy) with logging and rate limiting |
| Ledger & Revenue Sharing | Double-entry accounting with revenue allocation, settlement, and reconciliation |
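The resource-level permission model can be sketched with the four levels named above (READ, WRITE, DELETE, ADMIN). The policy store, its grant semantics, and the assumption that ADMIN implies the other levels are illustrative, not the platform's actual enforcement logic.

```python
class PolicyStore:
    """Hypothetical per-(principal, resource) grant store."""

    def __init__(self):
        self._grants = {}  # (principal, resource) -> set of permission names

    def grant(self, principal: str, resource: str, *perms: str):
        key = (principal, resource)
        self._grants.setdefault(key, set()).update(perms)

    def is_allowed(self, principal: str, resource: str, perm: str) -> bool:
        # Default-deny: no grant record means no access (least privilege).
        perms = self._grants.get((principal, resource), set())
        return perm in perms or "ADMIN" in perms  # assume ADMIN implies all

store = PolicyStore()
store.grant("alice", "dataset:sales", "READ", "WRITE")
store.grant("bob", "dataset:sales", "ADMIN")
```

Note the default-deny behavior: a principal with no grant record for a resource is refused every permission, which matches the least-privilege principle listed under Security Principles.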
AI & LLM Layer
The AI layer provides secure, provider-agnostic LLM integration with comprehensive monitoring and quality controls.
| Capability | Description |
|---|---|
| LightLLM Gateway | Provider-agnostic interface supporting multiple LLM providers |
| Output Consistency | Schema-validated extraction and ontology-based execution |
| Model Provenance | Tracking of model versions and configurations |
| Quality Assurance | Feedback loops, accuracy monitoring, reasoning chain transparency |
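Schema-validated extraction can be sketched as below. The schema definition and validator are hypothetical: the idea is that raw model output is accepted only if it matches the user-defined schema exactly, so fields the schema does not declare (potential hallucinations) are rejected rather than passed downstream.

```python
import json

# Illustrative user-defined entity schema: field name -> expected type.
SCHEMA = {"name": str, "amount": float}

def validate_extraction(raw_json: str) -> dict:
    """Accept model output only if it is schema-bounded: every declared
    field present, no undeclared fields, every value correctly typed."""
    data = json.loads(raw_json)
    missing = set(SCHEMA) - set(data)
    extra = set(data) - set(SCHEMA)
    if missing or extra:
        raise ValueError(f"schema violation: missing={missing}, extra={extra}")
    for field_name, expected in SCHEMA.items():
        if not isinstance(data[field_name], expected):
            raise ValueError(f"{field_name}: expected {expected.__name__}")
    return data

ok = validate_extraction('{"name": "Acme", "amount": 12.5}')
```

Rejecting undeclared fields, not just type-checking declared ones, is what makes the extraction "bounded": the model cannot introduce data the schema's author never asked for.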
Document Structure
| Document | Content |
|---|---|
| 01-Introduction | Platform overview and security summary |
| 02-Architecture | System architecture, deployment models, security boundaries |
| 03-Knowledge-Fabric | Knowledge graph, entity resolution, MCP connectors, data integration |
| 04-Studio | DDAs (Data-Driven Agents), widgets, datasets, utilities, Chain of Agents, MCP integrations, operational modes |
| 05-AI-LLM | LLM security, output consistency, model provenance, quality assurance |
| 06-CI-CD | CI/CD pipeline security |
| 07-Security-Operations | SOC, monitoring, and incident response |
| 08-Graph-Operations | Graph operations, rule engine, query processing |
| 09-Schema-Management | Business domain discovery, schema registry |
| 10-Compliance-Capabilities | Platform compliance features, retention management |
| 11-API-Security | API gateway, authentication, rate limiting |
| 12-Exchange | Data asset marketplace, catalog, wallet & blockchain, metering, access control |
| 13-Graph-RAG | Graph-based retrieval augmented generation capabilities |
Security Principles
DataFab implements the following core security principles:
| Principle | Implementation |
|---|---|
| Defense in Depth | Multiple independent security layers across all components |
| Least Privilege | Minimum necessary access granted, permission propagation controls |
| Zero Trust | All requests are authenticated regardless of source, with continuous verification |
| Data Minimization | Only essential data collected; metadata-only architecture leaves source data in place |
| Encryption Everywhere | Data encrypted at rest and in transit with AES-256 and TLS 1.3 |
| Audit Everything | Comprehensive logging with tamper-evident storage and full provenance |
| Schema-Driven Security | All data extraction and processing bounded by user-defined schemas |
| Human-in-the-Loop | Configurable automation levels with escalation and approval workflows |
Operational Modes
The platform supports five operational modes enabling organizations to balance automation with human oversight:
| Mode | Name | Description |
|---|---|---|
| 0 | Traditional Platform | No AI agents; manual investigation and analysis |
| 1 | AI-Assisted Manual | Agents in suggest-only mode; all decisions require human approval |
| 2 | Routine Automation | Agents handle routine tasks; humans focus on analysis and decisions |
| 3 | Autonomous with Escalation | Full investigation automation; escalation on exceptions |
| 4 | Fully Automated with Audit | End-to-end automation; post-investigation human audits |
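The five modes in the table above can be read as a dispatch policy for each agent action. A minimal sketch, with hypothetical function and outcome names that are not part of the platform API:

```python
def handle_action(mode: int, is_routine: bool, is_exception: bool) -> str:
    """Decide how an agent action is handled under a given operational mode.
    Outcome strings are illustrative labels, not platform states."""
    if mode == 0:
        return "manual"            # Mode 0: no AI agents; humans do the work
    if mode == 1:
        return "suggest"           # Mode 1: agent proposes, human approves
    if mode == 2:                  # Mode 2: routine tasks run automatically
        return "execute" if is_routine else "suggest"
    if mode == 3:                  # Mode 3: autonomous, exceptions escalate
        return "escalate" if is_exception else "execute"
    if mode == 4:
        return "execute_and_audit" # Mode 4: full automation, audited after
    raise ValueError(f"unknown mode: {mode}")
```

Expressing the modes as a single dispatch point also gives auditors one place to verify that the configured oversight level is actually enforced.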
Regulatory Compliance
The platform is designed to support compliance with:
| Regulation | Coverage |
|---|---|
| GDPR | Data subject rights, consent management, data minimization |
| CCPA/CPRA | California privacy requirements |
| SOC 2 | Security, availability, processing integrity |
| FCA Requirements | Financial services regulatory compliance |
| Enterprise Data Standards | Industry best practices for data governance |
Contact
For security inquiries or to request additional documentation, please contact the DataFab security team through your account representative.