Stop wrong AI definitions, trace data lineage, and govern data access.
Data Dragon gives data teams an automated workflow for AI-era access: scan the customer warehouse locally, propose semantic contracts, generate governance YAML, require human approval, compile database-level guardrails, and audit every governed query.
"The root cause... is that metric definitions tend to emerge bottom-up from whoever built the first report."
Reddit search result, r/BusinessIntelligence: metric definition pain
"Poor data quality costs organizations at least $12.9 million a year on average."
Gartner, Data Quality: Why It Matters
Profile tables, columns, rows and sensitive fields in place.
Semantic contracts, governance YAML, lineage contracts and manifest.
Draft YAML cannot change the warehouse until a steward approves.
Compile approved policy into secure views, grants, masks and audit logs.
Why this matters before AI touches business data
Bad definitions and weak governance do not usually fail loudly. They show up as plausible dashboards, plausible SQL and plausible executive answers that are wrong enough to change decisions.
Average annual cost of poor data quality
Gartner says poor data quality costs organizations at least $12.9M per year on average and calls out inconsistency, lack of ownership and weak measurement as common blockers.
Bad data damaged model-driven revenue
IBM cites Unity Technologies' 2022 bad-data incident as roughly USD 110M in lost revenue tied to corrupted data used by advertising ML systems.
AI scaling is blocked by data trust
IBM reports concerns about data accuracy or bias as a leading barrier to scaling AI initiatives, reported by nearly half of business leaders.
Late data issues become expensive
Dataversity explains the 1x/10x/100x rule: defects are cheapest at entry, costlier downstream, and most expensive when they reach decisions.
Product modules: the AI data readiness layer
Each state maps to an SDK module, a generated artifact and a reason it exists. Governance is highlighted separately below because it is the database control plane.
Short flow: how database data governance is executed
This is the operational loop for a customer. The data team approves, but Data Dragon automates the policy draft, DDL generation, runtime enforcement, audit trail and lineage evidence.
Company requests governed access
An AI builder or app owner states intent, such as support needing order status or finance needing revenue metrics. Data Dragon maps intent to semantic concepts instead of raw tables.
SDK generates governance YAML
The policy is default-deny. It proposes allowed concepts, denied concepts, masked columns, row scopes, owner notes and approval requirements from the semantic catalog and sensitivity scan.
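The proposal step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SDK's implementation: `propose_policy`, the `CATALOG` mapping and the `SENSITIVE` set are hypothetical stand-ins for the semantic catalog and sensitivity scan, and the output is printed as JSON rather than YAML to stay within the standard library.

```python
import json


def propose_policy(role, requested, catalog, sensitive):
    """Build a draft, default-deny policy for one role (hypothetical helper).

    catalog maps concept name -> list of columns behind it;
    sensitive is the set of column names flagged by a sensitivity scan.
    """
    # Only concepts that exist in the catalog can be proposed as allowed.
    allowed = [c for c in requested if c in catalog]
    # Everything else is listed explicitly as denied for the reviewer.
    denied = sorted(set(catalog) - set(allowed))
    # Any sensitive column reachable through an allowed concept gets masked.
    mask = sorted({col for c in allowed for col in catalog[c] if col in sensitive})
    return {
        "status": "draft",          # a steward must approve before anything applies
        "default": "deny",
        "requires_approval": True,
        "roles": [
            {"role": role, "concepts": allowed, "denied_concepts": denied, "mask": mask}
        ],
    }


# Illustrative catalog and scan results (made up for this sketch).
CATALOG = {
    "order_status": ["order_id", "status", "customer_email"],
    "customer_lookup": ["customer_id", "customer_email", "phone_number"],
    "approved_revenue": ["order_id", "net_revenue"],
}
SENSITIVE = {"customer_email", "phone_number"}

draft = propose_policy("support_agent", ["order_status", "customer_lookup"], CATALOG, SENSITIVE)
print(json.dumps(draft, indent=2))
```

Listing denied concepts explicitly is redundant under default-deny, but it gives the reviewing steward a complete picture of what the role will not see.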
Data steward approves or edits
Nothing touches the database while the policy is draft. The steward approves the YAML after checking roles, purpose, sensitive fields and business definitions.
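The approval gate can be modeled as a small state check. The sketch below is illustrative only; the `Policy` class and the `approve` and `apply_policy` functions are hypothetical names, not part of the Data Dragon SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical in-memory stand-in for a governance YAML file."""
    source_name: str
    status: str = "draft"              # "draft" or "approved"
    approved_by: Optional[str] = None


def approve(policy: Policy, steward: str) -> Policy:
    """Record the steward's sign-off; only this step changes the status."""
    if not steward:
        raise ValueError("an approving steward is required")
    policy.status = "approved"
    policy.approved_by = steward
    return policy


def apply_policy(policy: Policy) -> str:
    """Refuse to touch the database unless the policy is approved."""
    if policy.status != "approved":
        raise PermissionError(
            f"policy for {policy.source_name!r} is {policy.status!r}; approval required"
        )
    return f"applying policy for {policy.source_name} (approved by {policy.approved_by})"


draft = Policy(source_name="prod_warehouse")
try:
    apply_policy(draft)                # still a draft, so this is blocked
except PermissionError as exc:
    print("blocked:", exc)

print(apply_policy(approve(draft, steward="data-steward")))
```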
SDK enforces in the database
After approval, Data Dragon compiles DDL: roles, secure views, grants, revokes, masks and row filters. Runtime queries are allowed, masked or denied, then logged into audit and lineage.
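To make the compile step concrete, here is a rough sketch of turning an approved policy into SQL statements. Everything in it is an assumption for illustration: the `compile_ddl` function, the `CONCEPT_SOURCES` mapping, the `governed` schema and the PostgreSQL-style statements are not taken from the SDK, row filters are omitted, and real code would need to quote identifiers properly.

```python
# Hypothetical inputs: a parsed policy and a map from concept to the table
# and columns behind it. Neither structure is taken from the SDK.
POLICY = {
    "status": "approved",
    "roles": [
        {
            "role": "finance_analyst",
            "concepts": ["approved_revenue"],
            "mask": ["customer_email", "phone_number"],
        },
    ],
}
CONCEPT_SOURCES = {
    "approved_revenue": {
        "table": "finance.orders",
        "columns": ["order_id", "net_revenue", "customer_email", "phone_number"],
    },
}


def compile_ddl(policy, concept_sources):
    """Return PostgreSQL-style statements for an approved policy."""
    if policy.get("status") != "approved":
        raise PermissionError("only approved policies compile to DDL")
    statements = []
    for entry in policy["roles"]:
        role = entry["role"]
        masked = set(entry.get("mask", []))
        statements.append(f"CREATE ROLE {role};")
        for concept in entry.get("concepts", []):
            source = concept_sources[concept]
            # Masked columns are replaced by a constant; others pass through.
            select_list = ", ".join(
                f"'***' AS {col}" if col in masked else col
                for col in source["columns"]
            )
            view = f"governed.{role}__{concept}"
            statements.append(
                f"CREATE VIEW {view} AS SELECT {select_list} FROM {source['table']};"
            )
            # Remove raw-table access, then grant only the governed view.
            statements.append(f"REVOKE ALL ON {source['table']} FROM {role};")
            statements.append(f"GRANT SELECT ON {view} TO {role};")
    return statements


for stmt in compile_ddl(POLICY, CONCEPT_SOURCES):
    print(stmt)
```

Generating one view per role and concept keeps masking inside the database, so a client that bypasses the SDK still only sees what its role was granted.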
Semantic contract: what data means
The contract tells tools the approved business definition. It is proposed automatically, then reviewed by a steward.
concept_id: approved_revenue
business_question: What is our total revenue?
status: draft
trust_level: inferred
sql: |
  SELECT SUM(net_revenue)
  FROM finance.orders
  WHERE order_status = 'recognized'
avoid:
  - Includes payments for canceled orders.
  - Includes freight which might not be revenue.
Governance policy: who can use it
The policy maps roles to concepts, sensitive columns, masks, row scopes and approvals. Approval is required before any DDL is applied.
source_name: prod_warehouse
status: approved
default: deny
roles:
  - role: finance_analyst
    concepts: [approved_revenue, approved_aov]
    mask: [customer_email, phone_number]
  - role: support_agent
    concepts: [order_status, customer_lookup]
    denied_concepts: [approved_revenue]
Runtime enforcement: access changes behavior
Enforcement does not rely on prompts alone. The SDK checks the policy, routes the query through role-scoped database access and records an audit event.
# Allowed with masks: finance_analyst -> approved_revenue
decision: allow_masked
masked_columns:
  - customer_email
  - phone_number

# Denied: support_agent -> approved_revenue
decision: deny
reason: role is not granted concept
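The two decisions above follow from a default-deny lookup against the governance policy. The sketch below shows one way such a check could work. The `decide` function is hypothetical, the plain `allow` outcome and the "unknown role" reason are assumptions, and the policy dict mirrors the YAML shown earlier.

```python
# Policy mirrored from the governance YAML, as a parsed dict.
POLICY = {
    "default": "deny",
    "roles": [
        {
            "role": "finance_analyst",
            "concepts": ["approved_revenue", "approved_aov"],
            "mask": ["customer_email", "phone_number"],
        },
        {
            "role": "support_agent",
            "concepts": ["order_status", "customer_lookup"],
            "denied_concepts": ["approved_revenue"],
        },
    ],
}


def decide(policy, role, concept_id):
    """Return a decision dict for one (role, concept) request, default-deny."""
    entry = next((r for r in policy["roles"] if r["role"] == role), None)
    if entry is None:
        # Roles that are not in the policy get nothing.
        return {"decision": "deny", "reason": "unknown role"}
    explicitly_denied = concept_id in entry.get("denied_concepts", [])
    granted = concept_id in entry.get("concepts", [])
    if explicitly_denied or not granted:
        return {"decision": "deny", "reason": "role is not granted concept"}
    masked = entry.get("mask", [])
    if masked:
        return {"decision": "allow_masked", "masked_columns": list(masked)}
    return {"decision": "allow"}


print(decide(POLICY, "finance_analyst", "approved_revenue"))
print(decide(POLICY, "support_agent", "approved_revenue"))
```

An explicit denial wins over a grant here, so a concept listed in both places is still refused.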
The install-to-enforce SDK flow
A data team can run this locally against DuckDB/Postgres. The customer data stays in their environment; Data Dragon stores metadata/artifacts locally.
import os

# 1. Install and connect.
# DataDragon is the SDK client class; its import path is not shown here.
client = DataDragon()
source = client.sources.connect(
    kind="postgres",
    postgres_url=os.environ["WAREHOUSE_URL"],
    name="prod",
)

# 2. Generate semantic, governance and lineage drafts.
contracts = client.proof.reason_contracts(source=source, proposer="gemini")
policy = client.governance.propose_access_policy(source=source, requested_by="ai-builder")
lineage = client.governance.draft_lineage(source=source)

# 3. Human approval is the gate.
client.governance.approve_access_policy(source_name="prod")

# 4. SDK compiles and applies database guardrails.
ddl = client.governance.apply_access_policy(source_name="prod", source=source)

# 5. Every query can be policy-checked and logged.
result = client.governance.execute_governed(
    source=source,
    role="finance_analyst",
    concept_id="approved_revenue",
    sql="SELECT ...",
)
What becomes automatic after approval
- The SDK compiles approved YAML into database DDL, secure views and role grants.
- Agents and apps query approved concepts instead of raw unrestricted tables.
- Sensitive columns are masked at query time through role-scoped views.
- Denied requests are blocked and audit-logged, not silently ignored.
- The lineage graph links query events back to roles, concepts, datasets, jobs and columns.
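As a rough illustration of the last two points, an audit event can be flattened into lineage edges that connect a query to its role, concept, datasets and columns. The event fields, the `audit_event` and `lineage_edges` helpers and the edge labels below are all assumptions made for this sketch, not the SDK's schema. Jobs are left out for brevity.

```python
import uuid
from datetime import datetime, timezone


def audit_event(role, concept_id, decision, datasets, columns):
    """Build one audit record for a governed query (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "concept_id": concept_id,
        "decision": decision,      # denied requests are logged too
        "datasets": list(datasets),
        "columns": list(columns),
    }


def lineage_edges(event):
    """Flatten an audit event into (source, relation, target) edges."""
    query = f"query:{event['event_id']}"
    concept = f"concept:{event['concept_id']}"
    edges = [
        (query, "run_as", f"role:{event['role']}"),
        (query, "uses", concept),
    ]
    for dataset in event["datasets"]:
        edges.append((concept, "reads", f"dataset:{dataset}"))
    for column in event["columns"]:
        edges.append((concept, "reads", f"column:{column}"))
    return edges


event = audit_event(
    role="support_agent",
    concept_id="approved_revenue",
    decision="deny",
    datasets=["finance.orders"],
    columns=["finance.orders.net_revenue"],
)
for edge in lineage_edges(event):
    print(edge)
```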