Data Dragon - Governed Data Access Flow

Local-first SDK for data teams

Stop wrong AI definitions, trace data lineage, and govern data access.

Data Dragon gives data teams an automated workflow for AI-era access: scan the customer warehouse locally, propose semantic contracts, generate governance YAML, require human approval, compile database-level guardrails, and audit every governed query.

Semantic contracts
Define business meaning before AI writes SQL or answers executives.

Lineage graph
Trace concepts back to datasets, columns, jobs, policies and query events.

Data access policy
Map roles to approved concepts, masks, denials and row scopes.

Runtime enforcement
Convert approved YAML into database guardrails and audit logs.

What practitioners complain about

"The root cause... is that metric definitions tend to emerge bottom-up from whoever built the first report."

Reddit search result, r/BusinessIntelligence: metric definition pain

What the market already knows

"Poor data quality costs organizations at least $12.9 million a year on average."

Gartner, Data Quality: Why It Matters

What the SDK automates

Connect locally
Profile tables, columns, rows and sensitive fields in place.

Generate artifacts
Semantic contracts, governance YAML, lineage contracts and manifest.

Approval gate
Draft YAML cannot change the warehouse until a steward approves.

Enforce access
Compile approved policy into secure views, grants, masks and audit logs.

Why this matters before AI touches business data

Bad definitions and weak governance do not usually fail loudly. They show up as plausible dashboards, plausible SQL and plausible executive answers that are wrong enough to change decisions.

$12.9M

Average annual cost of poor data quality

Gartner says poor data quality costs organizations at least $12.9M per year on average and calls out inconsistency, lack of ownership and weak measurement as common blockers.

$110M

Bad data damaged model-driven revenue

IBM cites Unity Technologies' 2022 bad-data incident as roughly USD 110M in lost revenue tied to corrupted data used by advertising ML systems.

45%

AI scaling is blocked by data trust

IBM reports concerns about data accuracy or bias as a leading barrier to scaling AI initiatives, reported by nearly half of business leaders.

100x

Late data issues become expensive

Dataversity explains the 1x/10x/100x rule: defects are cheapest at entry, costlier downstream, and most expensive when they reach decisions.

Product modules: the AI data readiness layer

Each state maps to an SDK module, a generated artifact and the reason it exists. Governance is highlighted separately below because it is the database control plane.

Short flow: how database data governance is executed

This is the operational loop for a customer. The data team approves, but Data Dragon automates the policy draft, DDL generation, runtime enforcement, audit trail and lineage evidence.

Company requests governed access

An AI builder or app owner states intent, such as support needing order status or finance needing revenue metrics. Data Dragon maps intent to semantic concepts instead of raw tables.

SDK generates governance YAML

The policy is default-deny. It proposes allowed concepts, denied concepts, masked columns, row scopes, owner notes and approval requirements from the semantic catalog and sensitivity scan.

Data steward approves or edits

Nothing touches the database while the policy is draft. The steward approves the YAML after checking roles, purpose, sensitive fields and business definitions.

SDK enforces in the database

After approval, Data Dragon compiles DDL: roles, secure views, grants, revokes, masks and row filters. Runtime queries are allowed, masked or denied, then logged into audit and lineage.
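
The approval gate in the loop above can be sketched in a few lines. This is only an illustration of the draft-then-approve rule, not the SDK's implementation: `DraftPolicy`, `approve`, `apply_policy` and `PolicyNotApprovedError` are hypothetical names, and the "database" is a plain list of statements.

```python
from dataclasses import dataclass
from typing import List, Optional


class PolicyNotApprovedError(Exception):
    """Raised when something tries to apply a policy that is still a draft."""


@dataclass
class DraftPolicy:
    # Hypothetical stand-in for the generated governance YAML.
    source_name: str
    status: str = "draft"
    approved_by: Optional[str] = None


def approve(policy: DraftPolicy, steward: str) -> DraftPolicy:
    """Record the steward's sign-off; only this step flips the status."""
    policy.status = "approved"
    policy.approved_by = steward
    return policy


def apply_policy(policy: DraftPolicy, applied: List[str]) -> str:
    """Refuse to emit any statement unless the policy has been approved."""
    if policy.status != "approved":
        raise PolicyNotApprovedError(
            f"policy for {policy.source_name!r} is {policy.status!r}"
        )
    statement = (
        f"-- guardrails for {policy.source_name}, "
        f"approved by {policy.approved_by}"
    )
    applied.append(statement)  # stand-in for executing compiled DDL
    return statement
```

Calling `apply_policy` on a fresh `DraftPolicy("prod")` raises and leaves the statement list empty; after `approve(policy, "steward@example.com")` the same call succeeds.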

Semantic contract: what data means

The contract tells tools the approved business definition. It is proposed automatically, then reviewed by a steward.

concept_id: approved_revenue
business_question: What is our total revenue?
status: draft
trust_level: inferred
sql: |
  SELECT SUM(net_revenue)
  FROM finance.orders
  WHERE order_status = 'recognized'
avoid:
  - Includes payments for canceled orders.
  - Includes freight which might not be revenue.
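
A tool that consumes contracts would likely want to check them before trusting the SQL. The sketch below assumes the contract has been loaded into a plain dict and that `approved` is the status a steward sets after review; `contract_problems` and `REQUIRED_FIELDS` are hypothetical helpers, not part of the SDK.

```python
from typing import Dict, List

# Fields taken from the contract shown above.
REQUIRED_FIELDS = ("concept_id", "business_question", "status", "sql")


def contract_problems(contract: Dict[str, object]) -> List[str]:
    """Return reasons this contract should not yet be served to tools."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not contract.get(name)
    ]
    status = contract.get("status")
    # Assumption: "approved" is the status that marks a reviewed contract.
    if status and status != "approved":
        problems.append(f"status is {status!r}, expected 'approved'")
    return problems


draft_contract = {
    "concept_id": "approved_revenue",
    "business_question": "What is our total revenue?",
    "status": "draft",
    "trust_level": "inferred",
    "sql": "SELECT SUM(net_revenue) FROM finance.orders "
           "WHERE order_status = 'recognized'",
}

problems = contract_problems(draft_contract)
```

For the draft contract above, `problems` holds a single entry about the status, so the inferred definition stays out of AI-generated answers until a steward has reviewed it.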

Governance policy: who can use it

The policy maps roles to concepts, sensitive columns, masks, row scopes and approvals. Approval is required before DDL applies.

source_name: prod_warehouse
status: approved
default: deny
roles:
  - role: finance_analyst
    concepts: [approved_revenue, approved_aov]
    mask: [customer_email, phone_number]
  - role: support_agent
    concepts: [order_status, customer_lookup]
    denied_concepts: [approved_revenue]
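
To make the default-deny semantics concrete, here is a minimal sketch of how such a policy could be evaluated. It assumes the YAML has already been parsed into a dict; `decide` is a hypothetical function, and the plain `allow` decision and the `unknown role` reason are assumptions added for completeness.

```python
from typing import Dict, List

# The policy above, as a parsed dict.
policy = {
    "source_name": "prod_warehouse",
    "status": "approved",
    "default": "deny",
    "roles": [
        {"role": "finance_analyst",
         "concepts": ["approved_revenue", "approved_aov"],
         "mask": ["customer_email", "phone_number"]},
        {"role": "support_agent",
         "concepts": ["order_status", "customer_lookup"],
         "denied_concepts": ["approved_revenue"]},
    ],
}


def decide(policy: Dict, role: str, concept_id: str) -> Dict:
    """Evaluate one request; anything not explicitly granted is denied."""
    if policy.get("status") != "approved":
        return {"decision": "deny", "reason": "policy is not approved"}
    entry = next((r for r in policy["roles"] if r["role"] == role), None)
    if entry is None:
        return {"decision": "deny", "reason": "unknown role"}
    denied = concept_id in entry.get("denied_concepts", [])
    if denied or concept_id not in entry.get("concepts", []):
        return {"decision": "deny", "reason": "role is not granted concept"}
    masks: List[str] = entry.get("mask", [])
    if masks:
        return {"decision": "allow_masked", "masked_columns": list(masks)}
    return {"decision": "allow"}
```

Checking the deny paths first keeps the function default-deny: a request is allowed only when it reaches the end without hitting a reason to refuse.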

Runtime enforcement: access changes behavior

Enforcement is not prompt-only. The SDK checks the policy, routes the query through role-scoped database access and records an audit event.

# Allowed with masks:
finance_analyst -> approved_revenue
decision: allow_masked
masked_columns:
  - customer_email
  - phone_number

# Denied:
support_agent -> approved_revenue
decision: deny
reason: role is not granted concept
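
One way to picture the audit side of these decisions is a JSON-lines log with one event per governed query. The sketch below is an assumption about shape only: `write_audit_event` and the field names `ts`, `role`, `concept_id`, `decision` and `reason` are hypothetical, and an in-memory buffer stands in for a real log file or audit table.

```python
import io
import json
from datetime import datetime, timezone
from typing import Optional, TextIO


def write_audit_event(stream: TextIO, role: str, concept_id: str,
                      decision: str, reason: Optional[str] = None) -> dict:
    """Append one governed-query decision to the log as a JSON line."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "concept_id": concept_id,
        "decision": decision,
    }
    if reason is not None:
        event["reason"] = reason
    stream.write(json.dumps(event) + "\n")
    return event


log = io.StringIO()  # stand-in for a file or audit table
write_audit_event(log, "finance_analyst", "approved_revenue", "allow_masked")
write_audit_event(log, "support_agent", "approved_revenue", "deny",
                  reason="role is not granted concept")
```

Denied requests are written through the same path as allowed ones, which is what makes a denial visible later instead of silently dropped.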

The install-to-enforce SDK flow

A data team can run this locally against DuckDB or Postgres. The customer data stays in their environment; Data Dragon stores metadata and artifacts locally.

# 1. Install and connect.
import os
# Import DataDragon from the installed SDK package (import path omitted here).

client = DataDragon()
source = client.sources.connect(
    kind="postgres",
    postgres_url=os.environ["WAREHOUSE_URL"],
    name="prod",
)

# 2. Generate semantic, governance and lineage drafts.
contracts = client.proof.reason_contracts(source=source, proposer="gemini")
policy = client.governance.propose_access_policy(source=source, requested_by="ai-builder")
lineage = client.governance.draft_lineage(source=source)

# 3. Human approval is the gate.
client.governance.approve_access_policy(source_name="prod")

# 4. SDK compiles and applies database guardrails.
ddl = client.governance.apply_access_policy(source_name="prod", source=source)

# 5. Every query can be policy-checked and logged.
result = client.governance.execute_governed(
    source=source,
    role="finance_analyst",
    concept_id="approved_revenue",
    sql="SELECT ...",
)

What becomes automatic after approval

The SDK compiles approved YAML into database DDL, secure views and role grants.
Agents and apps query approved concepts instead of raw unrestricted tables.
Sensitive columns are masked at query time through role-scoped views.
Denied requests are blocked and audit-logged, not silently ignored.
The lineage graph links query events back to roles, concepts, datasets, jobs and columns.
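
As a rough illustration of what compiling a policy into guardrails might produce, the sketch below builds Postgres statements for one role-scoped view with masked columns. It is not the SDK's compiler: `compile_view_ddl`, the `governed` schema, the view naming scheme and the example column list are all hypothetical, and masking is simplified to returning NULL.

```python
import re
from typing import Iterable, List

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _ident(name: str) -> str:
    """Accept only plain lowercase identifiers so names can be inlined."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def compile_view_ddl(role: str, concept_id: str, schema: str, table: str,
                     columns: Iterable[str], masked: Iterable[str]) -> List[str]:
    """Build DDL for a role-scoped view that hides the masked columns."""
    role, concept_id = _ident(role), _ident(concept_id)
    base = f"{_ident(schema)}.{_ident(table)}"
    view = f"governed.{role}__{concept_id}"
    masked_set = set(masked)
    select_list = ", ".join(
        f"NULL::text AS {_ident(c)}" if c in masked_set else _ident(c)
        for c in columns
    )
    return [
        "CREATE SCHEMA IF NOT EXISTS governed;",
        f"CREATE OR REPLACE VIEW {view} AS SELECT {select_list} FROM {base};",
        f"REVOKE ALL ON {base} FROM {role};",  # no direct table access
        f"GRANT USAGE ON SCHEMA governed TO {role};",
        f"GRANT SELECT ON {view} TO {role};",
    ]


# Hypothetical column list for finance.orders.
ddl = compile_view_ddl(
    "finance_analyst", "approved_revenue", "finance", "orders",
    columns=["order_id", "net_revenue", "customer_email"],
    masked=["customer_email"],
)
```

Revoking access to the base table while granting SELECT on the view is what pushes the role through the masked path; a real deployment would also need row filters and a review of who owns the view.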