vtex-io-masterdata

MasterData v2 Integration

When this skill applies

Use this skill when your VTEX IO app needs to store custom data (reviews, wishlists, form submissions, configuration records), query or filter that data, or set up automated workflows triggered by data changes, and when you must justify Master Data over other VTEX or external stores.
  • Defining data entities and JSON Schemas using the `masterdata` builder
  • Performing CRUD operations through MasterDataClient (`ctx.clients.masterdata`)
  • Configuring search, scroll, and indexing for efficient data retrieval
  • Setting up Master Data triggers for automated workflows
  • Managing schema lifecycle to avoid the 60-schema limit
  • Deciding whether data belongs in Catalog (fields, specifications, unstructured SKU/product specs), Master Data, native OMS/checkout surfaces, or an external SQL/NoSQL database
  • Avoiding synchronous Master Data calls on the purchase critical path (cart, checkout, payment, placement) unless there is a hard performance and reliability case
  • Preferring one source of truth: avoid duplicating order headers or OMS lists in Master Data for convenience; prefer OMS or a BFF with caching (see Related skills)
Do not use this skill for:
  • General backend service patterns (use `vtex-io-service-apps` instead)
  • GraphQL schema definitions (use `vtex-io-graphql-api` instead)
  • Manifest and builder configuration (use `vtex-io-app-structure` instead)

Decision rules

Before you choose Master Data

Architects, developers, and anyone designing or implementing a solution should treat this section as a checklist for challenging the default: confirm that Master Data is the right persistence layer rather than an automatic pick. The skill is written to question convenience-driven choices.
  • Purpose: Master Data is a document-oriented store (similar in spirit to document DBs and DynamoDB-style access patterns). It is one option among many; choosing it because it is “there” or “cheap” without a workload fit review is a design smell.
  • Product-bound data: If the information is fundamentally about products or SKUs, evaluate Catalog first (specifications, unstructured product/SKU specifications, and native catalog fields) before creating a parallel MD entity that mirrors catalog truth.
  • Purchase path: Do not place synchronous Master Data reads/writes in the hot path of checkout (cart mutation, payment, order placement) unless you have evidence that it is safe (latency budget, failure-mode analysis). Prefer native commerce stores and async or after-order enrichment.
  • Orders and lists: Duplicating OMS or order data into Master Data to power My Orders or similar usually fights single source of truth. Prefer OMS APIs (or marketplace protocols) behind a BFF or IO layer with application caching and correct HTTP/path semantics, not a second order database in MD “because it is easier.”
  • Exposing MD: Master Data is storage only. Any storefront or partner access should go through a service that enforces authentication, authorization, and rate limits, typically VTEX IO or an external BFF following headless-bff-architecture patterns.
  • When MD fits: After the storage fit review, if MD remains appropriate, implement CRUD and schema discipline as below; combine with vtex-io-application-performance and vtex-io-service-paths-and-cdn when exposing HTTP or GraphQL from IO.

Entity governance and hygiene

Before creating a new entity or extending an existing one, understand the landscape:
  • Native entities: The platform manages entities like `CL` (clients), `AD` (addresses), `OD` (orders), `BK` (bookmarks), `AU` (auth), and others. Never create custom entities that duplicate native entity purposes. Know which entities exist before adding new ones.
  • Entity usage audit: In accounts with dozens of custom entities, classify each by purpose: logs/monitoring, cache/temporary, order extension, customer extension, marketing, CMS/content, integration/sync, auth/identity, logistics/geo, or custom business logic. Entities in the logs or cache categories often indicate misuse: IO app logs belong in the logger, not MD; caches belong in VBase, not MD.
  • Critical path flag: Identify whether an entity is used in checkout, cart, payment, or login flows. Entities on the critical path must meet strict latency and availability requirements. If an MD entity is on the critical path, question whether it should be there at all.
  • Document count awareness: Use the `REST-Content-Range` header returned by `GET /search?_fields=id` with `REST-Range: resources=0-0` to count documents efficiently without fetching them. Large entities (100k+ docs) need `scrollDocuments`, a pagination strategy, and potentially a BFF caching layer.
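The document-count check above can be sketched as a small header parser. This is a minimal sketch that assumes the `REST-Content-Range` value has the shape `resources 0-0/1234`; the helper names `parseTotalFromContentRange` and `needsScroll` are hypothetical, and the 100k threshold simply mirrors the figure quoted above.

```typescript
// Parses the total document count out of a REST-Content-Range header value.
// Assumed header shape: "resources 0-0/1234" (unit, range, slash, total).
function parseTotalFromContentRange(header: string): number | null {
  const match = /^\s*\w+\s+\d+-\d+\/(\d+)\s*$/.exec(header);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: flags entities large enough to warrant scrollDocuments.
function needsScroll(total: number, threshold = 100000): boolean {
  return total >= threshold;
}
```

A caller would read the header from the search response, parse it, and switch to a scroll-based strategy when `needsScroll` returns true.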

Bulk operations and data migration

When importing, exporting, or migrating large datasets:
  • Validate before import: Cross-reference import data against the authoritative source (e.g. catalog export for SKU validation, CL entity or user management API for email allowlists). Produce exception reports for invalid rows before touching MD.
  • JSONL payloads: Generate one JSON object per MD document in a `.jsonl` file for bulk imports. This enables resumable, line-by-line processing.
  • Rate limiting: MD APIs enforce rate limits. Use configurable delays between calls (e.g. 400ms) with exponential backoff on HTTP 429 responses.
  • Checkpoints: For large imports (10k+ documents), persist progress to a checkpoint file (last successful document ID or line index). On failure or timeout, resume from the checkpoint instead of restarting.
  • Parallel with bounded concurrency: Use a concurrency pool (e.g. `p-queue` with concurrency 5-10) for parallel `POST` or `PATCH` operations. Too much parallelism triggers rate limits; too little is slow.
  • Bulk delete before re-import: When replacing all documents in an entity, use scroll + delete before import, or implement a separate delete pass with the same checkpoint and backoff patterns.
  • Schema alignment: Ensure import payloads match the entity's JSON Schema exactly. Missing required fields or type mismatches cause silent validation failures.
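The rate-limiting, checkpoint, and bounded-concurrency rules above can be combined roughly as follows. This is a sketch under stated assumptions, not a definitive importer: `withBackoff` and `importLines` are hypothetical names, the `write` callback stands in for whatever performs the actual Master Data write, and a rate-limit error is assumed to carry `status === 429`. Batching with `Promise.all` replaces a real pool such as `p-queue` to keep the example dependency-free.

```typescript
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: a rate-limited call rejects with an object exposing status 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

// Retries `task` with exponential backoff, but only for rate-limit errors.
async function withBackoff<T>(
  task: () => Promise<T>,
  { retries = 5, baseDelayMs = 400, sleep = realSleep } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 400, 800, 1600, ...
    }
  }
}

// Imports JSONL lines in batches of `concurrency`, starting at `startIndex`.
// Returns the checkpoint: the index of the first line that failed, or
// lines.length when every line was written. Persist it to resume later.
async function importLines(
  lines: string[],
  write: (doc: unknown) => Promise<void>,
  { startIndex = 0, concurrency = 5, sleep = realSleep } = {},
): Promise<number> {
  for (let i = startIndex; i < lines.length; i += concurrency) {
    const batch = lines.slice(i, i + concurrency);
    const ok = await Promise.all(
      batch.map((line) =>
        withBackoff(() => write(JSON.parse(line)), { sleep }).then(
          () => true,
          () => false,
        ),
      ),
    );
    const firstFailure = ok.indexOf(false);
    if (firstFailure !== -1) return i + firstFailure;
  }
  return lines.length;
}
```

Lines after the first failure in the same batch may already have been written, so a resumed run can write them again; using an upsert keyed by a stable document ID keeps that harmless.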

Implementation rules

  • A data entity is a named collection of documents (analogous to a database table). A JSON Schema defines structure, validation, and indexing.
  • When using the `masterdata` builder, entities are defined by folder structure: `masterdata/{entityName}/schema.json`. The builder creates entities named `{vendor}_{appName}_{entityName}`.
  • Use `ctx.clients.masterdata` or `masterDataFor` from `@vtex/clients` for all CRUD operations; never make direct REST calls.
  • All fields used in `where` clauses MUST be declared in the schema's `v-indexed` array for efficient querying.
  • Use `searchDocuments` for bounded result sets (known small size, max page size 100). Use `scrollDocuments` for large/unbounded result sets.
  • The `masterdata` builder creates a new schema per app version. Clean up unused schemas to avoid the 60-schema-per-entity hard limit.
MasterDataClient methods:
| Method | Description |
| --- | --- |
| `getDocument` | Retrieve a single document by ID |
| `createDocument` | Create a new document, returns generated ID |
| `createOrUpdateEntireDocument` | Upsert a complete document |
| `createOrUpdatePartialDocument` | Upsert partial fields (patch) |
| `updateEntireDocument` | Replace all fields of an existing document |
| `updatePartialDocument` | Update specific fields only |
| `deleteDocument` | Delete a document by ID |
| `searchDocuments` | Search with filters, pagination, and field selection |
| `searchDocumentsWithPaginationInfo` | Search with total count metadata |
| `scrollDocuments` | Iterate over large result sets |
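For large or unbounded result sets, iterating with `scrollDocuments` might look like the sketch below. The `Scroller` interface is a local stand-in, not the real client type: it assumes each call returns a page of documents plus an `mdToken` to pass to the next call, and that an empty page ends the scroll. The function name `forEachScrollPage` is hypothetical; verify the actual signature against the client you use.

```typescript
// Assumed shape of one scroll response.
interface ScrollPage<T> {
  mdToken: string;
  data: T[];
}

// Minimal stand-in for the part of the Master Data client used here.
interface Scroller {
  scrollDocuments<T>(args: {
    dataEntity: string;
    fields: string[];
    size: number;
    mdToken?: string | undefined;
  }): Promise<ScrollPage<T>>;
}

// Calls `onPage` for every non-empty page and returns the number of documents seen.
async function forEachScrollPage<T>(
  client: Scroller,
  dataEntity: string,
  fields: string[],
  onPage: (docs: T[]) => Promise<void>,
  size = 1000,
): Promise<number> {
  let total = 0;
  let mdToken: string | undefined;
  for (;;) {
    const page = await client.scrollDocuments<T>({ dataEntity, fields, size, mdToken });
    if (page.data.length === 0) return total; // assumed end-of-scroll signal
    await onPage(page.data);
    total += page.data.length;
    mdToken = page.mdToken; // reuse the token so the server continues the same scroll
  }
}
```

Processing page by page keeps memory bounded, which matters more than convenience once an entity holds hundreds of thousands of documents.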
Search `where` clause syntax:
text
where: "productId=12345 AND approved=true"
where: "rating>3"
where: "createdAt between 2025-01-01 AND 2025-12-31"
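Because `where` clauses are plain strings, values that come from user input deserve a guard before they are interpolated. The helper below is a hypothetical sketch: `whereEquals` and `formatValue` are not part of any VTEX library, and the allowlist of characters is an assumption chosen to be conservative rather than a rule taken from the Master Data documentation.

```typescript
type WhereValue = string | number | boolean;

// Rejects string values that could change the structure of the clause
// (spaces, quotes, operators). Numbers and booleans pass through as-is.
function formatValue(value: WhereValue): string {
  if (typeof value !== "string") return String(value);
  if (!/^[\w.@:-]+$/.test(value)) {
    throw new Error(`Unsafe value for where clause: ${value}`);
  }
  return value;
}

// Builds "field=value AND field=value" from an object of equality conditions.
function whereEquals(conditions: Record<string, WhereValue>): string {
  return Object.keys(conditions)
    .map((field) => `${field}=${formatValue(conditions[field] as WhereValue)}`)
    .join(" AND ");
}
```

For example, `whereEquals({ productId: "12345", approved: true })` yields `productId=12345 AND approved=true`, while a value such as `"1 OR approved=false"` is rejected instead of silently widening the query.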
Architecture:
text
VTEX IO App (node builder)
  ├── ctx.clients.masterdata.createDocument()
  │       │
  │       ▼
  │   Master Data v2 API
  │       │
  │       ├── Validates against JSON Schema
  │       ├── Indexes declared fields
  │       └── Fires triggers (if conditions match)
  │             │
  │             ▼
  │         HTTP webhook / Email / Action
  └── ctx.clients.masterdata.searchDocuments()
      Master Data v2 (reads indexed fields for efficient queries)

Hard constraints

Constraint: Use MasterDataClient — Never Direct REST Calls

All Master Data operations in VTEX IO apps MUST go through the MasterDataClient (`ctx.clients.masterdata`) or the `masterDataFor` factory from `@vtex/clients`. You MUST NOT make direct REST calls to `/api/dataentities/` endpoints.
Why this matters
The MasterDataClient handles authentication token injection, request routing, retry logic, caching, and proper error handling. Direct REST calls bypass all of these, requiring manual auth headers, pagination, and retry logic. When the VTEX auth token format changes, direct calls break while the client handles it transparently.
Detection
If you see direct HTTP calls to URLs matching `/api/dataentities/` or `api.vtex.com/api/dataentities`, or raw fetch/axios calls targeting Master Data endpoints, warn the developer to use `ctx.clients.masterdata` instead.
Correct
typescript
// Using MasterDataClient through ctx.clients
export async function getReview(ctx: Context, next: () => Promise<void>) {
  const { id } = ctx.query;

  const review = await ctx.clients.masterdata.getDocument<Review>({
    dataEntity: "reviews",
    id: id as string,
    fields: [
      "id",
      "productId",
      "author",
      "rating",
      "title",
      "text",
      "approved",
    ],
  });

  ctx.status = 200;
  ctx.body = review;
  await next();
}
Wrong
typescript
// Direct REST call to Master Data — bypasses client infrastructure
import axios from "axios";

export async function getReview(ctx: Context, next: () => Promise<void>) {
  const { id } = ctx.query;

  // No caching, no retry, no proper auth, no metrics
  const response = await axios.get(
    `https://api.vtex.com/api/dataentities/reviews/documents/${id}`,
    {
      headers: {
        "X-VTEX-API-AppKey": process.env.VTEX_APP_KEY,
        "X-VTEX-API-AppToken": process.env.VTEX_APP_TOKEN,
      },
    },
  );

  ctx.status = 200;
  ctx.body = response.data;
  await next();
}
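The detection rule for this constraint can be approximated with a plain text scan. This is a rough sketch, not a linter: `findDirectMasterDataCalls` is a hypothetical helper that only looks for the `/api/dataentities` path fragment line by line, so it will miss URLs assembled from separate pieces.

```typescript
// Matches both "/api/dataentities/..." and "api.vtex.com/api/dataentities".
const DIRECT_MD_CALL = /\/api\/dataentities\b/;

// Returns the 1-based line numbers where a direct Master Data URL appears.
function findDirectMasterDataCalls(source: string): number[] {
  return source
    .split("\n")
    .map((line, index) => (DIRECT_MD_CALL.test(line) ? index + 1 : 0))
    .filter((lineNumber) => lineNumber > 0);
}
```

Running it over the Wrong example above would flag the line containing the `axios.get` URL, while the Correct example produces no hits.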

Constraint: Define JSON Schemas for All Data Entities

Every data entity your app uses MUST have a corresponding JSON Schema, either via the `masterdata` builder (recommended) or created via the Master Data API before the app is deployed.
Why this matters
Without a schema, Master Data stores documents as unstructured JSON. This means no field validation, no indexing (making search extremely slow on large datasets), no type safety, and no trigger support. Queries on unindexed fields perform full scans, which can time out or hit rate limits.
Detection
If the app creates or searches documents in a data entity but no JSON Schema exists for that entity (either in the `masterdata/` builder directory or via API), warn the developer to define a schema.
Correct
json
{
  "$schema": "http://json-schema.org/schema#",
  "title": "review-schema-v1",
  "type": "object",
  "properties": {
    "productId": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "rating": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "title": {
      "type": "string",
      "maxLength": 200
    },
    "text": {
      "type": "string",
      "maxLength": 5000
    },
    "approved": {
      "type": "boolean"
    },
    "createdAt": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["productId", "rating", "title", "text"],
  "v-default-fields": [
    "productId",
    "author",
    "rating",
    "title",
    "approved",
    "createdAt"
  ],
  "v-indexed": ["productId", "author", "approved", "rating", "createdAt"]
}
Wrong
typescript
// Saving documents without any schema — no validation, no indexing
await ctx.clients.masterdata.createDocument({
  dataEntity: "reviews",
  fields: {
    productId: "12345",
    rating: "five", // String instead of number — no validation!
    title: 123, // Number instead of string — no validation!
  },
});

// Searching on unindexed fields — full table scan, will time out on large datasets
await ctx.clients.masterdata.searchDocuments({
  dataEntity: "reviews",
  where: "productId=12345", // productId is not indexed — very slow
  fields: ["id", "rating"],
  pagination: { page: 1, pageSize: 10 },
});
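One way to catch queries on unindexed fields before they ship is to compare the fields in a `where` clause against the schema's `v-indexed` array. The sketch below is hypothetical tooling, not a VTEX feature: the regular expression only recognises simple comparisons (`=`, `<>`, `>`, `<`, `>=`, `<=`, `between`) and is an assumption about the clause grammar.

```typescript
// Only the part of the schema this check needs.
interface EntitySchema {
  "v-indexed"?: string[];
}

// Extracts field names that appear on the left of a comparison operator.
function fieldsInWhere(where: string): string[] {
  const fields = new Set<string>();
  const pattern = /([A-Za-z_]\w*)\s*(?:<>|>=|<=|=|>|<|\bbetween\b)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(where)) !== null) {
    const name = match[1];
    if (name) fields.add(name);
  }
  return Array.from(fields);
}

// Returns the fields used in `where` that the schema does not index.
function unindexedFields(where: string, schema: EntitySchema): string[] {
  const indexed = schema["v-indexed"] ?? [];
  return fieldsInWhere(where).filter((field) => indexed.indexOf(field) === -1);
}
```

A non-empty result from `unindexedFields` is a signal to either add the field to `v-indexed` or rethink the query.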

Constraint: Manage Schema Versions to Avoid the 60-Schema Limit

Master Data v2 data entities have a limit of 60 schemas per entity. When using the `masterdata` builder, each app version linked or installed creates a new schema. You MUST delete unused schemas regularly.
Why this matters
Once the 60-schema limit is reached, the `masterdata` builder cannot create new schemas, and linking or installing new app versions will fail. This is a hard platform limit that cannot be increased.
Detection
If the app has been through many link/install cycles, warn the developer to check and clean up old schemas using the Delete Schema API.
Correct
bash
# Periodically clean up unused schemas

# List schemas for the entity
curl -X GET "https://{account}.vtexcommercestable.com.br/api/dataentities/reviews/schemas" \
  -H "X-VTEX-API-AppKey: {appKey}" \
  -H "X-VTEX-API-AppToken: {appToken}"

# Delete old schemas that are no longer in use
curl -X DELETE "https://{account}.vtexcommercestable.com.br/api/dataentities/reviews/schemas/old-schema-name" \
  -H "X-VTEX-API-AppKey: {appKey}" \
  -H "X-VTEX-API-AppToken: {appToken}"
Wrong
text
Never cleaning up schemas during development.
After 60 link cycles, the builder fails:
"Error: Maximum number of schemas reached for entity 'reviews'"
The app cannot be linked or installed until old schemas are deleted.
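Deciding which schemas to delete can be planned offline before any delete request is issued. The sketch below is hypothetical: `planSchemaCleanup` takes the schema names returned by a listing call and the names still in use (both supplied by the caller; the version-like names in the usage are made up) and reports what can go and how much headroom remains under the 60-schema limit.

```typescript
const SCHEMA_LIMIT = 60; // per-entity limit stated in this constraint

interface CleanupPlan {
  deletable: string[]; // schemas not referenced by any version still in use
  remaining: number; // free slots once the deletable schemas are removed
}

function planSchemaCleanup(existing: string[], inUse: string[]): CleanupPlan {
  const keep = new Set(inUse);
  const deletable = existing.filter((name) => !keep.has(name));
  const kept = existing.length - deletable.length;
  return { deletable, remaining: SCHEMA_LIMIT - kept };
}
```

For instance, `planSchemaCleanup(["0.1.0", "0.1.1", "0.2.0"], ["0.2.0"])` marks the first two names as deletable and reports 59 free slots. Reviewing the plan before deleting avoids removing a schema that a workspace still depends on.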

Constraint: Do not use Master Data as a log, cache, or temporary store

Entities used for application logging, caching (IO app state, query results), or temporary staging data do not belong in Master Data. Use `ctx.vtex.logger` for logs, VBase for app-specific caches and temp state, and external log aggregation for audit trails.
Why this matters
Log and cache entities accumulate millions of documents, hit rate limits, make the entity unusable for legitimate queries, and waste storage. MD is not designed for high-write, high-volume, disposable data.
Detection
Entities with names like `LOG`, `cache`, `temp`, `staging`, or `debug`, or entities whose document count grows unboundedly with traffic volume rather than with business events.
Correct
typescript
// Logs: use structured logger
ctx.vtex.logger.info({ action: 'priceUpdate', skuId, newPrice })

// Cache: use VBase
await ctx.clients.vbase.saveJSON('my-cache', cacheKey, data)
Wrong
typescript
// Using MD as a log store — creates millions of documents
await ctx.clients.masterdata.createDocument({
  dataEntity: 'appLogs',
  fields: { level: 'info', message: `Price updated for ${skuId}`, timestamp: new Date() },
})
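A read-through cache on top of VBase, instead of a cache entity in Master Data, could be sketched as follows. `VBaseLike` is a local stand-in that assumes `getJSON(bucket, path, nullIfNotFound)` and `saveJSON(bucket, path, data)` behave as their names suggest; the `cached` helper, the `CacheEntry` wrapper, and the TTL handling are hypothetical additions, since the example above shows no expiry.

```typescript
// Stand-in for the two VBase client methods this sketch relies on.
interface VBaseLike {
  getJSON<T>(bucket: string, path: string, nullIfNotFound?: boolean): Promise<T>;
  saveJSON<T>(bucket: string, path: string, data: T): Promise<void>;
}

// Stored alongside the value so stale entries can be detected on read.
interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

// Returns the cached value when fresh; otherwise loads, stores, and returns it.
async function cached<T>(
  vbase: VBaseLike,
  bucket: string,
  key: string,
  ttlMs: number,
  load: () => Promise<T>,
  now: () => number = Date.now,
): Promise<T> {
  const hit = await vbase.getJSON<CacheEntry<T> | null>(bucket, key, true);
  if (hit && hit.expiresAt > now()) return hit.value;
  const value = await load();
  await vbase.saveJSON<CacheEntry<T>>(bucket, key, { value, expiresAt: now() + ttlMs });
  return value;
}
```

Inside a handler this would be called with `ctx.clients.vbase` as the first argument. Expired entries are overwritten in place, so the bucket does not grow with traffic the way a log or cache entity in Master Data would.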

Constraint: Do not create a parallel source of truth in Master Data without justification

Using Master Data to mirror data that already has a system of record in OMS, Catalog, or an external ERP—for example order headers for a custom list view, or SKU attributes that belong in catalog specifications—creates drift, reconciliation cost, and incident risk.
Why this matters
Two sources of truth disagree after partial failures, retries, or manual edits. Teams spend capacity syncing and debugging instead of customer outcomes.
Detection
New MD entities whose fields duplicate OMS order fields “for performance” without a BFF cache plan; product attributes stored in MD when Catalog specs would suffice; scheduled jobs to “fix” MD from OMS because they diverged.
Correct
text
1. Identify the authoritative system (OMS, Catalog, partner API).
2. Read from that source via BFF or IO, with caching (application + HTTP semantics) as needed.
3. Use MD only for data without a native home or after explicit architecture sign-off.
Wrong
text
"We store order snapshots in MD so the storefront is faster" while OMS remains canonical
and no reconciliation strategy exists — eventual inconsistency is guaranteed.

Preferred pattern

Add the masterdata builder and policies:
json
{
  "builders": {
    "node": "7.x",
    "graphql": "1.x",
    "masterdata": "1.x"
  },
  "policies": [
    {
      "name": "outbound-access",
      "attrs": {
        "host": "api.vtex.com",
        "path": "/api/*"
      }
    },
    {
      "name": "ADMIN_DS"
    }
  ]
}
Define data entity schemas:
json
{
  "$schema": "http://json-schema.org/schema#",
  "title": "review-schema-v1",
  "type": "object",
  "properties": {
    "productId": {
      "type": "string"
    },
    "author": {
      "type": "string"
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "rating": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "title": {
      "type": "string",
      "maxLength": 200
    },
    "text": {
      "type": "string",
      "maxLength": 5000
    },
    "approved": {
      "type": "boolean"
    },
    "createdAt": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["productId", "rating", "title", "text"],
  "v-default-fields": [
    "productId",
    "author",
    "rating",
    "title",
    "approved",
    "createdAt"
  ],
  "v-indexed": ["productId", "author", "approved", "rating", "createdAt"],
  "v-cache": false
}
Set up the client with `masterDataFor`:
typescript
// node/clients/index.ts
import { IOClients } from "@vtex/api";
import { masterDataFor } from "@vtex/clients";

interface Review {
  id: string;
  productId: string;
  author: string;
  email: string;
  rating: number;
  title: string;
  text: string;
  approved: boolean;
  createdAt: string;
}

export class Clients extends IOClients {
  public get reviews() {
    return this.getOrSet("reviews", masterDataFor<Review>("reviews"));
  }
}
Implement CRUD operations:
typescript
// node/resolvers/reviews.ts
import type { ServiceContext } from "@vtex/api";
import type { Clients } from "../clients";

type Context = ServiceContext<Clients>;

export const queries = {
  reviews: async (
    _root: unknown,
    args: { productId: string; page?: number; pageSize?: number },
    ctx: Context,
  ) => {
    const { productId, page = 1, pageSize = 10 } = args;

    const results = await ctx.clients.reviews.search(
      { page, pageSize },
      [
        "id",
        "productId",
        "author",
        "rating",
        "title",
        "text",
        "createdAt",
        "approved",
      ],
      "", // sort
      `productId=${productId} AND approved=true`,
    );

    return results;
  },
};

export const mutations = {
  createReview: async (
    _root: unknown,
    args: {
      input: { productId: string; rating: number; title: string; text: string };
    },
    ctx: Context,
  ) => {
    const { input } = args;
    const email = ctx.vtex.storeUserEmail ?? "anonymous@store.com";

    const response = await ctx.clients.reviews.save({
      ...input,
      author: email.split("@")[0],
      email,
      approved: false,
      createdAt: new Date().toISOString(),
    });

    return ctx.clients.reviews.get(response.DocumentId, [
      "id",
      "productId",
      "author",
      "rating",
      "title",
      "text",
      "createdAt",
      "approved",
    ]);
  },

  deleteReview: async (_root: unknown, args: { id: string }, ctx: Context) => {
    await ctx.clients.reviews.delete(args.id);
    return true;
  },
};
Configure triggers (optional):
json
{
  "name": "notify-moderator-on-new-review",
  "active": true,
  "condition": "approved=false",
  "action": {
    "type": "email",
    "provider": "default",
    "subject": "New review pending moderation",
    "to": ["moderator@mystore.com"],
    "body": "A new review has been submitted for product {{productId}} by {{author}}."
  },
  "retry": {
    "times": 3,
    "delay": { "addMinutes": 5 }
  }
}
Wire into Service:
typescript
// node/index.ts
import type { ParamsContext, RecorderState } from "@vtex/api";
import { Service } from "@vtex/api";

import { Clients } from "./clients";
import { queries, mutations } from "./resolvers/reviews";

export default new Service<Clients, RecorderState, ParamsContext>({
  clients: {
    implementation: Clients,
    options: {
      default: {
        retries: 2,
        timeout: 5000,
      },
    },
  },
  graphql: {
    resolvers: {
      Query: queries,
      Mutation: mutations,
    },
  },
});

Common failure modes

  • Direct REST calls to `/api/dataentities/`: Using `axios` or `fetch` to call Master Data endpoints bypasses the client infrastructure, so requests get no auth, no caching, and no retries. Use `ctx.clients.masterdata` or `masterDataFor` instead.
  • Searching without indexed fields: Queries on non-indexed fields trigger full document scans. For large datasets, this causes timeouts and rate limit errors. Ensure all `where` clause fields are in the schema's `v-indexed` array.
  • Not paginating search results: Master Data v2 has a maximum page size of 100 documents. Requesting more silently returns only up to the limit. Use proper pagination or `scrollDocuments` for large result sets.
  • Ignoring the 60-schema limit: Each app version linked/installed creates a new schema. After 60 link cycles, the builder fails. Periodically clean up unused schemas via the Delete Schema API.
  • Using MD for logs or caches: These entities grow with traffic volume instead of business events. Millions of log or cache documents degrade the account's MD performance.
  • Bulk import without rate limiting: Flooding MD with parallel writes triggers 429 errors and account-wide throttling. Always use bounded concurrency with backoff.
  • Import without validation: Importing data without cross-referencing the catalog or user store leads to orphaned documents, broken references, and data that fails schema validation silently.
  • No checkpoint in bulk operations: Without a checkpoint file, a 50k-document import that fails at document 30k must restart from zero.
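
The three bulk-import failure modes above (unbounded parallel writes, no backoff on 429, no checkpoint) can be sketched together. This is a minimal illustration and not a VTEX API: `SaveFn`, `Checkpoint`, `saveWithBackoff`, and `bulkImport` are hypothetical names, the write function is injected by the caller (for example a wrapper around a `masterDataFor` client's `save`), and the sketch assumes a throttled write surfaces as an error carrying `response.status === 429`.

```typescript
// Hypothetical sketch: bounded-concurrency bulk import with backoff and a checkpoint.
// SaveFn, Checkpoint, saveWithBackoff and bulkImport are illustrative names, not part of @vtex/api.

type SaveFn<T> = (doc: T) => Promise<void>;

interface Checkpoint {
  done: Set<number>; // indexes of documents that were already written
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry one write with exponential backoff, but only when the error looks like a 429.
// Assumption: the injected save function rejects with an object exposing response.status.
async function saveWithBackoff<T>(
  save: SaveFn<T>,
  doc: T,
  maxAttempts = 5,
  baseDelayMs = 200,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await save(doc);
      return;
    } catch (err) {
      const status = (err as { response?: { status?: number } }).response?.status;
      if (status !== 429 || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Run at most `concurrency` writes at a time and skip documents the checkpoint already covers.
async function bulkImport<T>(
  docs: T[],
  save: SaveFn<T>,
  checkpoint: Checkpoint,
  concurrency = 5,
  baseDelayMs = 200,
): Promise<void> {
  let next = 0;
  const worker = async () => {
    while (next < docs.length) {
      const index = next++;
      if (checkpoint.done.has(index)) continue;
      await saveWithBackoff(save, docs[index], 5, baseDelayMs);
      // In a real import, persist the checkpoint here so a rerun can resume.
      checkpoint.done.add(index);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
}
```

If a write fails permanently, `bulkImport` rejects, but the checkpoint still records every document that succeeded, so a second run with the same checkpoint only retries the remainder. Where the checkpoint is persisted is left open here.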

Review checklist

  • Is the `masterdata` builder declared in `manifest.json`?
  • Do all data entities have JSON Schemas with proper field definitions?
  • Are all `where` clause fields declared in `v-indexed`?
  • Are CRUD operations using `ctx.clients.masterdata` or `masterDataFor` (no direct REST calls)?
  • Is pagination properly handled (max 100 per page, scroll for large sets)?
  • Is there a plan for schema cleanup to avoid the 60-schema limit?
  • Are required policies (`outbound-access`, `ADMIN_DS`) declared in the manifest?
  • Were Catalog and native stores ruled in or out before choosing MD for product or order data?
  • Is MD off the purchase critical path unless explicitly justified?
  • If exposing MD externally, is access through a controlled IO/BFF layer with auth?
  • Is the entity free of logging, caching, and temporary data?
  • For bulk operations, are rate limiting, backoff, and checkpoints implemented?
  • Is import data validated against the authoritative source before writing to MD?
  • Are native entities (CL, AD, OD, etc.) identified and not duplicated by custom entities?
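
Two of the checklist items (indexed `where` fields and the 100-document page ceiling) lend themselves to a small pre-flight check. The sketch below is an assumption-laden illustration: `EntitySchema`, `whereFields`, `unindexedFields`, and `clampPageSize` are hypothetical helpers, not part of `@vtex/api` or `@vtex/clients`, and the parser only understands simple `field=value` terms joined by `AND`/`OR`.

```typescript
// Hypothetical pre-flight helpers; all names are illustrative, not part of @vtex/api.

interface EntitySchema {
  "v-indexed"?: string[];
}

const MAX_PAGE_SIZE = 100; // page-size ceiling stated in the checklist

// Extract field names from a simple where clause such as
// "productId=123 AND approved=true". Range expressions and nested
// syntax are out of scope and would need a real parser.
function whereFields(where: string): string[] {
  const fields = where
    .split(/\s+(?:AND|OR)\s+/i)
    .map((term) => term.replace(/^[\s(]+/, "").match(/^[A-Za-z_][\w.]*/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map((m) => m[0]);
  return Array.from(new Set(fields));
}

// Return the where-clause fields that are missing from the schema's v-indexed array.
function unindexedFields(where: string, schema: EntitySchema): string[] {
  const indexed = schema["v-indexed"] ?? [];
  return whereFields(where).filter((field) => indexed.indexOf(field) === -1);
}

// Keep a requested page size within 1..MAX_PAGE_SIZE.
function clampPageSize(requested: number): number {
  return Math.min(Math.max(1, Math.floor(requested)), MAX_PAGE_SIZE);
}
```

A check like this could run in a unit test against the entity's schema file, so that a resolver querying a non-indexed field is caught before deployment instead of showing up as a timeout.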

Related skills

  • vtex-io-application-performance — IO performance patterns (cache layers, BFF-facing behavior)
  • vtex-io-service-paths-and-cdn — Public vs private routes for MD-backed APIs
  • vtex-io-session-apps — Session transforms that may read from or complement MD-stored state
  • architecture-well-architected-commerce — Cross-cutting storage and pillar alignment
  • headless-bff-architecture — BFF boundaries when MD is not accessed from IO

Reference
