Recipe: bucket statistics summary

Prompt example

Show me a summary of how many assets, folders, and files are in each bucket, broken down by status, media type, and storage class, with totals across all buckets.

Enumerate every top-level bucket and report a full breakdown - asset types, statuses, media types, storage classes, size, and duration - all from a single search query. This is read-only, Class A (safe on any environment, including prod).

Key facts

  • Buckets have assetType == 5. Query assetType Equals 5 to enumerate them - do not scope by parentId, because buckets span the root level and a parentId filter would miss any that aren't direct children of the sentinel.
  • Each bucket's search result already carries a fully populated assetStats object computed at index time. No per-bucket follow-up queries are needed.
  • assetStats contains:
      • assetTypeCounts - counts keyed by type name ("file", "folder", "bucket")
      • assetStatusCounts - counts keyed by status name ("available", "registering", "uploading", "placeholder", "archived", "error", etc.)
      • mediaTypeCounts - counts keyed by media type name ("video", "image", "audio", "document", "text", "mediamanifest", etc.)
      • storageClassCounts - counts keyed by storage class name ("standard", "intelligenttiering", "glacier", etc.)
      • totalContentLength / totalContentLengthDisplay - total size in bytes / human-readable
      • totalVideoDuration / totalVideoDurationDisplay - total video duration
      • totalAudioDuration / totalAudioDurationDisplay - total audio duration
  • All keys in the count dicts are discovered dynamically - new statuses or storage classes appear automatically without code changes.
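
To make the shape concrete, here is a sketch of a single bucket result as the facts above describe it. Only the field names come from this recipe; every value, including the id and the display string, is invented for illustration. The helper flatten_counts is a hypothetical name that shows how the count dicts can be read without hard-coding their keys.

```python
# Hypothetical bucket search result. Field names follow the list above;
# all values are made up for illustration.
sample_bucket = {
    "id": "00000000-0000-0000-0000-000000000001",
    "identifiers": {"displayName": "example-bucket"},
    "assetType": 5,
    "assetStats": {
        "assetTypeCounts": {"file": 120, "folder": 8},
        "assetStatusCounts": {"available": 110, "archived": 10},
        "mediaTypeCounts": {"video": 90, "image": 30},
        "storageClassCounts": {"standard": 100, "glacier": 20},
        "totalContentLength": 5368709120,
        "totalContentLengthDisplay": "5 GB",
    },
}


def flatten_counts(stats):
    """Flatten the three count dicts into prefixed keys, whatever keys they hold."""
    prefixes = {
        "assetStatusCounts": "status_",
        "mediaTypeCounts": "media_",
        "storageClassCounts": "storage_",
    }
    flat = {}
    for field, prefix in prefixes.items():
        # A missing or null dict is treated as empty.
        for key, value in (stats.get(field) or {}).items():
            flat[f"{prefix}{key}"] = value
    return flat


flat = flatten_counts(sample_bucket["assetStats"])
```

Because the keys are read from the data, a status or storage class that is not in the sample would show up as a new prefixed key with no change to the helper.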

Python

# component: search

def get_buckets(sdk):
    """One search call - returns all bucket rows with assetStats populated."""
    flt = [{"fieldName": "assetType", "operator": "Equals", "values": 5}]
    return (search(sdk, filters=flt, size=200) or {}).get("items", [])


def bucket_stats_summary(sdk):
    """Return per-bucket stat rows plus a grand-total row.

    Each row dict contains:
      name, id, files, folders,
      all assetStatusCounts keys (prefixed "status_"),
      all mediaTypeCounts keys  (prefixed "media_"),
      all storageClassCounts keys (prefixed "storage_"),
      size, video_dur, audio_dur.
    The last row has name="TOTAL" and holds summed counts only; it omits
    size, video_dur, and audio_dur because display strings cannot be summed.
    """
    def _int(v):
        try:
            return int(v) if v is not None else 0
        except (TypeError, ValueError):
            return 0

    buckets = get_buckets(sdk)
    rows = []
    for b in buckets:
        name = (b.get("identifiers") or {}).get("displayName") or b.get("id")
        s = b.get("assetStats") or {}
        row = {
            "name":    name,
            "id":      b.get("id"),
            "files":   _int((s.get("assetTypeCounts") or {}).get("file")),
            "folders": _int((s.get("assetTypeCounts") or {}).get("folder")),
            "size":    s.get("totalContentLengthDisplay") or "0 bytes",
            "video_dur": s.get("totalVideoDurationDisplay") or "0 sec",
            "audio_dur": s.get("totalAudioDurationDisplay") or "0 sec",
        }
        for k, v in (s.get("assetStatusCounts") or {}).items():
            row[f"status_{k}"] = _int(v)
        for k, v in (s.get("mediaTypeCounts") or {}).items():
            row[f"media_{k}"] = _int(v)
        for k, v in (s.get("storageClassCounts") or {}).items():
            row[f"storage_{k}"] = _int(v)
        rows.append(row)

    rows.sort(key=lambda r: r["files"] + r["folders"], reverse=True)

    # Collect all dynamic keys then build the grand-total row
    all_keys = {k for r in rows for k in r if k.startswith(("status_", "media_", "storage_"))}
    grand = {"name": "TOTAL", "id": None,
             "files":   sum(r["files"]   for r in rows),
             "folders": sum(r["folders"] for r in rows)}
    for k in all_keys:
        grand[k] = sum(r.get(k, 0) for r in rows)

    return rows + [grand]
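
The TOTAL row built above sums counts only. If a grand-total size is also wanted, one option is to add up the raw totalContentLength values (bytes, per the key facts) from the same bucket results. The sketch below assumes that field is an integer or an integer-like string. Both helper names are hypothetical, and the formatter uses binary units, which may not match the service's own totalContentLengthDisplay strings.

```python
def total_content_length(buckets):
    """Sum assetStats.totalContentLength over bucket results; bad values count as 0."""
    total = 0
    for b in buckets:
        raw = (b.get("assetStats") or {}).get("totalContentLength")
        try:
            total += int(raw) if raw is not None else 0
        except (TypeError, ValueError):
            pass  # skip values that cannot be read as an integer
    return total


def human_bytes(n):
    """Format a byte count with binary units (1 KiB = 1024 bytes)."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(value)} bytes"
            return f"{value:.1f} {unit}"
        value /= 1024


# Hypothetical bucket results, reduced to the one field this sketch reads.
example_buckets = [
    {"assetStats": {"totalContentLength": 1024}},
    {"assetStats": {"totalContentLength": "512"}},
    {"assetStats": None},
    {},
]
grand_total_bytes = total_content_length(example_buckets)
grand_total_display = human_bytes(grand_total_bytes)
```

The durations could be totalled the same way from totalVideoDuration and totalAudioDuration, but this recipe does not state their unit, so that is left out here.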

JavaScript

// component: search

async function getBuckets(sdk) {
    // One call - assetStats is populated on every bucket result.
    const flt = [{ fieldName: "assetType", operator: "Equals", values: 5 }];
    const res = await search(sdk, null, flt, null, 200);
    // Fall back to an empty list when the response or its items are missing.
    return res?.items ?? [];
}

export async function bucketStatsSummary(sdk) {
    const buckets = await getBuckets(sdk);

    // Coerce counts to integers so sums stay numeric even if a value arrives as a string or null.
    const toInt = (v) => Number.parseInt(v, 10) || 0;

    const rows = buckets.map((b) => {
        const name = b.identifiers?.displayName ?? b.id;
        const s    = b.assetStats ?? {};
        const row  = {
            name,
            id:      b.id,
            files:   toInt(s.assetTypeCounts?.file),
            folders: toInt(s.assetTypeCounts?.folder),
            size:    s.totalContentLengthDisplay  ?? "0 bytes",
            videoDur: s.totalVideoDurationDisplay ?? "0 sec",
            audioDur: s.totalAudioDurationDisplay ?? "0 sec",
        };
        for (const [k, v] of Object.entries(s.assetStatusCounts  ?? {})) row[`status_${k}`]  = toInt(v);
        for (const [k, v] of Object.entries(s.mediaTypeCounts    ?? {})) row[`media_${k}`]   = toInt(v);
        for (const [k, v] of Object.entries(s.storageClassCounts ?? {})) row[`storage_${k}`] = toInt(v);
        return row;
    });

    rows.sort((a, b) => (b.files + b.folders) - (a.files + a.folders));

    // Grand total - sum all integer fields discovered across all rows
    const allKeys = [...new Set(rows.flatMap(r =>
        Object.keys(r).filter(k => k.startsWith("status_") || k.startsWith("media_") || k.startsWith("storage_"))
    ))];
    const grand = { name: "TOTAL", id: null,
        files:   rows.reduce((s, r) => s + r.files,   0),
        folders: rows.reduce((s, r) => s + r.folders, 0),
    };
    for (const k of allKeys) grand[k] = rows.reduce((s, r) => s + (r[k] ?? 0), 0);

    return [...rows, grand];
}
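
Both versions return a list of row dicts whose dynamic keys vary from bucket to bucket, so a printer has to fill in the gaps. A minimal sketch in Python follows; format_table is a hypothetical helper and the sample rows are invented. Any status, media, or storage key that a row lacks is shown as 0.

```python
def format_table(rows, columns=None):
    """Render summary rows as aligned plain text; missing cells show as 0."""
    if not rows:
        return ""
    if columns is None:
        # Discover the dynamic columns across all rows, in a stable order.
        dynamic = sorted({k for r in rows for k in r
                          if k.startswith(("status_", "media_", "storage_"))})
        columns = ["name", "files", "folders"] + dynamic
    table = [list(columns)] + [[str(r.get(c, 0)) for c in columns] for r in rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in table
    )


# Hypothetical rows in the shape the summary function returns.
sample_rows = [
    {"name": "alpha", "id": "a1", "files": 10, "folders": 2, "status_available": 10},
    {"name": "beta", "id": "b1", "files": 3, "folders": 1, "status_archived": 3},
    {"name": "TOTAL", "id": None, "files": 13, "folders": 3,
     "status_archived": 3, "status_available": 10},
]
table_text = format_table(sample_rows)
print(table_text)
```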

Notes

  • One query total. assetStats is computed at index time and embedded in every bucket's search result - no per-bucket follow-up calls are needed. The previous approach (2 extra uuidSearchField queries per bucket) is superseded by reading assetStats directly.
  • Dynamic keys. The count dicts (assetStatusCounts, mediaTypeCounts, storageClassCounts) are iterated at runtime - new values (e.g. a new storage tier, a new status) appear automatically without code changes.
  • Index lag. assetStats is refreshed when the bucket is re-indexed. Very recently ingested assets may not be reflected until the next index cycle.
  • Duplicate names. Display names repeat on some deployments (see folder-navigation.md). The summary keeps every row; disambiguate by id if needed.
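
For the duplicate-name case, one way to disambiguate is to append part of the id to any name that occurs more than once. This is a sketch only: label_duplicates is a hypothetical helper, the 8-character prefix is an arbitrary choice, and the sample ids are invented.

```python
from collections import Counter


def label_duplicates(rows):
    """Return copies of rows where repeated names carry a short id suffix."""
    counts = Counter(r["name"] for r in rows)
    labeled = []
    for r in rows:
        row = dict(r)  # copy so the caller's rows are left untouched
        if counts[r["name"]] > 1 and r.get("id"):
            row["name"] = f'{r["name"]} ({str(r["id"])[:8]})'
        labeled.append(row)
    return labeled


# Hypothetical summary rows; two buckets share a display name.
dup_rows = [
    {"name": "media", "id": "1111aaaa-0000", "files": 5, "folders": 1},
    {"name": "media", "id": "2222bbbb-0000", "files": 2, "folders": 0},
    {"name": "archive", "id": "3333cccc-0000", "files": 9, "folders": 4},
    {"name": "TOTAL", "id": None, "files": 16, "folders": 5},
]
labeled_rows = label_duplicates(dup_rows)
```

Unique names and the TOTAL row pass through unchanged, so the output can be fed to the same reporting code as the original rows.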