Structuring Uniform Buffers for Coordinate Alignment

A coordinate-transform uniform buffer is the smallest, most frequently re-uploaded buffer in a spatial pipeline — typically a handful of mat4x4<f32> matrices plus a few scalar viewport parameters — and it is the one most likely to silently corrupt a map. The exact sub-problem this page addresses: how to pack the projection, view, and model matrices (plus viewport offset and level-of-detail thresholds) into one uniform buffer whose byte offsets match what the WGSL struct declares, so that every conformant adapter reads the same transform and a pan or zoom does not shear the basemap or snap vertices to the wrong tile. The failure is almost never a thrown error. WGSL’s struct layout is deterministic, the CPU-side packer usually is not, and a one-vec4 offset mismatch reprojects geometry into the void without tripping the validation layer. The fix is to derive the offsets from the memory alignment rules once and keep the CPU packer, the WGSL struct, and any backend serializer locked to the same table.

Runnable reference implementation

The layout below is the canonical shape for a per-frame transform buffer. The WGSL struct documents each field's byte offset in comments; the TypeScript packer writes into a single ArrayBuffer at those same offsets through typed-array views, never relying on implicit field order. Because every field is 16-byte aligned and the struct's size is a multiple of 16 bytes, the struct can also be used as an array element. Several camera states can share one buffer and be selected with a dynamic offset, but only once each record is padded to minUniformBufferOffsetAlignment (256 bytes by default).

wgsl

// Canonical coordinate-transform uniform layout.
// Every field is 16-byte aligned; total size is 224 bytes (a multiple of 16),
// so the struct can be placed in an array. For dynamic offsets, pad each record
// to 256 bytes first (see the padding note below).
struct TransformUniforms {
  projection      : mat4x4<f32>,  // offset   0, size 64
  view            : mat4x4<f32>,  // offset  64, size 64
  model           : mat4x4<f32>,  // offset 128, size 64
  viewport_offset : vec4<f32>,    // offset 192, size 16  (x,y origin; z,w = pixel ratio, time)
  lod_thresholds  : vec4<f32>,    // offset 208, size 16  (four screen-space LOD cutoffs)
};                                // total 224 bytes

@group(0) @binding(0) var<uniform> u_transforms : TransformUniforms;

@vertex
fn vs_main(@location(0) position : vec3<f32>) -> @builtin(position) vec4<f32> {
  // World coords are pre-shifted relative-to-center on the CPU, so `position`
  // is small and f32 precision holds even at continental extents.
  let world = u_transforms.model * vec4<f32>(position, 1.0);
  return u_transforms.projection * u_transforms.view * world;
}

typescript

// CPU-side packer. Offsets are declared once, in bytes, and reused so the
// ArrayBuffer is laid out byte-for-byte like the WGSL struct above.
const FLOATS_PER_MAT4 = 16;
const STRIDE = 224;                 // bytes per TransformUniforms record
const OFF = {
  projection:      0,
  view:            64,
  model:           128,
  viewportOffset:  192,
  lodThresholds:   208,
} as const;

function packTransforms(
  projection: Float32Array,   // 16 column-major floats
  view: Float32Array,         // 16 column-major floats
  model: Float32Array,        // 16 column-major floats
  viewportOffset: [number, number, number, number],
  lodThresholds: [number, number, number, number],
): ArrayBuffer {
  const buf = new ArrayBuffer(STRIDE);
  const f32 = new Float32Array(buf);
  f32.set(projection, OFF.projection / 4);
  f32.set(view,       OFF.view / 4);
  f32.set(model,      OFF.model / 4);
  f32.set(viewportOffset, OFF.viewportOffset / 4);
  f32.set(lodThresholds,  OFF.lodThresholds / 4);
  return buf;
}

// One buffer, sized and used as a uniform with frequent CPU writes.
const uniformBuffer = device.createBuffer({
  size: STRIDE,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

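// Per-frame update: pack on the CPU, then a single writeBuffer upload.
// originX, originY, now, and LOD are application state defined elsewhere;
// devicePixelRatio is the browser global.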
function uploadFrame(p: Float32Array, v: Float32Array, m: Float32Array): void {
  const cpu = packTransforms(p, v, m, [originX, originY, devicePixelRatio, now], LOD);
  device.queue.writeBuffer(uniformBuffer, 0, cpu);
}

If matrices change every frame at interactive rates (inertial pan, sub-pixel camera drift), prefer pre-transforming vertices in a compute pipeline and consuming the result with drawIndirect, rather than re-uploading uniforms per draw. For static-per-frame matrices the render-pipeline binding shown here is the lower-overhead path. For multiple cameras or ring-buffered history, allocate N * 256 bytes (dynamic offsets must be aligned to minUniformBufferOffsetAlignment, 256 by default) and index with pass.setBindGroup(0, bindGroup, [dynamicOffset]).

Parameter and configuration reference

Every tunable constant in the code above, with guidance for spatial workloads. WGSL alignment rules are fixed by the specification; the others are workload choices.

Value	Where	Default / rule	Spatial-workload guidance
`mat4x4<f32>` align / size	WGSL	16-byte align, 64 bytes	Fixed by spec. Three matrices fill the first 192 bytes with zero padding.
`vec4<f32>` align / size	WGSL	16-byte align, 16 bytes	Pad any `vec3` viewport/origin value to `vec4`; a bare `vec3<f32>` aligns to 16 but occupies only 12 bytes, so a following `f32` lands at byte 12 of that slot, where most CPU packers assume padding.
`STRIDE`	TS / WGSL	224 bytes	Must be a multiple of 16. Keep CPU and WGSL identical; this is the value to assert in CI.
`minUniformBufferOffsetAlignment`	device limit	256 bytes	Dynamic offsets must be multiples of this. Pad each camera record from 224 to 256 when ring-buffering.
`maxUniformBufferBindingSize`	device limit	≥ 65536 bytes guaranteed	One transform record is tiny; this limit only bites when batching many per-tile matrices — move those to a storage buffer instead.
`usage` flags	TS	`UNIFORM | COPY_DST`	`COPY_DST` is required for `queue.writeBuffer`. Do not add `MAP_WRITE`: WebGPU only allows it in combination with `COPY_SRC`, so a mappable buffer cannot also be a uniform buffer.
Matrix order	TS → WGSL	column-major	WGSL default. Transpose on the CPU if your math library emits row-major, or the projection will be silently wrong.
RTC origin	`viewport_offset.xy`	world center	Subtract a per-tile or per-frame center before building `model` so `f32` precision holds past ±10,000 units.

Failure modes specific to uniform coordinate buffers

Offset drift between CPU packer and WGSL struct. Adding a field to the WGSL struct (say a vec2<f32> before lod_thresholds) shifts every later offset, but the TypeScript OFF table is edited separately, or not at all. The shader then reads lod_thresholds from bytes the packer wrote for a different field or never wrote. Detect it with a CI assertion that the WGSL struct's total size and the TS STRIDE agree, and that each OFF.* equals the running offset computed from the field types. Fix by deriving both from one shared schema.
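
One way to make that assertion concrete is to compute the offsets from the field types instead of typing them by hand. The sketch below is a minimal illustration, not part of the reference implementation: `computeLayout`, `TRANSFORM_SCHEMA`, and `assertLayoutMatches` are hypothetical names, the alignment table covers only the types used on this page, and the expected offsets are copied from the packer's OFF table so the block runs on its own.

```typescript
// Alignment and size in bytes for the WGSL types used on this page.
type WgslType = "f32" | "vec2<f32>" | "vec3<f32>" | "vec4<f32>" | "mat4x4<f32>";

const WGSL_LAYOUT: Record<WgslType, { align: number; size: number }> = {
  "f32": { align: 4, size: 4 },
  "vec2<f32>": { align: 8, size: 8 },
  "vec3<f32>": { align: 16, size: 12 },
  "vec4<f32>": { align: 16, size: 16 },
  "mat4x4<f32>": { align: 16, size: 64 },
};

interface Field {
  name: string;
  type: WgslType;
}

interface StructLayout {
  offsets: Record<string, number>;
  size: number;
}

function roundUp(value: number, multiple: number): number {
  return Math.ceil(value / multiple) * multiple;
}

// Walk the fields in declaration order: align the cursor, record the offset,
// advance by the field size, then round the total up to the struct alignment.
function computeLayout(fields: readonly Field[]): StructLayout {
  const offsets: Record<string, number> = {};
  let cursor = 0;
  let structAlign = 1;
  for (const field of fields) {
    const { align, size } = WGSL_LAYOUT[field.type];
    cursor = roundUp(cursor, align);
    offsets[field.name] = cursor;
    cursor += size;
    structAlign = Math.max(structAlign, align);
  }
  return { offsets, size: roundUp(cursor, structAlign) };
}

// Hypothetical shared schema mirroring the TransformUniforms struct.
const TRANSFORM_SCHEMA: readonly Field[] = [
  { name: "projection", type: "mat4x4<f32>" },
  { name: "view", type: "mat4x4<f32>" },
  { name: "model", type: "mat4x4<f32>" },
  { name: "viewportOffset", type: "vec4<f32>" },
  { name: "lodThresholds", type: "vec4<f32>" },
];

// Values copied from the packer's OFF table and STRIDE constant.
const EXPECTED_OFF: Record<string, number> = {
  projection: 0,
  view: 64,
  model: 128,
  viewportOffset: 192,
  lodThresholds: 208,
};
const EXPECTED_STRIDE = 224;

// Throws if the hand-written table disagrees with the computed layout.
function assertLayoutMatches(
  layout: StructLayout,
  expectedOffsets: Record<string, number>,
  expectedSize: number,
): void {
  if (layout.size !== expectedSize) {
    throw new Error(`struct size ${layout.size} != expected ${expectedSize}`);
  }
  for (const name of Object.keys(expectedOffsets)) {
    if (layout.offsets[name] !== expectedOffsets[name]) {
      throw new Error(`offset of ${name}: ${layout.offsets[name]} != ${expectedOffsets[name]}`);
    }
  }
}

const transformLayout = computeLayout(TRANSFORM_SCHEMA);
assertLayoutMatches(transformLayout, EXPECTED_OFF, EXPECTED_STRIDE);
```

With this in place, inserting a `vec2<f32>` before `lodThresholds` in the schema moves `lodThresholds` from 208 to 224 and the total from 224 to 240, so the assertion fails until the hand-written table is updated.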

Row-major / column-major transpose. A projection matrix from a row-major library written straight into the buffer produces geometry that is mirrored or sheared but still on-screen, so it reads as a “coordinate system bug” rather than a layout bug. Detect by rendering a known unit square at the origin; if it skews, transpose before writeBuffer.
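
The transpose itself is a small index swap. The helper below is a sketch under the assumption that the math library hands back 16 row-major floats; `transposeMat4` is a hypothetical name, not an API from any particular library.

```typescript
// Convert 16 row-major floats into the column-major order WGSL expects.
function transposeMat4(rowMajor: ArrayLike<number>): Float32Array {
  if (rowMajor.length !== 16) {
    throw new RangeError(`expected 16 floats, got ${rowMajor.length}`);
  }
  const columnMajor = new Float32Array(16);
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      // Element (row, col) sits at row*4+col in row-major storage
      // and at col*4+row in column-major storage.
      columnMajor[col * 4 + row] = rowMajor[row * 4 + col];
    }
  }
  return columnMajor;
}

// A row-major translation by (5, 6, 7): the offsets sit in the last column,
// which is indices 3, 7, and 11 in row-major storage.
const rowMajorTranslation = [
  1, 0, 0, 5,
  0, 1, 0, 6,
  0, 0, 1, 7,
  0, 0, 0, 1,
];
const columnMajorTranslation = transposeMat4(rowMajorTranslation);
```

After the transpose the translation components occupy indices 12, 13, and 14, which is where a column-major consumer looks for them.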

Dynamic offset alignment rejection. Ring-buffering cameras at the natural 224-byte stride raises a GPUValidationError as soon as a non-256-aligned offset is passed to setBindGroup. The error is reported through error scopes or the device's uncapturederror event rather than thrown as a JavaScript exception. Detect from the validation message (“dynamic offset … is not a multiple of”). Fix by padding each record to 256 bytes.
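
The padding arithmetic can be kept in one place so the buffer size and the dynamic offsets never disagree. This is a sketch with hypothetical helper names (`planRing`, `packRing`); it assumes the default alignment of 256, whereas real code would read `device.limits.minUniformBufferOffsetAlignment` and pass it in.

```typescript
const CAMERA_RECORD_BYTES = 224; // size of one TransformUniforms record

function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

interface RingLayout {
  stride: number;      // padded bytes per record
  bufferSize: number;  // total bytes to allocate
  offsets: number[];   // one dynamic offset per record
}

// Compute the padded stride and the per-record dynamic offsets.
// minOffsetAlignment defaults to 256 here as an assumption.
function planRing(recordBytes: number, count: number, minOffsetAlignment = 256): RingLayout {
  const stride = alignUp(recordBytes, minOffsetAlignment);
  const offsets = Array.from({ length: count }, (_, i) => i * stride);
  return { stride, bufferSize: stride * count, offsets };
}

// Copy each packed record into its padded slot; the gap bytes stay zero.
function packRing(records: readonly ArrayBuffer[], layout: RingLayout): ArrayBuffer {
  const buffer = new ArrayBuffer(layout.bufferSize);
  const bytes = new Uint8Array(buffer);
  records.forEach((record, i) => {
    bytes.set(new Uint8Array(record), layout.offsets[i]);
  });
  return buffer;
}

const ringLayout = planRing(CAMERA_RECORD_BYTES, 3);
```

Each entry of `ringLayout.offsets` is then a valid value for the dynamic-offset array passed to `setBindGroup`, and `ringLayout.bufferSize` is the `size` to give `createBuffer`.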

f32 precision snapping at large coordinates. Without a relative-to-center shift, world coordinates in the millions lose mantissa bits and vertices visibly snap to a grid as you zoom. Detect by watching for quantized motion at high zoom over far-from-origin tiles. Fix with the RTC origin in viewport_offset and, for extreme extents, a high/low vec4<f32> decomposition of position.

During development, wrap the upload in device.pushErrorScope('validation') / popErrorScope() to surface validation errors deterministically. This catches rejected offsets and bindings; it does not catch a CPU/WGSL offset mismatch, which uploads without error.
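
The relative-to-center shift and the high/low decomposition described above can both be sketched on the CPU with `Math.fround`, which rounds a double to the nearest f32. The function names and the sample coordinate are illustrative assumptions, not values from a real dataset.

```typescript
// Subtract the center in double precision, then round the small remainder to f32.
function relativeToCenter(world: number, center: number): number {
  return Math.fround(world - center);
}

// Split a double into an f32 "high" part and an f32 "low" residual so that
// high + low approximates the original far better than a single f32 can.
function splitHighLow(value: number): { high: number; low: number } {
  const high = Math.fround(value);
  const low = Math.fround(value - high);
  return { high, low };
}

// Illustrative Web Mercator-scale x coordinate in meters, and a nearby center.
const worldX = 13371337.123456;
const tileCenterX = 13371000;

// Rounding the raw coordinate to f32 discards the sub-meter part entirely,
// because adjacent f32 values are 1.0 apart at this magnitude.
const naiveX = Math.fround(worldX);

// The RTC value is small, so f32 keeps sub-millimeter detail.
const rtcX = relativeToCenter(worldX, tileCenterX);

// The high/low pair carries the same detail without needing a center.
const { high: highX, low: lowX } = splitHighLow(worldX);
```

On the GPU side the usual pattern is to subtract the camera's own high and low parts from the vertex's high and low parts separately and then add the two differences, so that the large magnitudes cancel before any precision is lost.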

Backend / Python interop note

When matrices are precomputed server-side — batched reprojection jobs, tile-pyramid baking, or a GeoParquet/Arrow pipeline that ships transforms alongside geometry — the Python serializer must emit the identical 224-byte record. Mirror the WGSL struct with a numpy structured dtype and assert its itemsize so a schema change fails loudly before upload:

python

import numpy as np

transform_dtype = np.dtype({
    'names':   ['projection', 'view', 'model', 'viewport_offset', 'lod_thresholds'],
    'formats': [('<f4', (4, 4)), ('<f4', (4, 4)), ('<f4', (4, 4)),
                ('<f4', (4,)),   ('<f4', (4,))],
    'offsets': [0, 64, 128, 192, 208],
    'itemsize': 224,
}, align=False)

assert transform_dtype.itemsize == 224  # byte-for-byte match with WGSL

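# numpy is row-major by default; store matrices column-major (transpose) so the
# bytes match WGSL's column-major mat4x4<f32> without a host-side rebuild.
# `projection` is a placeholder for a row-major 4x4 matrix from your math library.
projection = np.eye(4, dtype='<f4')
record = np.zeros(1, dtype=transform_dtype)
record['projection'] = projection.T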

The explicit <f4 little-endian codes and pinned offsets are deliberate: every current WebGPU target is little-endian, but fixing byte order and offsets removes a variable from GIS pipelines that pre-serialize on heterogeneous workers. When the source is GeoParquet or an Arrow RecordBatch, keep the schema fields in this order. Arrow stores each column contiguously, so a zero-copy view onto this dtype is only possible when each record travels as a single 224-byte fixed-size binary value; separate matrix columns need a column-by-column repack. Either way, the same byte-parity discipline applies as for the GPU upload.
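
On the receiving side, the client can verify the record length and read it with explicit little-endian accessors before handing it to the GPU. This is a sketch only: `decodeTransformRecord` and the interface below are hypothetical, and it assumes the server sends exactly one 224-byte record per ArrayBuffer.

```typescript
const SERVER_RECORD_BYTES = 224;

// Byte offsets shared with the WGSL struct and the numpy dtype.
const SERVER_FIELD_OFFSETS = {
  projection: 0,
  view: 64,
  model: 128,
  viewportOffset: 192,
  lodThresholds: 208,
} as const;

interface TransformRecord {
  projection: Float32Array;
  view: Float32Array;
  model: Float32Array;
  viewportOffset: Float32Array;
  lodThresholds: Float32Array;
}

// Read `count` little-endian f32 values starting at `byteOffset`.
function readFloats(view: DataView, byteOffset: number, count: number): Float32Array {
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = view.getFloat32(byteOffset + i * 4, true);
  }
  return out;
}

// Reject any payload whose length does not match the agreed record size,
// then decode each field from its pinned offset.
function decodeTransformRecord(bytes: ArrayBuffer): TransformRecord {
  if (bytes.byteLength !== SERVER_RECORD_BYTES) {
    throw new RangeError(`expected ${SERVER_RECORD_BYTES} bytes, got ${bytes.byteLength}`);
  }
  const view = new DataView(bytes);
  return {
    projection: readFloats(view, SERVER_FIELD_OFFSETS.projection, 16),
    view: readFloats(view, SERVER_FIELD_OFFSETS.view, 16),
    model: readFloats(view, SERVER_FIELD_OFFSETS.model, 16),
    viewportOffset: readFloats(view, SERVER_FIELD_OFFSETS.viewportOffset, 4),
    lodThresholds: readFloats(view, SERVER_FIELD_OFFSETS.lodThresholds, 4),
  };
}
```

If the payload is already known to be correct, the ArrayBuffer can go straight to `queue.writeBuffer`; decoding is mainly useful for the length check and for inspecting values while debugging.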

Related

Memory Alignment for Spatial Data Buffers: the full WGSL stride and padding rules this layout is derived from.
WebGPU Compute vs Render Pipeline Fundamentals: when to pre-transform vertices in compute instead of re-uploading uniforms.
Configuring WebGPU Adapter Limits for Large GeoJSON: negotiating the buffer-size and binding limits these uploads run against.
Initializing WebGPU Devices for GIS Workloads: where minUniformBufferOffsetAlignment and other limits come from.
Setting Up WebGPU Device Polling for GIS Apps: acquiring the device before any buffer allocation.
