Binding WebGPU Render Passes to deck.gl Custom Layers

The precise sub-problem here is narrow but unforgiving: deck.gl owns the requestAnimationFrame loop and the canvas, but a custom layer that needs its own GPURenderPassEncoder must encode and submit native commands inside the framework’s draw(...) callback without recreating the swap-chain attachment, without recompiling its pipeline on every pan, and without letting an asynchronous queue.submit() race the next frame’s buffer writes. Get the command-encoder scope or the attachment reuse wrong and you pay a full context switch per frame; get the fencing wrong and you tear or drop frames on a moving viewport. This page is the implementation reference for that binding boundary. It assumes you have already wired attributes to buffers and run any WGSL preprocessing as described in the parent deck.gl Layer Integration with WebGPU reference, and that you have a validated device from initializing WebGPU devices for GIS workloads. What follows is the draw-time render-pass logic that sits on top of that groundwork.

The governing rule: deck.gl’s LayerManager schedules when you draw; your layer owns how the commands reach the queue. Bind to deck.gl’s existing canvas texture rather than allocating your own, scope the encoder to a single frame, and gate buffer mutation on the previous frame’s completion.

Runnable Reference Implementation

The layer below subclasses deck.gl’s Layer, acquires the native device in initializeState, builds the render pipeline and bind group exactly once, and drives a single render pass per draw. Every spatial-specific choice is annotated inline. The pipeline and bind group are cached on the layer’s state so the framework’s reactive re-renders never trigger a shader recompile.

typescript

import { Layer, type LayerContext, type UpdateParameters } from "@deck.gl/core";

// The native handles deck.gl exposes when the WebGPU backend is active.
interface WebGPUContext extends LayerContext {
  device: GPUDevice;
  queue: GPUQueue;
  // luma.gl surfaces the canvas-backed context; getCurrentTexture() returns
  // the SAME swap-chain texture deck.gl rasterizes into this frame.
  gpuCanvasContext: GPUCanvasContext;
  depthTexture: GPUTexture; // deck.gl's shared depth attachment
}

interface RenderState {
  pipeline: GPURenderPipeline;
  bindGroup: GPUBindGroup;
  vertexBuffer: GPUBuffer;     // packed position data, 16-byte aligned
  uniformBuffer: GPUBuffer;    // view/projection matrices, 256-byte sized
  vertexCount: number;
  frameInFlight: Promise<undefined> | null; // the previous frame's fence
}

export class WebGPULayer extends Layer {
  static layerName = "WebGPULayer";
  declare state: RenderState;

  initializeState(context: LayerContext): void {
    const { device } = context as WebGPUContext;

    // Pre-validate the limit a large spatial tile grid will hit BEFORE creating
    // any resources. The buffer below is a VERTEX buffer, so the limit that
    // applies is maxBufferSize; exceeding it otherwise surfaces only as an
    // asynchronous validation error that is hard to trace.
    const required = this.props.tileBufferBytes as number;
    if (device.limits.maxBufferSize < required) {
      throw new Error("maxBufferSize too small for tile grid");
    }

    const pipeline = this.buildPipeline(device);

    // Allocate once. Per-frame createBuffer() is the top cause of allocator jank.
    const vertexBuffer = device.createBuffer({
      size: required,
      usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
    });
    // 256 bytes covers two mat4x4<f32> (view + projection) at the dynamic-uniform
    // offset granularity WebGPU enforces; see memory-alignment reference.
    const uniformBuffer = device.createBuffer({
      size: 256,
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });

    const bindGroup = device.createBindGroup({
      layout: pipeline.getBindGroupLayout(0),
      entries: [{ binding: 0, resource: { buffer: uniformBuffer } }],
    });

    this.setState({
      pipeline,
      bindGroup,
      vertexBuffer,
      uniformBuffer,
      vertexCount: 0,
      frameInFlight: null,
    });
  }

  // deck.gl calls draw() once per rAF tick with the live viewport state.
  draw({ uniforms, context }: { uniforms: Record<string, unknown>; context: LayerContext }): void {
    const { device, queue, gpuCanvasContext, depthTexture } = context as WebGPUContext;
    const { pipeline, bindGroup, vertexBuffer, uniformBuffer, vertexCount } = this.state;
    if (vertexCount === 0) return;

    // Push the frame's view/projection matrices. writeBuffer is non-blocking and
    // the right channel for a <256 KB per-frame uniform on a moving viewport.
    const viewProjection = uniforms.viewProjectionMatrix as Float32Array;
    queue.writeBuffer(uniformBuffer, 0, viewProjection.buffer, viewProjection.byteOffset, 64);

    // Scope the encoder to THIS frame only — never reuse across rAF ticks.
    const encoder = device.createCommandEncoder({ label: "spatial-layer-frame" });

    // Reuse deck.gl's own swap-chain + depth attachment. Allocating a private
    // colour target here would force a full-screen blit and a context switch.
    const pass = encoder.beginRenderPass({
      colorAttachments: [{
        view: gpuCanvasContext.getCurrentTexture().createView(),
        loadOp: "load",   // 'load', not 'clear' — deck.gl already cleared the frame
        storeOp: "store",
      }],
      depthStencilAttachment: {
        view: depthTexture.createView(),
        depthLoadOp: "load",
        depthStoreOp: "store",
      },
    });

    pass.setPipeline(pipeline);              // cached — no recompile on pan/zoom
    pass.setBindGroup(0, bindGroup);         // cached — static layout
    pass.setVertexBuffer(0, vertexBuffer);
    pass.draw(vertexCount);
    pass.end();

    queue.submit([encoder.finish()]);

    // Fence: hold a handle to this frame's completion so the next buffer mutation
    // can wait on it rather than racing the GPU still reading these vertices.
    this.state.frameInFlight = queue.onSubmittedWorkDone();
  }

  // Stream new geometry only after the in-flight frame has drained.
  async updateGeometry(data: Float32Array): Promise<void> {
    // The queue lives on the layer context, not on layer state.
    const { queue } = this.context as WebGPUContext;
    const { frameInFlight, vertexBuffer } = this.state;
    if (frameInFlight) await frameInFlight; // avoid write-while-read corruption
    queue.writeBuffer(vertexBuffer, 0, data.buffer, data.byteOffset, data.byteLength);
    this.setState({ vertexCount: data.length / 4 }); // vec4<f32> stride
  }

  finalizeState(): void {
    // Release native handles on unmount / hot-reload to prevent VRAM leaks.
    this.state.vertexBuffer?.destroy();
    this.state.uniformBuffer?.destroy();
  }

  private buildPipeline(device: GPUDevice): GPURenderPipeline {
    const module = device.createShaderModule({ code: this.props.wgsl as string });
    return device.createRenderPipeline({
      layout: "auto",
      vertex: {
        module,
        entryPoint: "vs_main",
        buffers: [{
          arrayStride: 16, // vec4<f32>: lon, lat, elevation, attribute
          attributes: [{ shaderLocation: 0, offset: 0, format: "float32x4" }],
        }],
      },
      fragment: { module, entryPoint: "fs_main", targets: [{ format: navigator.gpu.getPreferredCanvasFormat() }] },
      primitive: { topology: "point-list" },
      depthStencil: { format: "depth24plus", depthWriteEnabled: true, depthCompare: "less-equal" },
    });
  }
}

The whole binding hinges on three decisions that are easy to get wrong: loadOp: "load" (not "clear") so your pass composites onto deck.gl’s already-rendered frame instead of erasing it; reusing gpuCanvasContext.getCurrentTexture() so there is no separate target to blit; and stashing onSubmittedWorkDone() so updateGeometry cannot overwrite vertices the GPU is mid-read on.
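The third decision, stashing the completion promise, generalizes to a small tracker once more than one frame may be in flight. The following is a minimal sketch, not deck.gl or WebGPU API: `FrameFences` and its members are hypothetical names, and the default cap of two follows the in-flight guidance in the parameter reference. It accepts any promise, so in the layer you would pass it the result of `queue.onSubmittedWorkDone()`.

```typescript
// Hypothetical helper: tracks completion promises for submitted frames.
class FrameFences {
  private pending = new Set<Promise<unknown>>();
  private readonly maxInFlight: number;

  constructor(maxInFlight = 2) {
    this.maxInFlight = maxInFlight;
  }

  // Number of frames whose GPU work has not yet completed.
  get inFlight(): number {
    return this.pending.size;
  }

  // True when another frame may be submitted without exceeding the cap.
  get canSubmit(): boolean {
    return this.pending.size < this.maxInFlight;
  }

  // Register a completion promise, e.g. queue.onSubmittedWorkDone().
  track(done: Promise<unknown>): void {
    this.pending.add(done);
    const release = () => {
      this.pending.delete(done);
    };
    done.then(release, release);
  }

  // Resolves once every currently tracked frame has completed.
  async drain(): Promise<void> {
    const noop = () => undefined;
    await Promise.all(Array.from(this.pending).map((p) => p.then(noop, noop)));
  }
}
```

In the layer above, `draw` would call `track(queue.onSubmittedWorkDone())` after submitting, and `updateGeometry` would `await drain()` before writing the vertex buffer.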

Parameter and Configuration Reference

Every tunable referenced above, with guidance for typical spatial workloads.

Parameter	Value used	Spatial-workload guidance
Vertex `arrayStride`	`16` bytes	One `vec4<f32>` per point (lon, lat, elevation, attribute). Promote `vec3` positions to `vec4` so stride matches the GPU’s 16-byte vector alignment rather than a packed 12.
Uniform buffer size	`256` bytes	Two `mat4x4<f32>` fit in 128; round to 256 to satisfy the dynamic-uniform offset granularity if you later batch per-tile uniforms.
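`writeBuffer` payload	64 bytes (one matrix)	Use `writeBuffer` for per-frame uniforms under ~256 KB; it needs no `mapAsync` round-trip. Above that, switch to a staging buffer that is already mapped (for example one created with `mappedAtCreation`, or one from a pre-mapped ring) so the upload does not wait on `mapAsync`.
Buffer alignment	16-byte multiples	Records that straddle 16-byte boundaries can cost extra memory transactions on some GPUs. Pad records to 16 even when 12 would “fit.”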
`colorAttachment.loadOp`	`"load"`	`"load"` to composite onto deck.gl’s frame; `"clear"` only if your layer owns the entire canvas.
`depthCompare`	`"less-equal"`	`less-equal` lets coincident geometry (stacked tiles at one zoom) draw without z-fighting flicker.
`primitive.topology`	`"point-list"`	For point clouds / scatter. Use `"triangle-list"` for filled polygons; the binding logic is identical.
In-flight frames	1 (single fence)	Hold one `onSubmittedWorkDone()` handle. Allow at most two before the queue over-subscribes and latency climbs on a moving map.
GPU frame budget	< 2 ms	Target sub-2 ms pass execution to leave headroom for deck.gl’s own layers inside a 16.6 ms (60 FPS) frame.
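
The size and alignment rows above reduce to rounding up to a power-of-two boundary. A minimal sketch, assuming hypothetical helper names (`alignTo`, `uniformSlotBytes`, `uniformSlotOffset`) that are not part of deck.gl or WebGPU. The 256 default mirrors WebGPU's default `minUniformBufferOffsetAlignment`; a real layer would read that value from `device.limits` rather than hard-coding it.

```typescript
// Hypothetical helpers for the sizing rules in the table.

// Round `bytes` up to the next multiple of `alignment` (a power of two).
function alignTo(bytes: number, alignment: number): number {
  if (alignment <= 0 || (alignment & (alignment - 1)) !== 0) {
    throw new RangeError("alignment must be a positive power of two");
  }
  return Math.ceil(bytes / alignment) * alignment;
}

const MAT4_BYTES = 16 * 4; // mat4x4<f32>: 16 floats x 4 bytes each

// Size of one uniform slot holding `matrixCount` matrices.
function uniformSlotBytes(matrixCount: number, offsetAlignment = 256): number {
  return alignTo(matrixCount * MAT4_BYTES, offsetAlignment);
}

// Byte offset of slot `index` inside a buffer of batched per-tile uniforms.
function uniformSlotOffset(index: number, matrixCount: number, offsetAlignment = 256): number {
  return index * uniformSlotBytes(matrixCount, offsetAlignment);
}
```

With two matrices per slot, `uniformSlotBytes(2)` gives the 256 bytes used by the layer, and a packed 12-byte position record pads to `alignTo(12, 16)`, i.e. 16 bytes.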

Timestamp profiling

To confirm the sub-2 ms budget, attach a GPUQuerySet to the pass rather than wrapping it in performance.mark(), which only measures CPU-side submission, not shader execution.

typescript

const querySet = device.createQuerySet({ type: "timestamp", count: 2 });
const resolveBuffer = device.createBuffer({
  size: 16, // 2 timestamps × 8 bytes (uint64)
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});
const readBuffer = device.createBuffer({
  size: 16,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// Declare timestamp writes on the pass descriptor (modern path).
const pass = encoder.beginRenderPass({
  ...renderPassDescriptor,
  timestampWrites: { querySet, beginningOfPassWriteIndex: 0, endOfPassWriteIndex: 1 },
});
// ... setPipeline / draw / end ...

// resolveQuerySet lives on the command encoder, NOT the pass encoder.
encoder.resolveQuerySet(querySet, 0, 2, resolveBuffer, 0);
encoder.copyBufferToBuffer(resolveBuffer, 0, readBuffer, 0, 16);
queue.submit([encoder.finish()]);

await queue.onSubmittedWorkDone();
await readBuffer.mapAsync(GPUMapMode.READ);
const ts = new BigUint64Array(readBuffer.getMappedRange());
const durationMs = Number(ts[1] - ts[0]) / 1e6; // nanoseconds -> ms
readBuffer.unmap();

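The timestamp-query feature must be requested at device creation. On adapters that lack it, fall back to CPU marks taken at submit and again when onSubmittedWorkDone() resolves, and treat the number as an upper bound: it includes queueing and scheduling latency on top of shader execution.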
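
Choosing between the two timing paths can be decided once, before the device is requested. The sketch below assumes a hypothetical helper `planTiming` and a minimal structural `AdapterLike` type so that it runs without `@webgpu/types`; in a browser the argument would be the `GPUAdapter`, whose `features` set exposes `has()`.

```typescript
// Minimal structural type standing in for GPUAdapter (assumption for this sketch).
interface AdapterLike {
  features: { has(name: string): boolean };
}

type TimingMode = "gpu-timestamp" | "cpu-marks";

// Decide which features to request and which timing path the layer should use.
function planTiming(adapter: AdapterLike): { requiredFeatures: string[]; mode: TimingMode } {
  if (adapter.features.has("timestamp-query")) {
    return { requiredFeatures: ["timestamp-query"], mode: "gpu-timestamp" };
  }
  // Feature missing: request nothing extra and time on the CPU instead.
  return { requiredFeatures: [], mode: "cpu-marks" };
}

// Browser usage (not executed here). With @webgpu/types installed, the array
// would need the GPUFeatureName[] type instead of string[]:
//   const adapter = await navigator.gpu.requestAdapter();
//   const plan = planTiming(adapter);
//   const device = await adapter.requestDevice({ requiredFeatures: plan.requiredFeatures });
```

Only create the `GPUQuerySet` when `mode` is `"gpu-timestamp"`; creating a timestamp query set on a device without the feature is a validation error.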

Failure Modes Specific to This Binding

Cleared frame — your layer erases deck.gl’s other layers. Symptom: every layer drawn before yours vanishes, leaving only your geometry on a transparent canvas. Cause: loadOp: "clear" on the colour attachment. Fix: use loadOp: "load" so the pass composites onto the existing frame. If you genuinely need a clear, you have ordered the layer wrong — it must be the first to draw.

Pipeline recompile stutter on pan/zoom. Symptom: a frame-time spike every time the viewport changes. Cause: building the GPURenderPipeline or GPUBindGroup inside draw or updateState instead of caching it; deck.gl’s reactive cycle re-invokes those on every prop change. Fix: build pipeline and bind group once in initializeState, store them on state, and only mutate buffer contents per frame. Confirm by counting calls to device.createRenderPipeline during a pan (a breakpoint or a temporary wrapper is enough); the count should stay at one per layer. An error scope such as device.pushErrorScope("validation") does not help here, because it captures validation errors, not object creation.
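
One way to check that the pipeline really is built once is to count creation calls while panning. This is a debugging sketch under stated assumptions: `countPipelineCreations` is a hypothetical helper, and `PipelineFactory` is a structural stand-in for `GPUDevice` so the code runs outside a browser.

```typescript
// Structural stand-in for the part of GPUDevice this sketch touches (assumption).
interface PipelineFactory {
  createRenderPipeline(descriptor: unknown): unknown;
}

// Wrap createRenderPipeline on a device-like object and count calls.
// Returns the running count and a restore() that removes the wrapper.
function countPipelineCreations(device: PipelineFactory): { count(): number; restore(): void } {
  const original = device.createRenderPipeline;
  let calls = 0;
  device.createRenderPipeline = function (this: PipelineFactory, descriptor: unknown) {
    calls += 1;
    return original.call(this, descriptor);
  };
  return {
    count: () => calls,
    restore: () => {
      device.createRenderPipeline = original;
    },
  };
}
```

Install the wrapper before the layer mounts, pan and zoom for a few seconds, then read `count()`. Any value above one per layer instance points at pipeline construction leaking into the per-frame path. Remove the wrapper with `restore()` before shipping.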

Geometry corruption during streaming. Symptom: points smear or flicker for one frame after a data update. Cause: writeBuffer/mapAsync overwriting a vertex buffer the GPU is still reading for the in-flight frame. Fix: await this.state.frameInFlight before mutating, as in updateGeometry; or double-buffer the vertex storage so writes target the idle buffer.
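
The double-buffering alternative can be sketched as a ping-pong pair. `PingPong` is a hypothetical helper, generic so that it runs without WebGPU types; in the layer, `T` would be `GPUBuffer` and both buffers would be allocated once in initializeState.

```typescript
// Hypothetical ping-pong pair: one buffer is read by the GPU, the other is idle.
class PingPong<T> {
  private readonly buffers: [T, T];
  private readIndex = 0;

  constructor(buffers: [T, T]) {
    this.buffers = buffers;
  }

  // Buffer the render pass binds this frame.
  get front(): T {
    return this.buffers[this.readIndex];
  }

  // Idle buffer that new geometry is written into.
  get back(): T {
    return this.buffers[1 - this.readIndex];
  }

  // Call after the write to `back` has been queued; the next draw binds it.
  swap(): void {
    this.readIndex = 1 - this.readIndex;
  }
}
```

A streaming update then becomes: write into `back`, call `swap()`, and let the next `draw` bind `front`. The buffer that just became `back` may still be read by the frame in flight, so a second update arriving before that frame completes still needs the fence.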

Stale swap-chain view — GPUValidationError on submit. Symptom: an “attachment texture not from current frame” validation error after a canvas resize. Cause: caching the texture view from getCurrentTexture() across frames. Fix: call gpuCanvasContext.getCurrentTexture().createView() fresh inside every draw; the swap-chain texture is only valid for the frame it was acquired in.

Device lost mid-stream: silent no-op submits. Symptom: rendering simply stops, no error. Cause: a driver reset or a backgrounded tab caused the device to be lost, which resolves the device.lost promise and invalidates every buffer and pipeline. Fix: await device.lost (it is a promise, not an event), tear down via finalizeState, re-run device initialization, and degrade to the WebGL2 backend through browser-support fallback routing when re-acquisition repeatedly fails. For resilient re-acquisition under driver load, reuse the device polling pattern.
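
The recovery flow can be sketched as follows. Everything here is illustrative: `acquireBackend`, `watchDevice`, `Backend`, and the structural `DeviceLike` type are hypothetical names, the retry count of three is an arbitrary assumption, and the caller's `requestDevice` callback stands in for the device-initialization routine referenced above.

```typescript
// Structural stand-ins for GPUDeviceLostInfo and GPUDevice (assumptions).
interface LostInfo {
  reason: string; // "destroyed" when device.destroy() was called deliberately
  message: string;
}
interface DeviceLike {
  lost: Promise<LostInfo>;
}

type Backend<D> = { kind: "webgpu"; device: D } | { kind: "webgl2" };

// Try to acquire a device up to `maxAttempts` times, then fall back to WebGL2.
async function acquireBackend<D extends DeviceLike>(
  requestDevice: () => Promise<D | null>,
  maxAttempts = 3,
): Promise<Backend<D>> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const device = await requestDevice();
      if (device) return { kind: "webgpu", device };
    } catch {
      // A rejected request counts as a failed attempt; try again.
    }
  }
  return { kind: "webgl2" };
}

// Wait for the current device to be lost, then hand the caller a new backend.
async function watchDevice<D extends DeviceLike>(
  device: D,
  requestDevice: () => Promise<D | null>,
  onBackend: (backend: Backend<D>) => void,
): Promise<void> {
  const info = await device.lost; // resolves when the device is lost
  if (info.reason === "destroyed") return; // intentional teardown, nothing to recover
  onBackend(await acquireBackend(requestDevice));
}
```

The `onBackend` callback is where the application would run finalizeState on the old layers and rebuild them against the new device or the WebGL2 path.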

Backend / Python Interop Note

The vertex buffer this pass binds expects a 16-byte stride (vec4<f32>), and the Python side that produces the stream must emit byte-identical records or the rasterizer reads shifted coordinates. With pyarrow, build a fixed-size-list column of four float32 lanes, pa.list_(pa.float32(), 4), so each record is exactly 16 bytes with no Arrow-level padding between rows, then ship the raw buffer over a binary WebSocket. With geopandas, extract coordinates with geometry.get_coordinates(), stack lon/lat/elevation/attribute into a C-contiguous numpy array of dtype=np.float32 and shape (n, 4), and call .tobytes(); the resulting bytes drop straight into queue.writeBuffer with no reparse. The fourth lane is never wasted: carry intensity, classification, or a timestamp there so the promotion from vec3 to vec4 pays for itself.

The 256-byte uniform sizing on the GPU side mirrors what the matrix-alignment rules in memory alignment for spatial data buffers require; if the backend also emits view/projection matrices, serialize them column-major to match WGSL’s mat4x4<f32> expectation.

Delta-encode coordinates between frames where the viewport moves incrementally and batch them into a single writeBuffer per frame to keep the main thread free for interaction handlers. Authoritative byte-layout rules are in the W3C WebGPU specification.
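
On the receiving side, the stride contract can be checked before any bytes reach the GPU. A minimal sketch, assuming a hypothetical helper `decodePointFrame` and assuming the producer and the browser share little-endian byte order, which holds for numpy's native float32 on x86 and ARM.

```typescript
const RECORD_BYTES = 16; // vec4<f32>: lon, lat, elevation, attribute

// Validate a binary frame and view it as float32 records without copying.
function decodePointFrame(frame: ArrayBuffer): { data: Float32Array; vertexCount: number } {
  if (frame.byteLength % RECORD_BYTES !== 0) {
    throw new RangeError(
      `frame of ${frame.byteLength} bytes is not a multiple of ${RECORD_BYTES}`,
    );
  }
  const data = new Float32Array(frame); // a view over the same bytes, no reparse
  return { data, vertexCount: frame.byteLength / RECORD_BYTES };
}

// Browser usage (not executed here; `socket` and `layer` are placeholders):
//   socket.binaryType = "arraybuffer";
//   socket.onmessage = (e) => layer.updateGeometry(decodePointFrame(e.data).data);
```

Rejecting a frame whose length is not a multiple of 16 catches a producer that emitted packed 12-byte records, which would otherwise render as shifted coordinates rather than fail loudly.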

deck.gl Layer Integration with WebGPU — parent reference for attribute-to-buffer mapping and WGSL preprocessing
React state hydration for GPU contexts — keeping the control plane out of the per-frame byte path that feeds this pass
CesiumJS mapping pipeline optimization — the same render-pass binding applied to a 3D Tiles host engine
Memory alignment for spatial data buffers — the 16-byte and 256-byte rules the buffers above depend on
WebGPU compute vs render pipeline fundamentals — when a preprocessing compute pass should feed this render pass
