WebGPU Compute vs Render Pipeline Fundamentals

A common breaking point in production GIS rendering is the moment a coordinate array needs both transformation and display in the same frame: a continental vector tile must be reprojected, culled, and indexed, then drawn — at 60 fps, without copying millions of vertices back to the CPU between steps. WebGPU resolves this by explicitly decoupling general-purpose GPU computation from rasterized output. Where WebGL forced developers to abuse fragment shaders for compute via framebuffer feedback loops, WebGPU exposes dedicated compute pipelines that run independently of any rendering context. A compute pipeline executes arbitrary data transformations — coordinate reprojection, spatial indexing, viewport culling, tile generation — without rasterization overhead; a render pipeline is optimized for vertex assembly, primitive clipping, and fragment shading, making it the right stage for final map rendering, heatmap compositing, and vector overlay draws. Getting the boundary between the two right is what lets a spatial engine hand a freshly transformed buffer straight from a compute pass into a draw call with zero round-trips. This page is part of the broader WebGPU Architecture for Spatial Visualization reference.

Prerequisites

This page assumes you can already stand up a device and submit a frame. Before working through the implementation below, you should have:

A working device handle. You can call navigator.gpu.requestAdapter() and adapter.requestDevice() and hold a valid GPUDevice. If acquisition is flaky under driver load, set that up first via Initializing WebGPU Devices for GIS Workloads and the retry orchestration in Setting Up WebGPU Device Polling for GIS Apps.
A browser with a conformant implementation. Chrome/Edge 113+ or any Chromium build with WebGPU enabled; Firefox 141+ (initially on Windows only) and Safari 26+ also ship it, but with differing limits. Sessions that cannot supply a conformant device need Browser Support & Fallback Routing Strategies.
WGSL fluency at the entry-point level. You can read a @compute and a @vertex/@fragment shader and know what @group/@binding mean.
Data already in a typed array. Coordinates arrive as a Float32Array (or are decoded from GeoParquet/Arrow on the backend) — this page covers what happens once those bytes are GPU-resident, not the decode step.
Familiarity with WGSL memory rules. Buffer interop below relies on the alignment constraints detailed in Memory Alignment for Spatial Data Buffers.

Pipeline descriptor reference

The lifecycle divergence between the two pipeline kinds begins at creation. A compute pipeline needs only a compute stage and a pipeline layout (explicit or 'auto'); a render pipeline additionally describes vertex state, primitive topology, and color target formats, plus depth/stencil and multisample state where they are used. The table below summarizes the descriptor surface that matters for spatial data.

Concern	Compute pipeline	Render pipeline
Descriptor	`GPUComputePipelineDescriptor`	`GPURenderPipelineDescriptor`
Required stage(s)	`compute.entryPoint`	`vertex.entryPoint` (+ usually `fragment`)
Topology / attachments	none	`primitive.topology`, `targets[]`, `depthStencil`
Buffer usage flags	`STORAGE` (+ `COPY_DST`/`COPY_SRC`)	`VERTEX` / `INDEX` / `UNIFORM` (often `STORAGE | VERTEX` shared)
Work launch call	`pass.dispatchWorkgroups(x, y, z)`	`pass.draw()` / `pass.drawIndexed()`
Key adapter limit	`maxComputeWorkgroupSizeX`, `maxComputeInvocationsPerWorkgroup`, `maxStorageBufferBindingSize`	`maxVertexBuffers`, `maxVertexAttributes`, `maxColorAttachments`
Spatial use	reprojection, culling, spatial hash/quadtree build	tile rasterization, vector overlays, heatmaps

For GIS workloads the load-bearing line in that table is the shared usage flag: a buffer created with GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX is written by a compute pass and then bound directly as vertex input, eliminating the CPU round-trip that dominates large-dataset frame time. Adapter limits in the final row decide whether continental data fits at all — see How to Configure WebGPU Adapter Limits for Large GeoJSON for negotiating elevated ceilings.
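A missing usage flag is easy to catch before any buffer is created. The sketch below is a minimal illustration, not part of any library: `USAGE` mirrors the numeric `GPUBufferUsage` values from the WebGPU specification so the check can run without a device, and `missingUsage` is a hypothetical helper name.

```typescript
// Numeric values of the GPUBufferUsage flags, copied from the WebGPU spec so the
// check runs without a device. In a browser, read them from GPUBufferUsage instead.
const USAGE = {
  MAP_READ: 0x0001,
  MAP_WRITE: 0x0002,
  COPY_SRC: 0x0004,
  COPY_DST: 0x0008,
  INDEX: 0x0010,
  VERTEX: 0x0020,
  UNIFORM: 0x0040,
  STORAGE: 0x0080,
} as const;

type UsageName = keyof typeof USAGE;

// Hypothetical helper: names of the flags a planned role needs but the buffer lacks.
function missingUsage(actual: number, required: UsageName[]): UsageName[] {
  return required.filter((name) => (actual & USAGE[name]) === 0);
}

// The shared coordinate buffer: uploaded by the CPU, written by compute,
// then consumed by the render pipeline.
const sharedUsage = USAGE.STORAGE | USAGE.VERTEX | USAGE.COPY_DST;

// A buffer created for compute only; binding it as vertex input would fail validation.
const storageOnly = USAGE.STORAGE | USAGE.COPY_DST;

const sharedGaps = missingUsage(sharedUsage, ["STORAGE", "VERTEX", "COPY_DST"]); // no gaps
const storageGaps = missingUsage(storageOnly, ["STORAGE", "VERTEX"]); // reports VERTEX
```

Running a check like this where buffers are allocated turns a late validation error into an early, readable one.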

Implementation walkthrough

Step 1 — Reproject coordinates in a compute pass

The compute shader transforms an array of coordinate pairs in place. @workgroup_size(256) is a common occupancy sweet spot and equals the default maxComputeInvocationsPerWorkgroup, so it runs on any conformant device without raised limits. The bounds guard handles datasets whose length is not a multiple of the workgroup size, which is routine for arbitrary GeoJSON feature counts.

wgsl

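// Compute: spatial reprojection in place
// TransformParams and apply_projection are illustrative placeholders (an assumed
// scale-and-translate); substitute the real uniform layout and projection math.
struct TransformParams { scale: vec2<f32>, translate: vec2<f32> }
fn apply_projection(p: vec2<f32>, t: TransformParams) -> vec2<f32> { return p * t.scale + t.translate; }

@group(0) @binding(0) var<storage, read_write> coords: array<vec2<f32>>;
@group(0) @binding(1) var<uniform> transform: TransformParams;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let idx = id.x;
    if (idx >= arrayLength(&coords)) { return; } // ragged tail guard: skip invocations past the last point
    coords[idx] = apply_projection(coords[idx], transform);
}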
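To see why the guard matters, it can help to emulate the dispatch on the CPU. The sketch below is only an illustration: `emulateDispatch` is a hypothetical name, and `applyProjection` is an assumed scale-and-offset stand-in rather than a real map projection. It walks every invocation the GPU would launch and counts those the guard turns away.

```typescript
const WORKGROUP_SIZE = 256; // mirrors @workgroup_size(256) in the shader

// Assumed stand-in for the shader's projection: a plain scale-and-offset.
function applyProjection(x: number, y: number, scale: number, offset: number): [number, number] {
  return [x * scale + offset, y * scale + offset];
}

// Emulates dispatchWorkgroups(ceil(pointCount / 256)) over interleaved x,y pairs.
// Returns the number of invocations that hit the guard and returned early.
function emulateDispatch(coords: Float32Array, scale: number, offset: number): number {
  const pointCount = coords.length / 2;
  const workgroups = Math.ceil(pointCount / WORKGROUP_SIZE);
  let skipped = 0;
  for (let wg = 0; wg < workgroups; wg++) {
    for (let local = 0; local < WORKGROUP_SIZE; local++) {
      const idx = wg * WORKGROUP_SIZE + local; // global_invocation_id.x
      if (idx >= pointCount) {
        skipped++; // ragged tail guard: no read or write past the array
        continue;
      }
      const [x, y] = applyProjection(coords[2 * idx], coords[2 * idx + 1], scale, offset);
      coords[2 * idx] = x;
      coords[2 * idx + 1] = y;
    }
  }
  return skipped;
}
```

For 1,000 points the dispatch launches 4 workgroups, or 1,024 invocations, so 24 of them exit at the guard. A CPU reference like this can also serve as the expected value when diffing a debug readback of the GPU result.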

Step 2 — Bind the same buffer as vertex input

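The render pipeline reads the transformed buffer with no copy. The vertex shader below pulls positions from a read-only storage binding indexed by vertex_index, so the bytes the compute pass just wrote are the bytes the vertex shader reads. This vertex-pulling path needs only STORAGE usage; the VERTEX flag additionally lets the same buffer be bound through setVertexBuffer() with a vertex buffer layout.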

wgsl

// Render: vector overlay reads the transformed buffer
@group(0) @binding(0) var<storage, read> coords: array<vec2<f32>>;

@vertex
fn vs_main(@builtin(vertex_index) idx: u32) -> @builtin(position) vec4<f32> {
    let pos = coords[idx];
    return vec4<f32>(pos, 0.0, 1.0);
}
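The two shaders declare the same buffer with different access modes, so each pipeline needs its own bind group layout. A minimal sketch of the layout entries follows, written as plain objects so it runs without a device. `STAGE` mirrors the numeric `GPUShaderStage` values from the specification, and `hasWritableVertexStorage` is a hypothetical helper name.

```typescript
// Numeric GPUShaderStage values from the WebGPU spec.
const STAGE = { VERTEX: 0x1, FRAGMENT: 0x2, COMPUTE: 0x4 } as const;

type BufferBindingType = "uniform" | "storage" | "read-only-storage";

interface LayoutEntry {
  binding: number;
  visibility: number;
  buffer: { type: BufferBindingType };
}

// Compute pass: coords is read_write storage, transform is a uniform.
const computeLayoutEntries: LayoutEntry[] = [
  { binding: 0, visibility: STAGE.COMPUTE, buffer: { type: "storage" } },
  { binding: 1, visibility: STAGE.COMPUTE, buffer: { type: "uniform" } },
];

// Render pass: the same buffer, visible to the vertex stage as read-only storage.
const renderLayoutEntries: LayoutEntry[] = [
  { binding: 0, visibility: STAGE.VERTEX, buffer: { type: "read-only-storage" } },
];

// WebGPU rejects writable storage bindings that are visible to the vertex stage,
// so this returns true for layouts that would fail createBindGroupLayout().
function hasWritableVertexStorage(entries: LayoutEntry[]): boolean {
  return entries.some(
    (entry) => (entry.visibility & STAGE.VERTEX) !== 0 && entry.buffer.type === "storage",
  );
}
```

In a browser, each array can be passed as `entries` to `device.createBindGroupLayout()`. The check explains why one layout cannot serve both passes here: the compute side needs write access that the vertex stage is not allowed to have.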

Step 3 — Allocate the shared buffer and record both passes

The crucial choices on the TypeScript side are the combined usage flags and the ordering of the two passes inside one command encoder. Recording the compute pass before the render pass in a single submission is what lets WebGPU resolve the data dependency for you.

// One buffer, two roles: compute writes it, render reads it as vertices.
const coordBuffer = device.createBuffer({
  size: featureCount * 2 * Float32Array.BYTES_PER_ELEMENT, // vec2<f32> per point
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(coordBuffer, 0, sourceCoords); // typed array from decode step

const encoder = device.createCommandEncoder();

// --- Compute pass: reproject in place ---
const cpass = encoder.beginComputePass();
cpass.setPipeline(computePipeline);
cpass.setBindGroup(0, computeBindGroup);
cpass.dispatchWorkgroups(Math.ceil(featureCount / 256)); // one invocation per point
cpass.end();

// --- Render pass: draw the transformed buffer, same submission ---
const rpass = encoder.beginRenderPass(renderPassDescriptor);
rpass.setPipeline(renderPipeline);
rpass.setBindGroup(0, renderBindGroup);   // binds coordBuffer as read-only storage
rpass.draw(featureCount);
rpass.end();

device.queue.submit([encoder.finish()]);

The two bind groups reference the same coordBuffer with different access modes: read_write for the compute stage, read-only for the vertex stage. Creating both once and reusing them every frame eliminates redundant descriptor updates and the associated CPU-side driver overhead. WebGPU guarantees that a compute pass recorded before a render pass finishes its writes before the render pass reads them. The implementation inserts the synchronization itself, so the API has no explicit barrier call to make. That guarantee is the entire reason the hand-off above is safe.

Memory and performance implications

The @workgroup_size chosen in Step 1 dictates thread distribution across the dataset. For large vector and point-cloud data, a fixed workgroup size of 256 keeps occupancy high; 512 is an option only where the device's maxComputeInvocationsPerWorkgroup has been raised above its default of 256. Splitting the dataset across several dispatches keeps any single dispatch short enough to avoid the watchdog timeouts that kill long-running kernels. Render pipelines then consume those partitioned buffers via indexed or instanced draws.

Three quantities govern whether a frame fits:

VRAM footprint. A single vec2<f32> coordinate buffer costs featureCount * 8 bytes. Five million points is ~40 MB for position alone; add per-vertex attributes (elevation, category, timestamp) and a copy for double-buffered streaming, and a single zoom level can approach maxStorageBufferBindingSize. Reusing one STORAGE | VERTEX buffer instead of a separate compute output plus a vertex copy halves that footprint.
Transfer cost. The zero-copy hand-off removes the most expensive operation in the naive design — a copyBufferToBuffer back to a mappable buffer plus a CPU read. For a 40 MB buffer that round-trip alone can exceed a frame budget; eliminating it is usually the single largest win.
Dispatch sizing. Workgroup count is ceil(featureCount / workgroup_size). Keep workgroup_size a multiple of the hardware warp/wave width (32 or 64) so trailing invocations do not idle a partially filled wave. The ragged-tail guard in Step 1 lets you pick a clean power of two regardless of feature count.
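The arithmetic behind the first and third quantities is small enough to encode directly. The helper names below are hypothetical, and the 8-byte point size assumes the vec2<f32> layout used on this page.

```typescript
const BYTES_PER_POINT = 2 * Float32Array.BYTES_PER_ELEMENT; // vec2<f32> = 8 bytes

// Bytes needed for the position buffer alone.
function coordBufferBytes(featureCount: number): number {
  return featureCount * BYTES_PER_POINT;
}

// Workgroups needed to cover every point, rounding the ragged tail up.
function workgroupCount(featureCount: number, workgroupSize: number = 256): number {
  return Math.ceil(featureCount / workgroupSize);
}

// Invocations in the final workgroup that exit at the bounds guard.
function idleInvocations(featureCount: number, workgroupSize: number = 256): number {
  return workgroupCount(featureCount, workgroupSize) * workgroupSize - featureCount;
}

const points = 5000000;
const positionBytes = coordBufferBytes(points); // 40,000,000 bytes
const groups = workgroupCount(points); // 19,532 workgroups
const idle = idleInvocations(points); // 192 invocations skip at the guard
```

At five million points the idle tail is 192 invocations out of roughly five million, which is why a clean power-of-two workgroup size costs essentially nothing.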

Adapter limits constrain all three. Exceeding maxBufferSize fails validation at buffer creation, exceeding maxStorageBufferBindingSize fails at bind group creation, and exceeding maxComputeWorkgroupsPerDimension fails when the dispatch is encoded. Query adapter.limits to see what can be requested, read device.limits for what the device actually enforces, and implement chunking for continental-scale datasets rather than assuming defaults. The exact negotiation and chunking patterns live in How to Configure WebGPU Adapter Limits for Large GeoJSON. Layout decisions interact with cache behavior too: 16-byte-aligned structs enable coalesced fetches, covered in Memory Alignment for Spatial Data Buffers.
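One way to chunk is to plan the dispatches up front from the limits. The sketch below is an assumption-laden illustration rather than the pattern from the linked page: `planChunks`, `ChunkLimits`, and `Chunk` are hypothetical names, and the fallback values are the WebGPU default limits. It assumes all points live in one buffer and that each chunk is bound as a sub-range of it.

```typescript
interface ChunkLimits {
  maxBufferSize: number;
  maxStorageBufferBindingSize: number;
  maxComputeWorkgroupsPerDimension: number;
}

interface Chunk {
  firstPoint: number;
  pointCount: number;
  byteOffset: number; // offset of the chunk's binding range within the buffer
  byteSize: number; // size of the chunk's binding range
  workgroups: number; // x argument for dispatchWorkgroups()
}

// WebGPU default limits, used only as a fallback; pass device.limits in real code.
const DEFAULT_LIMITS: ChunkLimits = {
  maxBufferSize: 268435456,
  maxStorageBufferBindingSize: 134217728,
  maxComputeWorkgroupsPerDimension: 65535,
};

function planChunks(
  featureCount: number,
  limits: ChunkLimits = DEFAULT_LIMITS,
  workgroupSize: number = 256,
  bytesPerPoint: number = 8,
): Chunk[] {
  if (featureCount * bytesPerPoint > limits.maxBufferSize) {
    throw new Error("dataset exceeds maxBufferSize; split it across several buffers");
  }
  const byBinding = Math.floor(limits.maxStorageBufferBindingSize / bytesPerPoint);
  const byDispatch = limits.maxComputeWorkgroupsPerDimension * workgroupSize;
  // Whole workgroups only. With 8-byte points and 256 points per group, every chunk
  // offset is a multiple of 2048 bytes, which satisfies the default 256-byte
  // minStorageBufferOffsetAlignment.
  const perChunk = Math.floor(Math.min(byBinding, byDispatch) / workgroupSize) * workgroupSize;
  if (perChunk <= 0) throw new Error("limits too small for a single workgroup");

  const chunks: Chunk[] = [];
  for (let firstPoint = 0; firstPoint < featureCount; firstPoint += perChunk) {
    const pointCount = Math.min(perChunk, featureCount - firstPoint);
    chunks.push({
      firstPoint,
      pointCount,
      byteOffset: firstPoint * bytesPerPoint,
      byteSize: pointCount * bytesPerPoint,
      workgroups: Math.ceil(pointCount / workgroupSize),
    });
  }
  return chunks;
}
```

Each chunk then maps to one bind group whose buffer binding uses `offset: chunk.byteOffset` and `size: chunk.byteSize`, followed by `dispatchWorkgroups(chunk.workgroups)`. Because the Step 1 shader guards on `arrayLength(&coords)`, it sees only the bound range and needs no changes.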

Failure modes and diagnostics

Most compute/render boundary bugs surface as one of a small set of named errors. Detecting which one — and where it fires — tells you the cause.

GPUValidationError at buffer binding or pipeline creation. The most common spatial-data cause is a buffer missing a usage flag: a storage binding requires STORAGE and setVertexBuffer() requires VERTEX, so a buffer created with STORAGE only fails as soon as it is bound as vertex input. It also fires when a shader's bindings disagree with the pipeline layout, for example a read_write storage binding made visible to the vertex stage, or a bound range smaller than the WGSL struct it backs. Wrap creation in device.pushErrorScope('validation') / popErrorScope() to capture the exact message instead of a console warning.
OperationError from buffer mapping. mapAsync() rejects with OperationError when the call fails validation: the buffer lacks MAP_READ, is already mapped, or already has a mapping pending. This appears when debugging (reading back a compute result), not in the steady-state zero-copy path. Because MAP_READ can be combined only with COPY_DST, read back through a staging buffer: copyBufferToBuffer() into it, submit, then await mapAsync(GPUMapMode.READ), which resolves once the submitted work that uses the staging buffer has finished.
Device lost (device.lost). A dispatch that runs too long, or a buffer allocation past physical VRAM, can trigger a TDR-style reset; the device.lost promise then resolves with reason: 'unknown' ('destroyed' is reported only after your own code calls device.destroy()). The spatial cause is almost always an unchunked dispatch over a continental dataset. Recovery means re-acquiring a device and re-uploading buffers. Implement it through Setting Up WebGPU Device Polling for GIS Apps so the re-upload path is idempotent.
Silent geometry tearing, no error. A mismatch in stride or field order between how the compute pass writes and how the vertex shader reads produces no exception — features simply land in the wrong place. This belongs in CI: snapshot a known tile and diff against a reference render. The defense is deriving both layouts from one shared struct definition.
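One way to derive both layouts from a single definition is to generate the WGSL struct and the CPU-side offsets from the same field list. The sketch below is illustrative only. `describeLayout` and the `Feature` struct are hypothetical, the elevation and category fields are borrowed from the attribute examples earlier on this page, and only a handful of WGSL types are covered.

```typescript
type FieldType = "f32" | "u32" | "i32" | "vec2<f32>" | "vec4<f32>";

interface Field {
  name: string;
  type: FieldType;
}

// Size and alignment in bytes, following the WGSL rules for these types.
const TYPE_INFO: Record<FieldType, { size: number; align: number }> = {
  "f32": { size: 4, align: 4 },
  "u32": { size: 4, align: 4 },
  "i32": { size: 4, align: 4 },
  "vec2<f32>": { size: 8, align: 8 },
  "vec4<f32>": { size: 16, align: 16 },
};

function roundUp(value: number, multiple: number): number {
  return Math.ceil(value / multiple) * multiple;
}

interface Layout {
  wgsl: string; // struct declaration to prepend to both shader modules
  stride: number; // bytes per array element, including tail padding
  offsets: Record<string, number>; // byte offset of each field for CPU-side packing
}

function describeLayout(structName: string, fields: Field[]): Layout {
  let cursor = 0;
  let structAlign = 1;
  const offsets: Record<string, number> = {};
  const members: string[] = [];
  for (const field of fields) {
    const info = TYPE_INFO[field.type];
    cursor = roundUp(cursor, info.align); // pad up to the member's alignment
    offsets[field.name] = cursor;
    cursor += info.size;
    structAlign = Math.max(structAlign, info.align);
    members.push(`  ${field.name}: ${field.type},`);
  }
  const stride = roundUp(cursor, structAlign); // struct size rounds up to its alignment
  const wgsl = `struct ${structName} {\n${members.join("\n")}\n}`;
  return { wgsl, stride, offsets };
}

const featureLayout = describeLayout("Feature", [
  { name: "position", type: "vec2<f32>" },
  { name: "elevation", type: "f32" },
  { name: "category", type: "u32" },
]);
```

Here `featureLayout.stride` is 16 bytes, with position at offset 0, elevation at 8, and category at 12. Prepending `featureLayout.wgsl` to both the compute and the render shader source, and packing uploads with `featureLayout.offsets`, leaves no second copy of the layout to drift.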

Cross-submission work, where a later command buffer consumes results from an earlier one, is ordered the same way: command buffers execute on the queue in submission order and WebGPU synchronizes buffer usage between them, so the GPU-side dependency needs no CPU wait. Await queue.onSubmittedWorkDone() or mapAsync() only when the CPU itself must observe the results, for example before a debug readback. The WebGPU Specification defines the precise execution-ordering guarantees, and the MDN WebGPU Documentation gives worked examples of error scopes and device-lost handling.

In this section

How to Configure WebGPU Adapter Limits for Large GeoJSON — negotiating requiredLimits, clamping against physical ceilings, and chunking topology preprocessing so continental payloads clear maxStorageBufferBindingSize and maxBufferSize.

Memory Alignment for Spatial Data Buffers — the WGSL alignment rules the shared compute/render buffer must satisfy on both sides.
Structuring Uniform Buffers for Coordinate Alignment — laying out the TransformParams uniform the reprojection kernel reads.
Initializing WebGPU Devices for GIS Workloads — acquiring the device and limits these pipelines are built against.
Browser Support & Fallback Routing Strategies — what to render when no conformant device is available for the compute path.
