ShareVerse: Collaborative Video Generation for Shared World Modeling

Teaser Figure

ShareVerse: Collaborative Video Generation for Shared World Modeling. ShareVerse enables distributed agents to collaboratively synthesize a single, globally consistent virtual environment. We bridge otherwise isolated generative priors through two core mechanisms: (1) implicit cross-agent interaction, which resolves visual conflicts when agents explore the scene concurrently (red and blue vehicles); and (2) a global Spatiotemporal Memory Cache, which guarantees long-term environmental permanence when an agent revisits a region asynchronously (green vehicle).
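The caption describes the Spatiotemporal Memory Cache only at a high level. As a toy illustration of the general idea, and not ShareVerse's actual design, the sketch below assumes that each agent's observations are written into a shared spatial grid keyed by world position. When an agent later revisits a location, it retrieves earlier observations that any agent recorded near that spot. The grid hashing, `cell_size`, the `Observation` fields, and the retrieval rule are all hypothetical, and frames are represented by placeholder strings.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


@dataclass
class Observation:
    agent_id: int
    timestep: int
    position: Tuple[float, float]  # hypothetical: agent position in a shared world frame
    frame: str  # placeholder for a generated frame or its latent


@dataclass
class SpatiotemporalMemoryCache:
    """Hypothetical shared memory: observations bucketed into a 2-D spatial grid."""

    cell_size: float = 5.0
    cells: Dict[Cell, List[Observation]] = field(default_factory=dict)

    def _cell(self, position: Tuple[float, float]) -> Cell:
        # Quantize a world position to its grid cell.
        return (math.floor(position[0] / self.cell_size),
                math.floor(position[1] / self.cell_size))

    def write(self, obs: Observation) -> None:
        # Any agent can write, so the cache is shared across agents.
        self.cells.setdefault(self._cell(obs.position), []).append(obs)

    def read(self, position: Tuple[float, float], before_timestep: int,
             k: int = 2) -> List[Observation]:
        # Search the query cell and its 8 neighbors, keeping only past observations.
        cx, cy = self._cell(position)
        candidates = [
            obs
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for obs in self.cells.get((cx + dx, cy + dy), [])
            if obs.timestep < before_timestep
        ]
        # Prefer the closest observations; break ties by the most recent timestep.
        candidates.sort(key=lambda o: (math.dist(o.position, position), -o.timestep))
        return candidates[:k]


# Agent 1 drives along the x-axis and records one frame per timestep.
cache = SpatiotemporalMemoryCache(cell_size=5.0)
for t in range(4):
    cache.write(Observation(agent_id=1, timestep=t, position=(5.0 * t, 0.0), frame=f"a1_t{t}"))

# Later, another agent revisits (10, 0) and conditions on what agent 1 saw there.
hits = cache.read(position=(10.0, 0.0), before_timestep=10, k=2)
print([h.frame for h in hits])  # prints ['a1_t2', 'a1_t3']
```

A uniform grid keeps writes and lookups cheap and independent of how many agents share the cache. The trade-off is that retrieval only reaches about one cell away from the query, so `cell_size` effectively sets the revisitation radius.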

Framework Overview

Figure panels: (a) Dataset, (b) Pipeline, (c) Method.

Experimental Results

For each group, the top row shows, from left to right, the global point cloud, the local point cloud, and a pair of selected frames. The bottom row shows the trajectory map and two videos: the left video corresponds to agent 1 (A1) and the right video to agent 2 (A2).

(a) Both Straight (Opposing)


(b) Both Turning (Parallel Opposing)


(c) A1 Straight, A2 Turning (Lateral)


(d) A1 Straight, A2 Turning (Longitudinal)


(e) Both Turning (T-Pattern)


(f) Both Turning (X-Pattern)
