๐Ÿข WorkSim Voyager

Interactive Simulated Workplace Environment for RL Agent Training
31 Tools · 8 Reward Signals · Voyager Architecture · OpenEnv Compatible

At a glance: 31 tools in 8 categories · 8 reward signals · 5 difficulty levels · 5 Voyager layers

🎮 Interactive Demo — Try the Environment

[Interactive demo panel: choose a tool call, supply its arguments as JSON, and press "Reset Episode" to generate a scenario; the panel shows the last tool result and the episode log.]
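The demo's reset/step cycle can be sketched in a few lines. This is a toy stand-in, assuming an OpenEnv/Gym-style `reset`/`step` interface; the class and field names here are illustrative, not the actual WorkSim Voyager API.

```python
import json

class StubWorkSimEnv:
    """Toy stand-in for the demo environment; not the real WorkSim API."""

    def reset(self):
        self.steps = 0
        return {"scenario": "Draft a market summary", "tools": ["search", "write_doc"]}

    def step(self, tool_call):
        self.steps += 1
        done = self.steps >= 2  # end the toy episode after two steps
        # Return (observation, shaped reward, done) for each tool call.
        return {"result": f"ok: {tool_call['tool']}"}, 0.1, done

env = StubWorkSimEnv()
obs = env.reset()
done, total = False, 0.0
while not done:
    # Each step is one tool call with JSON-encoded arguments.
    call = {"tool": "search", "args": json.dumps({"query": "market size"})}
    obs, reward, done = env.step(call)
    total += reward

print(round(total, 1))  # cumulative reward over the toy episode
```

The cumulative-reward counter in the demo panel corresponds to `total` here.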

🧠 Voyager Agent Loop

Actor (LLM generates) → Execute (tool call) → Critic (LLM evaluates) → Reflect (update memory)
Skill Library — ChromaDB vector store of reusable tool-call patterns
Working Memory — In-episode scratchpad: goals, facts, plan, errors
Episodic Memory — Cross-episode learning: strategies, failure patterns
Auto Curriculum — Progressive difficulty based on agent performance
Skill Extractor — Mines reusable skills from successful trajectories
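One Actor → Execute → Critic → Reflect cycle, with a skill-library lookup and skill extraction, can be sketched as below. All names are assumptions made for the example; the real project backs the skill library with a ChromaDB vector store rather than the list search used here.

```python
class SkillLibrary:
    """Toy skill store; a stand-in for the ChromaDB-backed library."""

    def __init__(self):
        self.skills = []

    def retrieve(self, goal):
        # Stand-in for a vector-similarity search over stored patterns.
        return next((s for s in self.skills if s["goal"] == goal), None)

    def add(self, skill):
        self.skills.append(skill)

def voyager_step(goal, env, skills, memory):
    hint = skills.retrieve(goal)                          # Skill Library lookup
    tool_call = hint or {"goal": goal, "tool": "search"}  # Actor: propose a call
    result = env(tool_call)                               # Execute: run the tool
    success = result == "ok"                              # Critic: evaluate outcome
    memory.append((tool_call, result, success))           # Reflect: update memory
    if success and hint is None:
        skills.add(tool_call)                             # extract a reusable skill
    return success

skills, memory = SkillLibrary(), []
env = lambda call: "ok"  # stub executor that always succeeds
voyager_step("find_revenue", env, skills, memory)
print(len(skills.skills))  # the successful pattern was stored as a skill
```

On a later episode with the same goal, `retrieve` would return the stored pattern and the actor could reuse it instead of generating from scratch.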

🎯 Multi-Component Reward System

R_task — Rubric-based deliverable grading
R_progress — Incremental step rewards
R_evidence — Source quality & authority
R_consistency — Fact alignment checks
R_efficiency — Step economy & focus
R_recovery — Error correction ability
R_skill — Tool-use pattern mastery
R_penalty — Error & repetition costs

R_total = 0.35·R_task + 0.15·R_progress + 0.10·R_evidence + 0.10·R_consistency + 0.10·R_efficiency + 0.08·R_recovery + 0.07·R_skill − R_penalty
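The weighted combination is straightforward to compute; this sketch uses the weights from the formula above, with component keys that mirror the list (the function itself is illustrative, not the project's code).

```python
# Weights taken from the R_total formula above.
WEIGHTS = {
    "task": 0.35, "progress": 0.15, "evidence": 0.10, "consistency": 0.10,
    "efficiency": 0.10, "recovery": 0.08, "skill": 0.07,
}

def total_reward(components, penalty=0.0):
    """R_total = weighted sum of the positive components minus R_penalty."""
    return sum(WEIGHTS[k] * components.get(k, 0.0) for k in WEIGHTS) - penalty

# With every component at its maximum of 1.0 and a small penalty:
r = total_reward(
    {"task": 1.0, "progress": 1.0, "evidence": 1.0, "consistency": 1.0,
     "efficiency": 1.0, "recovery": 1.0, "skill": 1.0},
    penalty=0.05,
)
print(round(r, 2))  # 0.9 (the positive weights sum to 0.95)
```

Note that the positive weights sum to 0.95, so even a penalty-free perfect episode caps R_total below 1.0.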

🔧 Tool Inventory — 31 Tools

📈 Training Results (Expert Iteration + Voyager-lite)

Agent                          Env Reward   Shaped Reward   Δ vs Baseline   Skills
Random Agent                   +0.039       N/A             —               0
Basic LLM (pre-train)          +0.021       +4.69           ref             0
Voyager-lite (pre-train)       +0.024       +5.71           +1.02 (+22%)    2
Basic LLM (post-train)         +0.023       +4.56           −0.13           0
⭐ Voyager-lite (post-train)    +0.025       +5.53           +0.84 (+18%)    2

Model: Qwen2.5-1.5B-Instruct (4-bit) · LoRA rank 16 · Expert Iteration (Best-of-4 + SFT × 3 rounds) · 32 prompts/round · Training loss: 8.78 → 8.50 (decreasing)
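The Expert Iteration recipe in the caption (Best-of-4 sampling, then SFT, repeated for 3 rounds) reduces to a short loop. This is a minimal sketch with the sampling and scoring stubbed out; the function names and the random-number policy are stand-ins, not the actual training pipeline.

```python
import random

def best_of_n(prompt, policy, score, n=4):
    """Sample n completions and keep the one the scorer ranks highest."""
    return max((policy(prompt) for _ in range(n)), key=score)

def expert_iteration(prompts, policy, score, rounds=3):
    dataset = []
    for _ in range(rounds):
        # Collect one best-of-4 completion per prompt...
        dataset += [(p, best_of_n(p, policy, score)) for p in prompts]
        # ...then run an SFT update on `dataset` (stubbed out here).
    return dataset

random.seed(0)
policy = lambda p: random.random()  # stand-in for sampling the LLM
data = expert_iteration(["p1", "p2"], policy, score=lambda s: s)
print(len(data))  # 2 prompts × 3 rounds of best-of-4 selections
```

The run described above uses the same shape at larger scale: 32 prompts per round for 3 rounds, with each kept sample being the best of 4 under the shaped reward.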