Code Policy Models
An AI research project exploring whether large language models can serve as effective optimizers for game-playing agents by iteratively writing and refining Python code as the policy representation, replacing neural network weights with human-readable programs and gradient descent with LLM-guided code editing. The system runs an evolutionary loop: in each generation, the LLM proposes candidate policy edits, which are evaluated through parallel rollouts in a target environment (currently Pokemon Blue running on a headless Game Boy emulator). Optional Gemini video analysis provides multimodal feedback on agent behavior. Tournament selection pits multiple LLM-generated candidates against an elite policy to balance exploration with stability, and the full rollout trajectory and reward signal are fed back as context for the next generation's edits.
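The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: every name here (`Candidate`, `evaluate`, `evolve`, `toy_rollout`, `make_toy_proposer`) is hypothetical, the LLM is replaced by a random mutation of a constant, the emulator is replaced by a toy reward function, and the video-analysis step is omitted.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Candidate:
    source: str    # the policy, stored as Python source text
    reward: float  # mean reward over its rollouts
    feedback: str  # trajectory summary passed to the next proposal


def evaluate(source: str, rollout: Callable[[str], Tuple[float, str]], n: int) -> Candidate:
    """Run n rollouts of one policy in parallel and average the reward."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(rollout, [source] * n))
    mean_reward = sum(reward for reward, _ in results) / n
    return Candidate(source, mean_reward, results[-1][1])


def evolve(seed_source, propose, rollout, generations=5,
           candidates_per_gen=4, rollouts_per_candidate=3) -> Candidate:
    """Tournament against an elite: the best challenger replaces the
    elite only if it scores strictly higher."""
    elite = evaluate(seed_source, rollout, rollouts_per_candidate)
    for _ in range(generations):
        challengers = [
            evaluate(propose(elite.source, elite.feedback, elite.reward),
                     rollout, rollouts_per_candidate)
            for _ in range(candidates_per_gen)
        ]
        best = max(challengers, key=lambda c: c.reward)
        if best.reward > elite.reward:
            elite = best
    return elite


# --- Toy stand-ins (assumptions, not the real environment or LLM) ---

def toy_rollout(source: str) -> Tuple[float, str]:
    """Execute the policy source and score how close its action is to 7."""
    namespace = {}
    exec(source, namespace)
    action = namespace["policy"](0)
    return -abs(7 - action), f"policy returned {action}; target is 7"


def make_toy_proposer(seed: int):
    """Stand-in for the LLM: nudges the returned constant at random."""
    rng = random.Random(seed)

    def propose(source: str, feedback: str, reward: float) -> str:
        namespace = {}
        exec(source, namespace)
        new_value = namespace["policy"](0) + rng.choice([-2, -1, 1, 2])
        return f"def policy(obs):\n    return {new_value}\n"

    return propose


if __name__ == "__main__":
    seed_policy = "def policy(obs):\n    return 0\n"
    winner = evolve(seed_policy, make_toy_proposer(0), toy_rollout)
    print(winner.reward)
    print(winner.source)
```

Because the elite is replaced only by a strictly better challenger, its reward never decreases across generations when rollouts are deterministic, which is the stability half of the exploration/stability trade-off. In a real setup, `propose` would presumably call an LLM with the elite's source, trajectory, and reward as context, and `rollout` would drive the emulator.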