Experiment: AI Autonomously Evolving a Web App’s Objective


Building web apps is often an iterative process of refinement: start with a rough concept, implement core features, test, tweak, and gradually shape it into something useful. What if one of the biggest refinement steps—rethinking the goal itself—could happen automatically, driven by the AI?

I’ve been thinking about an experimental setup where a powerful LLM (like the latest Grok, Claude, or a mixture) builds and evolves a web app from an initial concept, with explicit permission to update its own objective over time. This is purely an experiment to push the boundaries of autonomous AI-driven development and see how far the concept can be taken.

The Experimental Setup

At the project root sits a single objective.md file as the source of truth. It begins with my high-level guidance (a hypothetical sketch of the file follows this list):

  • A clear description of the initial app concept
  • Key requirements or must-have behaviors
  • A small set of unchanging core principles (e.g., “must remain offline-first and local-only,” “prioritize privacy with zero server dependencies,” “keep the stack minimal and close to web platform primitives”)
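To make that concrete, here is a hypothetical initial objective.md for one low-stakes idea (a bare-bones expense tracker). The headings, the "never edit this section" marker, and the concept details are placeholders I'm inventing for illustration, not a committed format:

```markdown
# Objective: Expense Tracker (illustrative)

## Core principles (never edit this section)
- Offline-first and local-only: zero server dependencies, data never leaves the device.
- Keep the stack minimal and close to web platform primitives.

## Current concept
A bare-bones expense tracker: add expenses with an amount, category, and date,
and show a simple monthly summary.

## Must-have behaviors
- Entries persist locally between sessions.
- Everything works without a network connection.

## Direction notes (the AI may rewrite anything below the core principles)
- Open question: would lightweight categorization or forecasting add real value?
```

Keeping the untouchable principles in one clearly delimited section also gives the loop something mechanical to check against, which the driver sketch later in this post relies on.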

Development then proceeds in tight iterations:

  1. The AI reads the current objective.md
  2. It implements, adds, or refactors code to advance toward that objective
  3. After making changes, it creatively re-evaluates: What could make this better? What might users actually want? Are there simpler approaches? How could the app become more focused, capable, or interesting?
  4. It rewrites objective.md with an updated vision—refining scope, introducing new ideas, or dropping unpromising directions—while strictly preserving the core principles

This creates a deliberate feedback loop where the AI isn’t merely executing a fixed spec but actively steering the project’s direction. Over many iterations, the final app could diverge significantly from the starting prompt. The experiment is explicitly about letting the AI choose what to build, within loose guardrails.
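A minimal driver for this loop might look like the sketch below. It assumes objective.md keeps its core principles under a dedicated heading; `call_llm`, the prompts, and the iteration count are placeholders, and in a real run the first call would go through a coding agent with file-edit and shell access rather than a bare completion call. The one part of the idea it makes concrete is the guardrail: the core-principles section must survive every rewrite verbatim.

```python
from pathlib import Path

OBJECTIVE = Path("objective.md")
PRINCIPLES_MARKER = "## Core principles"  # hypothetical heading for the unchanging section


def call_llm(prompt: str) -> str:
    """Placeholder for whatever model/agent API is used; not a real client."""
    raise NotImplementedError


def core_principles(text: str) -> str:
    """Slice out the principles section: from the marker to the next '## ' heading."""
    start = text.find(PRINCIPLES_MARKER)
    if start == -1:
        return ""
    end = text.find("\n## ", start + 1)
    return text[start:] if end == -1 else text[start:end]


def iterate_once() -> None:
    objective = OBJECTIVE.read_text()

    # Steps 1-2: read the current objective and advance the code toward it
    # (in practice, an agent that can edit files and run commands).
    call_llm(f"Advance the codebase toward this objective:\n\n{objective}")

    # Steps 3-4: re-evaluate creatively and propose an updated objective.md.
    proposed = call_llm(
        "Critique the current objective and rewrite objective.md with an updated "
        "vision. The core principles section must be preserved verbatim.\n\n"
        + objective
    )

    # Guardrail: reject any rewrite that touches the core principles.
    if core_principles(proposed) != core_principles(objective):
        print("Rejected rewrite: core principles changed; keeping the old objective.")
        return

    OBJECTIVE.write_text(proposed)


if __name__ == "__main__":
    for _ in range(10):  # a handful of iterations per run
        iterate_once()
```

The verbatim-equality check is the simplest enforcement of "strictly preserving the core principles" I can think of; anything subtler (semantic checks, human sign-off) starts pulling the human back into the loop.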

Why I’m Trying This (As an Experiment)

The main goal is to explore the limits of near-zero human involvement and observe how small, iterative updates compound when the AI is allowed to critique and redirect its own goals. A bare-bones expense tracker might autonomously evolve into something with smart categorization, forecasting, or gamified elements: ideas the model surfaces that I wouldn't have considered upfront.

Known and Expected Major Issues

This approach will almost certainly break down in practice for many reasons:

  • Objective drift: Without strong corrective mechanisms, scope can explode, features can lose coherence, or the app can devolve into a grab-bag of loosely related features.
  • Lack of self-correction: There’s no built-in mechanism here to reliably navigate away from wrong directions.
  • Human involvement in disguise: In reality, someone will likely need to review and approve (or revert) objective.md changes and fix broken refactors.
  • Weak feedback loops for web apps: Unlike CLI tools with instant executable feedback, web apps require manual browser testing or complex automation (Playwright, screenshots, visual diffs), which adds fragility, cost, and token overhead (a sketch of such a check follows this list).
  • Absence of taste and judgment: LLMs can generate endless speculative features but lack the human discernment needed to ship focused, polished products.
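To illustrate the kind of bolt-on feedback I mean, a smoke test along these lines is roughly the cheapest signal available: load the page headlessly, collect console errors, and save a screenshot for the model (or a human) to inspect. It assumes Playwright for Python and a locally served app at a placeholder URL, and it only detects crashes and blank pages, not whether the app is any good:

```python
# Minimal "does it even load?" check using Playwright
# (pip install playwright, then `playwright install chromium`).
from playwright.sync_api import sync_playwright

APP_URL = "http://localhost:5173"  # hypothetical local dev-server address


def smoke_test() -> bool:
    errors: list[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Collect console errors so they can be fed back to the model as text.
        page.on("console", lambda msg: errors.append(msg.text) if msg.type == "error" else None)
        page.goto(APP_URL, wait_until="networkidle")
        page.screenshot(path="snapshot.png", full_page=True)
        browser.close()
    if errors:
        print("Console errors:", *errors, sep="\n  ")
    return not errors


if __name__ == "__main__":
    print("Smoke test", "passed" if smoke_test() else "failed")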

Token cost is a hard constraint as well, especially for an experimental project like this.

I plan to run this experiment soon on a few low-stakes ideas that I haven't had time to build manually. I'll likely start with a simple CLI tool (where the feedback loop is tightest) before attempting a web app. The goal is purely to learn how far the concept can be pushed.
