Testing GLM-5.2 on a Real Rails Upgrade

Every few weeks, a new open-source model appears with claims that it matches the frontier. I’ve watched this pattern long enough to be skeptical — the charts are fine for YouTube content, but they don’t tell you how a model behaves on a codebase that hasn’t been touched in years, with deprecated gems, a Node-dependent asset pipeline, and data that needs to move between databases.

When GLM-5.2 from Z.ai started circulating on X last week, the comparisons were to Claude Opus 4.8 — specifically on long-horizon coding benchmarks like Terminal-Bench 2.1, where GLM-5.2 scores 81.0 against Opus 4.8’s 85.0, and FrontierSWE, where it trails by about one point. For a fully open-weight model under an MIT license, that’s a meaningful result. Whether it translates to real work is a different question.

I had a project waiting that would give it an honest answer.

The application

Agencia f/64 is a photojournalism site I built for documentary coverage in Colima. The live site is still running Rails 6.1 as of this writing, and it has been that way for several years. The app itself is simple: Node in the asset pipeline to compile CSS and JavaScript, a small database, no complex integrations. What it does have is the kind of accumulation you’d expect from an app that was built once and then mostly left alone: some gems no longer actively developed, configurations that made sense at the time, and dependencies on JavaScript libraries that don’t play well with importmap.

The goal was clear: bring it to Rails 8.1, remove the Node dependency entirely, and migrate the data from PostgreSQL to SQLite. PostgreSQL was the default choice years ago, but for an app this size it’s unnecessary overhead.

The setup: Ollama Cloud and Oh-My-Pi

I can’t run a 744B-parameter model locally. GLM-5.2 is a mixture-of-experts model with 40B active parameters, but the full set of weights still has to be loaded, and that requires hardware I don’t have. So I created an account on Ollama Cloud and signed up for the Pro plan at $20/month, which gives full access to the catalog and enough GPU time for sustained work.

For the harness I used Oh-My-Pi, an open-source AI agent for the terminal. One thing I didn’t expect: when I started it, it discovered the Claude Code skills I already have set up. That included the Rails Upgrade Assistant, one of the Maquina tools I’ve built for exactly this kind of work.

For context, I had already been running Qwen 3.6 locally via Ollama for lighter tasks, but it runs out of capability quickly on anything that requires sustained reasoning across a large or legacy codebase. The upgrade task needed something with more range.

The Rails 6.1 to 8.1 upgrade path

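Rails upgrades are best done sequentially, one release series at a time. Jumping from 6.1 directly to 8.1 would surface every deprecation and breaking change at once, with no way to tell which version introduced which failure. The path here was five hops: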

6.1 → 7.0 → 7.1 → 7.2 → 8.0 → 8.1
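
To make the hop-by-hop idea concrete, here is a minimal Ruby sketch that turns the path above into a checklist. The method names (`upgrade_hops`, `gemfile_pin`) are hypothetical helpers written for this post, not part of Rails or the Rails Upgrade Assistant, and the pessimistic version constraint is just one reasonable way to hold the app on a single release series while working through a hop.

```ruby
# Rails release series covered by this upgrade, oldest to newest.
RAILS_SERIES = %w[6.1 7.0 7.1 7.2 8.0 8.1].freeze

# Returns the [current, target] pairs between two release series.
def upgrade_hops(from, to, series: RAILS_SERIES)
  start  = series.index(from)
  finish = series.index(to)
  raise ArgumentError, "unknown Rails series" if start.nil? || finish.nil?
  raise ArgumentError, "target must be newer than the starting point" unless finish > start

  series[start..finish].each_cons(2).to_a
end

# Hypothetical helper: the Gemfile line to use while working on one hop.
def gemfile_pin(series)
  %(gem "rails", "~> #{series}.0")
end

upgrade_hops("6.1", "8.1").each_with_index do |(current, target), index|
  puts "Hop #{index + 1}: #{current} -> #{target}  #{gemfile_pin(target)}"
end
```

Each hop ends with the test suite passing and a commit, so a failure on the next hop can only come from that hop.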

The Rails Upgrade Assistant handled the planning step well. For each hop it analyzed the codebase, identified breaking changes specific to the code rather than just listing everything from the CHANGELOG, and generated upgrade reports. From there, GLM-5.2 worked through the implementation.

The model moved through configuration changes, deprecated API replacements, and gem updates without needing much direction — including finding replacements for a few gems that had been abandoned. It was also willing to let the app break intentionally. When we hit the point of removing Node and switching to importmap and the Tailwind CSS Rails gem, some JavaScript libraries that had been bundled through Node had no importmap-compatible equivalent. We left those broken and moved on. The goal was Rails 8.1 running; the UI work — which I’ve written about building without a JavaScript framework — would come later.

The SQLite migration — moving the schema and data from PostgreSQL — was another explicit task. GLM-5.2 handled it without needing much guidance: it understood why the switch made sense for a small app, scripted the migration, and flagged the few PostgreSQL-specific query patterns that needed adjustment — including pg_search multisearch, native Postgres enums, and uuid-ossp UUID generation, all of which needed SQLite-compatible replacements.
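
As a rough illustration of what “SQLite-compatible replacements” can look like, the sketch below moves two of those responsibilities from the database into Ruby: UUID generation (instead of uuid-ossp) and enum validation (instead of a native Postgres enum type). This is not the code from the pull request. The status values, column names, and the `sqlite_row` helper are all assumptions made up for the example, and it uses only the Ruby standard library.

```ruby
require "securerandom"

# Hypothetical enum values. In Postgres these would live in a
# CREATE TYPE ... AS ENUM definition; here they are plain strings.
PHOTO_STATUSES = %w[draft published archived].freeze

# Stand-in for uuid-ossp: the ID is generated in Ruby, not by the database.
def generate_uuid
  SecureRandom.uuid
end

# Stand-in for the constraint a native enum enforces.
def validate_status!(value)
  return value if PHOTO_STATUSES.include?(value)

  raise ArgumentError, "invalid status: #{value.inspect}"
end

# Converts one exported Postgres row (a Hash with string keys) into
# attributes that can be stored in plain SQLite text columns.
def sqlite_row(pg_row)
  {
    "id"     => pg_row["id"] || generate_uuid,
    "status" => validate_status!(pg_row.fetch("status")),
    "title"  => pg_row.fetch("title")
  }
end

row = sqlite_row("id" => nil, "status" => "published", "title" => "Volcán de Colima")
puts row
```

In a Rails model the same idea would usually be expressed with a string-backed `enum` and a callback that assigns the ID before creation; the plain-Ruby version just keeps the example runnable on its own.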

Planning the Postgres to SQLite migration — two timeouts during plan generation, both resumed with `continue`. After the second, the model wrote a 385-line migration plan to disk.

Executing the migration plan — Oh-My-Pi mid-run removing `PgSearch::Model` from `ApplicationRecord`, with the full task list visible and the model confirmed as GLM-5.2 on high reasoning.

The resulting pull request: 16 commits, 247 files changed, +8,947 / -5,460 lines. The commit history maps the full path — Ruby 4.0.0, each Rails hop, the asset pipeline migration, the Solid stack, and the final Kamal-ready cleanup.

What worked and what didn’t

The model’s knowledge of Rails was solid. It knew the version-specific changes, understood the reasoning behind them, and didn’t introduce unnecessary complexity. The smoke tests it ran to verify configurations caught real problems.

One moment stood out. When working on the Kamal configuration, the model needed Docker to build and test the image. Docker wasn’t running. Rather than stopping to ask, it looked for an alternative, found OrbStack, and started it — then continued with the build. I noticed OrbStack had opened on my machine, went back to the Oh-My-Pi terminal to see what had happened, and found the model had already moved on. That kind of environment awareness isn’t something I expected from an open-weight model running through a cloud API.

Two things got in the way. First, long plan generation caused timeouts. Ollama’s session limits are measured in GPU time rather than tokens, and long planning steps sometimes hit those limits mid-response. The workaround was straightforward (type `continue` to resume), but it interrupted the flow more than I’d like. Whether this is a limitation of the $20 plan’s quotas or something about how Oh-My-Pi handles long turns, I’m not sure. Second, roughly every two plans I hit the GPU time ceiling and had to wait a few hours for the quota to reset.

By the end, the application reached a point where I could build a Kamal image and run it. The UI is broken — expected, given the JavaScript libraries we left behind — but the upgrade itself is done. Rails 8.1, no Node, SQLite, the Solid suite in place.

What comes next

The next phase is where the evaluation gets more interesting. I’ll bring in Maquina Components to rewrite the views and replace the missing JavaScript functionality with Hotwire, and the Rails Simplifier to review the code as each section gets updated. That work involves conventions specific to how I build things — my own component library, Stimulus patterns, Turbo Streams. Whether GLM-5.2 can operate effectively inside that context is what I actually want to know.

So GLM-5.2 continues. The upgrade was the first test. The UI rebuild is the second.

I’m considering moving to Ollama’s higher tier for the next phase — the quota limits at $20/month are a real constraint for longer sessions, and I don’t want the infrastructure to become the limiting factor in what is supposed to be an evaluation of the model.