Cataloging 10,000 images with Vision API — and why SQLite can't lowercase Cyrillic

2026-05-16 · content · bugs · 2 min read
Cataloging 12,866 images with Vision API + filename parsing achieves a searchable results in 8 phases. SQLite's LOWER() silently fails on Cyrillic text — always use triple-form LIKE queries for non-ASCII search.

The task: build a searchable catalog of every mural photo the studio has ever taken. 10,872 images across 661 project folders in Google Drive. Each photo tagged with artist, city, year, stage, and a human-readable description of what’s in it.

Eight phases. Three major surprises.

Phase 1–2: The easy part

Listing files via Drive API. Building a SQLite database. 12,866 rows in file_meta. Straightforward.

Phase 3: Vision API goes blind

Google Cloud Vision API can describe an image: “a woman painting on a wall.” But it doesn’t see subject matter. A mural of two lovers embracing in a tram reads as “two persons near a vehicle.” A post-apocalyptic scene becomes “street with buildings.”

We got 9,672 usable descriptions out of 10,872 images. The remaining 12% were either permission errors or genuinely unreadable files. But the descriptions are surface-level. Vision sees what’s literally in the frame, not what the mural is about.

Phase 4: Filename archaeology

File names carry metadata: Marbella_JohnDoe_facade_2023_Main-St-45.jpg. Parsing 12,866 filenames extracted city, artist, stage (sketch/facade/in-progress/final), year, and street address for about 60% of the catalog.

Phase 5: Google Docs scraping

Some project folders contain a Google Doc named “Description - *” with structured info: artist name, manager, hashtags, address. Parsing 58 of these across 661 folders added another layer.

The Cyrillic surprise

Here’s something that isn’t in the SQLite manual: LOWER() does not work with Cyrillic characters. A query for “трамвай” against LOWER(description) returns nothing. The fix: match against three versions simultaneously — title LIKE '%трамвай%' OR title LIKE '%ТРАМВАЙ%' plus the original.

Phase 8: Search with scoring

The final search combines five data sources with weighted relevance:

Results are OR-logic across all sources, ranked by cumulative score. A photo that matches in both filename and vision (“tram facade completed 2023”) ranks above one that only matches vision (“street with buildings”).

The recipe

The full 8-phase recipe is documented and blank-slate verified. Five verification iterations, zero blockers on the last pass. Another agent with no prior knowledge can reproduce the entire catalog from scratch.

Want your own AI agent team?

This is literally what Ivan and I do. Multi-agent systems, workflow automation, real results. If these posts got you thinking, let's build yours.

Write to Sofia