Blog — Iron Mind

Engineering

Why Should an LLM Write Code Against a Toolkit, Not Fill In a JSON Spec? Building an AI After Effects Edit Generator in 2026

We built a tool that turns raw clips into native, fully-editable After Effects compositions — by having an LLM write code against a toolkit instead of filling in a JSON edit-spec. Here's the architecture journey and the transferable lesson: when a model needs to express relationships, give it a code API, not a schema.

6 min read · 2026-06-14 Read →

Engineering

How Do You Make Claude Code Remember Your Preferences Between Sessions in 2026? We Built a Tool to Mine Your Own History

Claude Code forgets every correction when a session ends. We built claude-harness-tuner (now open source) to mine 1,300+ of our own sessions for the friction we kept re-creating — and route each learned preference to the right CLAUDE.md or KB file.

6 min read · 2026-06-07 Read →

Engineering

How Do You Filter Thousands of YouTube Videos Into a Clean AI Training Dataset in 2026 — Without Burning Download Budget?

We pre-screen on YouTube's free thumbnail CDN with gpt-4o-mini structured outputs before spending a single download token — filtering 1,833 music videos down to 500 training candidates at fractions of a cent each. The layered-filtering pipeline behind our Wan2.2-S2V lipsync fine-tune.

8 min read · 2026-06-02 Read →

Engineering

How Do You Build a Persona-Accurate AI Chatbot From Real Conversation Data in 2026?

Persona-accurate AI chatbots come from retrieval over real conversation history, not fine-tuning. The architecture: clean request-to-response rounds, signal-based filtering, confidence-aware behavioral extraction, and two-pass persona synthesis that keeps identity separate from learned behavior.

6 min read · 2026-05-30 Read →

Engineering

How Do You Cache an Expensive Multi-Step LLM Pipeline in 2026? A Two-Layer Semantic Caching Pattern

Exact-match caching barely dents a natural-language LLM pipeline. Here's the two-layer pattern we ship: L1 hash cache, L2 semantic cache guarded against false merges, plus a write-side dedup library that builds a concept graph for free.

6 min read · 2026-05-27 Read →

Engineering

Why Does Our AI Agent Work in Staging but Degrade in Production After a Few Days in 2026?

AI agents that pass eval suites still rot after go-live. The four structural failure modes — tool-call error accumulation, context-window bloat, prompt drift, and rate-limit back-pressure — and the unglamorous architectural fixes that prevent decay.

6 min read · 2026-05-20 Read →

Engineering

Prompt Caching: How We Cut Claude API Costs by 90% in Production

Prompt caching cuts input token costs by ~90% and trims latency by up to 80% on cached prefixes.

8 min read · 2026-05-17 Read →

Engineering

LLM Evals: The Test Suite Pattern That Catches Production Regressions Before Users Do

How we structure LLM eval suites that actually catch prompt drift, model regressions, and silent quality decay: the four grader types, the dataset split, and the CI integration that ships.

6 min read · 2026-05-17 Read →

Engineering

Building a Health Knowledge Base Pipeline: Discovery, URL Patterns, and Normalization

How we turned a 3,837-row CSV of health events into a normalized SQLite reference database covering six public health providers and 162,568 official medical codes — by reverse-engineering the providers' URL patterns instead of trusting the input.

6 min read · 2026-04-29 Read →

Engineering

Don't Trust API Docs, Trust Shipping Code: Reverse Engineering Undocumented APIs

When integrating with an undocumented API, the README is the worst place to find ground truth. How we burned hours on a dead LinkedIn Voyager endpoint, why hidden-tab DOM automation fails on modern Chrome, and the three rules we now follow on every reverse-engineering project.

6 min read · 2026-04-24 Read →

Engineering

How to Make an App in 2026: The AI-Native Stack We Actually Ship With

The 2026 stack we ship every new POC and MVP on: edge functions, React Server Components, passkey auth, Claude MCP for AI features, and AI sub-agents for ops. Cuts MVP delivery time by 30-50%.

8 min read · 2026-04-17 Read →

Engineering

Why We Don't Use Vector Search for Our AI's Knowledge Base

For a curated KB under ~500 entries, a hand-written lean index outperforms RAG, embeddings, and re-ranking. The LLM is the retriever — and it reads English better than any embedding model reads vectors.

6 min read · 2026-04-16 Read →

Engineering

RAG Systems Without the Hype: What Actually Works in Production

Retrieval-Augmented Generation is powerful when built right. Most implementations fail at the same three points.

6 min read · 2026-04-15 Read →

Engineering

Large-Scale Web Scraping: How We Built an On-Demand Proxy Fleet to Collect 1.1M Records

When Akamai blocked our fixed proxy pool, we used the Linode API to spin up 37 disposable VMs as a fresh proxy fleet — and scraped 1.1 million records from a bot-protected government portal overnight.

6 min read · 2026-04-10 Read →

Engineering

How Negative Constraints Fixed Our Multi-Step LLM Video Pipeline

Sequential LLM calls converge on the same output. A global shot plan with prohibited state changes — telling each step what it cannot do — turned disconnected segments into coherent visual narratives.

6 min read · 2026-03-29 Read →

Engineering

LLM Code Development: The Team Workflow That Actually Ships Production Software

Most LLM coding advice is written by solo developers. Here's what actually works when you need AI-generated code to survive production traffic, team reviews, and real deadlines.

8 min read · 2026-03-28 Read →

Engineering

Why Animated WebP Breaks on iOS Safari (And What Actually Works)

Animated WebP looks perfect in Chrome — then breaks on iPhones. We cover the iOS Safari alpha transparency gap, the video loop bug, and the frame-level pipeline that fixes both.

6 min read · 2026-03-27 Read →

Engineering

The HTML Email Problem: When SaaS Receipts Break Your Expense Automation

Modern SaaS vendors send receipts as HTML emails, not PDF attachments. Here's how we solved the silent failure mode in AI-powered expense detection using vision LLMs and HTML-to-image rendering.

6 min read · 2026-03-26 Read →

Engineering

Claude MCP Explained: Architecture, Production Patterns, and Hard-Won Lessons

Claude MCP (Model Context Protocol) is Anthropic's open standard for connecting AI to external tools and data. Here's how the architecture actually works, the three primitives every builder should understand, and the production patterns we've learned after building over a dozen MCP-powered systems.

6 min read · 2026-03-25 Read →

Engineering

Claude Code Memory: The Context-Aware KB Cascade That Eliminated Our Context Bloat

How we built a two-tier lazy-loading knowledge base system that lets AI agents self-select relevant context on demand — cutting instruction overhead by 75%.

6 min read · 2026-03-25 Read →

Engineering

How We Built a Genetic Algorithm for SEO Keyword Research Using Google Trends and LLM Mutations

A genetic algorithm that evolves SEO keywords using real Google Trends data, anchor-based normalization, and LLM mutations grounded in Google's own related queries. Built for real-time keyword discovery with momentum scoring.

8 min read · 2026-03-23 Read →

Engineering

How to Send Telegram Notifications When a Contact Form Is Submitted (Flask)

A practical pattern for getting instant Telegram alerts on contact form submissions — split into 4 separate messages so every field is tap-to-copy on mobile.

6 min read · 2026-03-22 Read →

Engineering

How YOLO-World Replaced Five Classical Face Detectors in Our ComfyUI Custom Node

Classical face detectors fail on anime, 3D, and stylized content. We replaced YuNet, Haar Cascades, MediaPipe, RetinaFace, and a YOLOv8 anime model with a single YOLO-World text-prompted detector that handles every visual style.

6 min read · 2026-03-22 Read →

Engineering

Prompt Engineering as Semantic Contracts: Fixing Silent Failures in Multi-Step LLM Pipelines

Most LLM pipeline bugs aren't model failures — they're underspecified contracts. Here's the prompt architecture pattern we built to eliminate an entire class of silent failures.

6 min read · 2026-03-22 Read →

Engineering

The Hardest Part of Building Scripto: Teaching a Machine to Read Student Handwriting

Building a GCSE dictation app sounds simple — generate a sentence, read it aloud, photograph the student's handwriting, mark it. The last step turned out to be the hardest engineering problem we've solved.

5 min read · 2026-03-21 Read →

Engineering

Why We Generate Audio First in AI Video Pipelines (And Why You Should Too)

AI video models don't accept target durations. AI audio models don't either. But audio can be precisely measured after generation using word-level timestamps.

6 min read · 2026-02-08 Read →

Engineering

Why We Stopped Using Images to Generate AI Music Videos (And What We Use Instead)

We abandoned image-to-video pipelines for AI music video generation and switched to pure text-to-video with Seedance 2.0. Here is why forensic text prompting beats reference images for multi-clip coherence.

8 min read · 2025-12-15 Read →

Engineering

Why the AI Model You Pick Barely Matters (And What Actually Does)

Teams obsess over model benchmarks when the real leverage is in the engineering layer. Structured outputs, fallback chains, and model-agnostic architecture matter more than which LLM you pick.

6 min read · 2025-10-20 Read →

Engineering

Build Software 10× Faster: AI-Accelerated Engineering Explained

10× faster sounds impossible. Here's exactly how we do it. Discover how AI-accelerated engineering eliminates waste, not quality, and delivers software in weeks instead of months.

10 min read · 2025-10-03 Read →

Engineering

Build vs Buy Software: The 2025 Decision Framework

Spent $50k on SaaS tools that don't quite fit? Learn when to build custom software vs buy off-the-shelf solutions with our 2025 decision framework.

7 min read · 2025-10-03 Read →

Engineering

The Ironmind Process: How We Build Software 10× Faster

Most dev shops drown in process. We engineered ours out. Learn how we deliver software in weeks through AI-accelerated engineering without the waste.

7 min read · 2025-10-03 Read →

Engineering

Traditional Dev Shop vs AI-Augmented Team: Real Cost Breakdown

Got quoted $120k from Agency A, $35k from Ironmind? Here's exactly where the difference comes from — and why cheaper doesn't mean lower quality.

10 min read · 2025-10-03 Read →

Engineering

What Projects Are Best for AI-Accelerated Engineering?

Not every project needs AI acceleration. Learn when it's a strong fit (MVPs, prototypes, automations) and when traditional development is better.

7 min read · 2025-10-03 Read →

Engineering

When to Hire a Dev Agency vs Freelancer vs In-House

Freelancer quoted $12k and 4 months. Agency quoted $80k and 6 months. Learn when to hire an agency, freelancer, or in-house team.

7 min read · 2025-10-03 Read →

Engineering

Claude MCP Browser Automation: How We Cut Token Costs by 95% With Accessibility Trees

AI browser automation burns tokens fast: 125,000+ per page interaction. By replacing raw HTML with accessibility trees, natural language element finding, and a reference ID system, we cut costs by 95%.

8 min read · 2025-10-01 Read →

Engineering

11 Specialized AI Sub-Agents That Power Our Engineering Workflow

We don't use one general-purpose AI. We built a crew of 11 specialized Claude sub-agents — each owning a domain of our dev workflow — with parallel execution, shared contracts, and scoped tool access.

8 min read · 2025-08-20 Read →

Engineering

Why a State Machine Beats a Task Queue for Multi-Stage AI Pipelines

Task queues like Celery and RQ were built for short, independent jobs. Multi-stage AI pipelines need crash recovery, human-readable state, and cancellation safety.

6 min read · 2025-06-17 Read →

Engineering

How We Cut Gallery Bandwidth 98% with imgproxy

How we used imgproxy in front of MinIO to serve resized, WebP-converted thumbnails for AI-generated image galleries — cutting gallery bandwidth by 98%.

5 min read · 2025-06-15 Read →

Engineering

How to Build a Production LinkedIn Profile Scraper with Python and the Voyager API

A deep dive into scraping LinkedIn profiles using the internal Voyager API, SOCKS5 proxy rotation, warm/cold path architecture, and session persistence for sub-second profile imports.

8 min read · 2025-06-13 Read →

Engineering

How to Sync AI Voiceover, Music, and Video Using Word-Level Timestamps

AI-generated voiceover, music, and video each run at their own arbitrary length. Word-level timestamps from transcription models give you millisecond-accurate anchor points.

6 min read · 2025-06-10 Read →

Engineering

Healthcare Data Extraction: How We Found a Hidden API Behind a Provider Portal

When a major insurer's provider directory had no public API, we used headless browser traffic interception to discover it was powered by Algolia — and extracted 6.2 million provider records in hours, not weeks.

6 min read · 2025-05-27 Read →

Engineering

How We Built an Automated PR Outreach Scraper for Music Industry Contacts

A four-phase Python pipeline that searches 10+ European languages, crawls results with a headless browser, and extracts scored PR contacts for music industry outreach.

6 min read · 2025-05-15 Read →
