Engineering Notes

Thinking on AI systems, architecture, and the craft of building things that work.

Engineering

How to Make an App in 2026: The AI-Native Stack We Actually Ship With

The 2026 stack we ship every new POC and MVP on: edge functions, React Server Components, passkey auth, Claude MCP for AI features, and AI sub-agents for ops. Cuts MVP delivery time by 30–50%.

8 min read · 2026-04-17
Engineering

Why We Don't Use Vector Search for Our AI's Knowledge Base

For a curated KB under ~500 entries, a hand-written lean index outperforms RAG, embeddings, and re-ranking. The LLM is the retriever — and it reads English better than any embedding model reads vectors.

6 min read · 2026-04-16
Engineering

Large-Scale Web Scraping: How We Built an On-Demand Proxy Fleet to Collect 1.1M Records

When Akamai blocked our fixed proxy pool, we used the Linode API to spin up 37 disposable VMs as a fresh proxy fleet — and scraped 1.1 million records from a bot-protected government portal overnight.

6 min read · 2026-04-10
Engineering

How Negative Constraints Fixed Our Multi-Step LLM Video Pipeline

Sequential LLM calls converge on the same output. A global shot plan with prohibited state changes — telling each step what it cannot do — turned disconnected segments into coherent visual narratives.

6 min read · 2026-03-29
Engineering

LLM Code Development: The Team Workflow That Actually Ships Production Software

Most LLM coding advice is written by solo developers. Here's what actually works when you need AI-generated code to survive production traffic, team reviews, and real deadlines.

8 min read · 2026-03-28
Engineering

Why Animated WebP Breaks on iOS Safari (And What Actually Works)

Animated WebP looks perfect in Chrome — then breaks on iPhones. We cover the iOS Safari alpha transparency gap, the video loop bug, and the frame-level pipeline that fixes both.

6 min read · 2026-03-27
Engineering

The HTML Email Problem: When SaaS Receipts Break Your Expense Automation

Modern SaaS vendors send receipts as HTML emails, not PDF attachments. Here's how we solved the silent failure mode in AI-powered expense detection using vision LLMs and HTML-to-image rendering.

6 min read · 2026-03-26
Engineering

Claude MCP Explained: Architecture, Production Patterns, and Hard-Won Lessons

Claude MCP (Model Context Protocol) is Anthropic's open standard for connecting AI to external tools and data. Here's how the architecture actually works, the three primitives every builder should understand, and the production patterns we've learned after building over a dozen MCP-powered systems.

6 min read · 2026-03-25
Engineering

Claude Code Memory: The Context-Aware KB Cascade That Eliminated Our Context Bloat

How we built a two-tier lazy-loading knowledge base system that lets AI agents self-select relevant context on demand — cutting instruction overhead by 75%.

6 min read · 2026-03-25
Engineering

How We Built a Genetic Algorithm for SEO Keyword Research Using Google Trends and LLM Mutations

A genetic algorithm that evolves SEO keywords using real Google Trends data, anchor-based normalization, and LLM mutations grounded in Google's own related queries. Built for real-time keyword discovery with momentum scoring.

8 min read · 2026-03-23
Engineering

How to Send Telegram Notifications When a Contact Form Is Submitted (Flask)

A practical pattern for getting instant Telegram alerts on contact form submissions — split into 4 separate messages so every field is tap-to-copy on mobile.

6 min read · 2026-03-22
Engineering

How YOLO-World Replaced Five Classical Face Detectors in Our ComfyUI Custom Node

Classical face detectors fail on anime, 3D, and stylized content. We replaced YuNet, Haar Cascades, MediaPipe, RetinaFace, and a YOLOv8 anime model with a single YOLO-World text-prompted detector that handles every visual style.

6 min read · 2026-03-22
Engineering

Prompt Engineering as Semantic Contracts: Fixing Silent Failures in Multi-Step LLM Pipelines

Most LLM pipeline bugs aren't model failures — they're underspecified contracts. Here's the prompt architecture pattern we built to eliminate an entire class of silent failures.

6 min read · 2026-03-22
Engineering

The Hardest Part of Building Scripto: Teaching a Machine to Read Student Handwriting

Building a GCSE dictation app sounds simple — generate a sentence, read it aloud, photograph the student's handwriting, mark it. The last step turned out to be the hardest engineering problem we've solved.

5 min read · 2026-03-21
Engineering

Why We Generate Audio First in AI Video Pipelines (And Why You Should Too)

AI video models don't accept target durations. AI audio models don't either. But audio can be precisely measured after generation using word-level timestamps.

6 min read · 2026-02-08
Engineering

Why We Stopped Using Images to Generate AI Music Videos (And What We Use Instead)

We abandoned image-to-video pipelines for AI music video generation and switched to pure text-to-video with Seedance 2.0. Here is why forensic text prompting beats reference images for multi-clip coherence.

8 min read · 2025-12-15
Engineering

Why the AI Model You Pick Barely Matters (And What Actually Does)

Teams obsess over model benchmarks when the real leverage is in the engineering layer. Structured outputs, fallback chains, and model-agnostic architecture matter more than which LLM you pick.

6 min read · 2025-10-20
Engineering

Build Software 10× Faster: AI-Accelerated Engineering Explained

10× faster sounds impossible. Here's exactly how we do it. Discover how AI-accelerated engineering eliminates waste, not quality, and delivers software in weeks instead of months.

10 min read · 2025-10-03
Engineering

Build vs Buy Software: The 2025 Decision Framework

Spent $50k on SaaS tools that don't quite fit? Learn when to build custom software vs buy off-the-shelf solutions with our 2025 decision framework.

7 min read · 2025-10-03
Engineering

The Ironmind Process: How We Build Software 10× Faster

Most dev shops drown in process. We engineered ours out. Learn how we deliver software in weeks through AI-accelerated engineering without the waste.

7 min read · 2025-10-03
Engineering

Traditional Dev Shop vs AI-Augmented Team: Real Cost Breakdown

Got quoted $120k from Agency A, $35k from Ironmind? Here's exactly where the difference comes from — and why cheaper doesn't mean lower quality.

10 min read · 2025-10-03
Engineering

What Projects Are Best for AI-Accelerated Engineering?

Not every project needs AI acceleration. Learn when it fits well (MVPs, prototypes, automations) and when traditional development is the better choice.

7 min read · 2025-10-03
Engineering

When to Hire a Dev Agency vs Freelancer vs In-House

Freelancer quoted $12k and 4 months. Agency quoted $80k and 6 months. Learn when to hire an agency, freelancer, or in-house team.

7 min read · 2025-10-03
Engineering

Claude MCP Browser Automation: How We Cut Token Costs by 95% With Accessibility Trees

AI browser automation burns tokens fast: 125,000+ per page interaction. By replacing raw HTML with accessibility trees, natural-language element finding, and a reference ID system, we cut costs by 95%.

8 min read · 2025-10-01
Engineering

11 Specialized AI Sub-Agents That Power Our Engineering Workflow

We don't use one general-purpose AI. We built a crew of 11 specialized Claude sub-agents — each owning a domain of our dev workflow — with parallel execution, shared contracts, and scoped tool access.

8 min read · 2025-08-20
Engineering

Why a State Machine Beats a Task Queue for Multi-Stage AI Pipelines

Task queues like Celery and RQ were built for short, independent jobs. Multi-stage AI pipelines need crash recovery, human-readable state, and cancellation safety.

6 min read · 2025-06-17
Engineering

How We Cut Gallery Bandwidth 98% with imgproxy

How we used imgproxy in front of MinIO to serve resized, WebP-converted thumbnails for AI-generated image galleries — cutting gallery bandwidth by 98%.

5 min read · 2025-06-15
Engineering

How to Build a Production LinkedIn Profile Scraper with Python and the Voyager API

A deep dive into scraping LinkedIn profiles using the internal Voyager API, SOCKS5 proxy rotation, warm/cold path architecture, and session persistence for sub-second profile imports.

8 min read · 2025-06-13
Engineering

How to Sync AI Voiceover, Music, and Video Using Word-Level Timestamps

AI-generated voiceover, music, and video each run at their own arbitrary length. Word-level timestamps from transcription models give you millisecond-accurate anchor points.

6 min read · 2025-06-10
Engineering

Healthcare Data Extraction: How We Found a Hidden API Behind a Provider Portal

When a major insurer's provider directory had no public API, we used headless browser traffic interception to discover it was powered by Algolia — and extracted 6.2 million provider records in hours, not weeks.

6 min read · 2025-05-27
Engineering

How We Built an Automated PR Outreach Scraper for Music Industry Contacts

A four-phase Python pipeline that searches 10+ European languages, crawls results with a headless browser, and extracts scored PR contacts for music industry outreach.

6 min read · 2025-05-15
Engineering

RAG Systems Without the Hype: What Actually Works in Production

Retrieval-Augmented Generation is powerful when built right. Most implementations fail at the same three points.

8 min read · 2025-03-05