Engineering Notes
Thinking on AI systems, architecture, and the craft of building things that work.
How to Make an App in 2026: The AI-Native Stack We Actually Ship With
The 2026 stack we ship every new POC and MVP on: edge functions, React Server Components, passkey auth, Claude MCP for AI features, and AI sub-agents for ops. Cuts MVP delivery time by 30-50%.
Why We Don't Use Vector Search for Our AI's Knowledge Base
For a curated KB under ~500 entries, a hand-written lean index outperforms RAG, embeddings, and re-ranking. The LLM is the retriever — and it reads English better than any embedding model reads vectors.
Large-Scale Web Scraping: How We Built an On-Demand Proxy Fleet to Collect 1.1M Records
When Akamai blocked our fixed proxy pool, we used the Linode API to spin up 37 disposable VMs as a fresh proxy fleet — and scraped 1.1 million records from a bot-protected government portal overnight.
How Negative Constraints Fixed Our Multi-Step LLM Video Pipeline
Sequential LLM calls converge on the same output. A global shot plan with prohibited state changes — telling each step what it cannot do — turned disconnected segments into coherent visual narratives.
LLM Code Development: The Team Workflow That Actually Ships Production Software
Most LLM coding advice is written by solo developers. Here's what actually works when you need AI-generated code to survive production traffic, team reviews, and real deadlines.
Why Animated WebP Breaks on iOS Safari (And What Actually Works)
Animated WebP looks perfect in Chrome — then breaks on iPhones. We cover the iOS Safari alpha transparency gap, the video loop bug, and the frame-level pipeline that fixes both.
The HTML Email Problem: When SaaS Receipts Break Your Expense Automation
Modern SaaS vendors send receipts as HTML emails, not PDF attachments. Here's how we solved the silent failure mode in AI-powered expense detection using vision LLMs and HTML-to-image rendering.
Claude MCP Explained: Architecture, Production Patterns, and Hard-Won Lessons
Claude MCP (Model Context Protocol) is Anthropic's open standard for connecting AI to external tools and data. Here's how the architecture actually works, the three primitives every builder should understand, and the production patterns we've learned after building over a dozen MCP-powered systems.
Claude Code Memory: The Context-Aware KB Cascade That Eliminated Our Context Bloat
How we built a two-tier lazy-loading knowledge base system that lets AI agents self-select relevant context on demand — cutting instruction overhead by 75%.
How We Built a Genetic Algorithm for SEO Keyword Research Using Google Trends and LLM Mutations
A genetic algorithm that evolves SEO keywords using real Google Trends data, anchor-based normalization, and LLM mutations grounded in Google's own related queries. Built for real-time keyword discovery with momentum scoring.
How to Send Telegram Notifications When a Contact Form Is Submitted (Flask)
A practical pattern for getting instant Telegram alerts on contact form submissions — split into 4 separate messages so every field is tap-to-copy on mobile.
How YOLO-World Replaced Five Classical Face Detectors in Our ComfyUI Custom Node
Classical face detectors fail on anime, 3D, and stylized content. We replaced YuNet, Haar Cascades, MediaPipe, RetinaFace, and a YOLOv8 anime model with a single YOLO-World text-prompted detector that handles every visual style.
Prompt Engineering as Semantic Contracts: Fixing Silent Failures in Multi-Step LLM Pipelines
Most LLM pipeline bugs aren't model failures — they're underspecified contracts. Here's the prompt architecture pattern we built to eliminate an entire class of silent failures.
The Hardest Part of Building Scripto: Teaching a Machine to Read Student Handwriting
Building a GCSE dictation app sounds simple — generate a sentence, read it aloud, photograph the student's handwriting, mark it. The last step turned out to be the hardest engineering problem we've solved.
Why We Generate Audio First in AI Video Pipelines (And Why You Should Too)
AI video models don't accept target durations. AI audio models don't either. But audio can be precisely measured after generation using word-level timestamps.
Why We Stopped Using Images to Generate AI Music Videos (And What We Use Instead)
We abandoned image-to-video pipelines for AI music video generation and switched to pure text-to-video with Seedance 2.0. Here is why forensic text prompting beats reference images for multi-clip coherence.
Why the AI Model You Pick Barely Matters (And What Actually Does)
Teams obsess over model benchmarks when the real leverage is in the engineering layer. Structured outputs, fallback chains, and model-agnostic architecture matter more than which LLM you pick.
Build Software 10× Faster: AI-Accelerated Engineering Explained
10× faster sounds impossible. Here's exactly how we do it. Discover how AI-accelerated engineering eliminates waste, not quality, and delivers software in weeks instead of months.
Build vs Buy Software: The 2025 Decision Framework
Spent $50k on SaaS tools that don't quite fit? Learn when to build custom software vs buy off-the-shelf solutions with our 2025 decision framework.
The Ironmind Process: How We Build Software 10× Faster
Most dev shops drown in process. We engineered ours out. Learn how we deliver software in weeks through AI-accelerated engineering without the waste.
Traditional Dev Shop vs AI-Augmented Team: Real Cost Breakdown
Got quoted $120k from Agency A, $35k from Ironmind? Here's exactly where the difference comes from — and why cheaper doesn't mean lower quality.
What Projects Are Best for AI-Accelerated Engineering?
Not every project needs AI-acceleration. Learn when it's perfect (MVPs, prototypes, automations) and when traditional development is better.
When to Hire a Dev Agency vs Freelancer vs In-House
Freelancer quoted $12k and 4 months. Agency quoted $80k and 6 months. Learn when to hire an agency, freelancer, or in-house team.
Claude MCP Browser Automation: How We Cut Token Costs by 95% With Accessibility Trees
AI browser automation burns tokens fast: 125,000+ per page interaction. By replacing raw HTML with accessibility trees, natural language element finding, and a reference ID system, we cut costs by 95%.
11 Specialized AI Sub-Agents That Power Our Engineering Workflow
We don't use one general-purpose AI. We built a crew of 11 specialized Claude sub-agents — each owning a domain of our dev workflow — with parallel execution, shared contracts, and scoped tool access.
Why a State Machine Beats a Task Queue for Multi-Stage AI Pipelines
Task queues like Celery and RQ were built for short, independent jobs. Multi-stage AI pipelines need crash recovery, human-readable state, and cancellation safety.
How We Cut Gallery Bandwidth 98% with imgproxy
How we used imgproxy in front of MinIO to serve resized, WebP-converted thumbnails for AI-generated image galleries — cutting gallery bandwidth by 98%.
How to Build a Production LinkedIn Profile Scraper with Python and the Voyager API
A deep dive into scraping LinkedIn profiles using the internal Voyager API, SOCKS5 proxy rotation, warm/cold path architecture, and session persistence for sub-second profile imports.
How to Sync AI Voiceover, Music, and Video Using Word-Level Timestamps
AI-generated voiceover, music, and video each run at their own arbitrary length. Word-level timestamps from transcription models give you millisecond-accurate anchor points.
Healthcare Data Extraction: How We Found a Hidden API Behind a Provider Portal
When a major insurer's provider directory had no public API, we used headless browser traffic interception to discover it was powered by Algolia — and extracted 6.2 million provider records in hours, not weeks.
How We Built an Automated PR Outreach Scraper for Music Industry Contacts
A four-phase Python pipeline that searches 10+ European languages, crawls results with a headless browser, and extracts scored PR contacts for music industry outreach.
RAG Systems Without the Hype: What Actually Works in Production
Retrieval-Augmented Generation is powerful when built right. Most implementations fail at the same three points.