Day 8 | nikki

Watched: MIT, Anthropic, and New Benchmarks Just Revealed AI's Biggest Coding Limits (from ~3:26).

Research from MIT and Anthropic highlights where AI coding still falls short: benchmarks like SWE-Bench can be gamed, and models that score well often fail on other languages or real-world tasks. Automatic evaluation tends to overestimate performance: agents produce code that passes tests but still has formatting, linting, or test-coverage problems that a human reviewer would catch. So despite bold industry claims, the actual limits of AI coding and its productivity gains are still unclear.

Today, I:

read DDIA Chapter 6: Partitioning — partition strategies (key range vs hash), secondary indexes (local vs global), rebalancing, and request routing
added collections to this website to better organize some parts
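
To make the key range vs hash distinction from the DDIA chapter concrete, here is a minimal sketch. The boundaries, the partition count, and the function names are all made up for illustration; this is not how any particular database does it.

```python
import bisect
import hashlib

# Hypothetical split points: keys < "h" go to partition 0,
# "h" <= key < "p" go to partition 1, everything else to partition 2.
RANGE_BOUNDARIES = ["h", "p"]

# Hypothetical fixed number of partitions for the hash strategy.
NUM_HASH_PARTITIONS = 4


def key_range_partition(key: str) -> int:
    """Key range partitioning: sorted keys stay together, so range scans are cheap."""
    return bisect.bisect_right(RANGE_BOUNDARIES, key)


def hash_partition(key: str) -> int:
    """Hash partitioning: keys are spread out, but key ordering is lost."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a partition.
    return int.from_bytes(digest[:8], "big") % NUM_HASH_PARTITIONS


if __name__ == "__main__":
    for key in ["apple", "hello", "zebra"]:
        print(key, key_range_partition(key), hash_partition(key))
```

With key ranges, adjacent keys land in the same partition, which helps range queries but can create hot spots. Hashing spreads keys more evenly at the cost of ordering. The plain `% NUM_HASH_PARTITIONS` here is a simplification: the chapter points out that hash mod N makes rebalancing expensive, because changing N moves most keys.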

By Tony Duong