been sitting on this for about three weeks because this flight looked like the only window i'd have to actually ship it. built a small tool that lets you see where your cache hits are landing on claude 4.7 and 4.6, because that drop in hit rate everyone's been seeing has been driving me insane. turns out most people either aren't logging the cache_creation_input_tokens field properly, or they're changing their system prompts slightly between calls and nuking the whole cache.

the tool ingests your api logs, groups them by prompt hash, and shows you the delta. it's nothing fancy, but it saves you from having to manually parse json logs at 30,000 feet like some kind of animal. it's up in the usual place if you want to grab it, and the free tier covers about 5,000 api calls' worth of logs.

made it because i was losing money on redundant cache rebuilds and got tired of explaining to other devs why their cache hit rate was actually their own fault. honestly the bigger issue is that people treat cache as a free lunch and don't think about the lifecycle of their prompts at all. the cache only matches an exact prefix, so you change one word in your system message and you're starting from zero again, which is something the docs could be clearer about but whatever. if you're trying to optimize your claude spend and you're not tracking where your cache actually hits, you're leaving money on the table.
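for anyone who wants to see the group-by-prompt-hash idea without grabbing anything, here's a rough sketch. this is not the tool's actual code. it assumes your logs are json lines where each record has a `system` string and the `usage` object from the api response; that log shape and the function names (`prompt_hash`, `summarize`) are made up for illustration. it also treats `input_tokens` as the uncached part of the input, so hit rate is cache reads over all input tokens.

```python
import hashlib
import json
from collections import defaultdict


def prompt_hash(system_prompt: str) -> str:
    """Hash the exact system prompt text; a one-character change gives a new hash."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]


def summarize(log_lines):
    """Group log records by prompt hash and total cache writes vs. cache reads.

    Assumed (hypothetical) record shape, one JSON object per line:
    {"system": "...", "usage": {"input_tokens": ..., "cache_creation_input_tokens": ...,
                                "cache_read_input_tokens": ...}}
    """
    stats = defaultdict(lambda: {"calls": 0, "created": 0, "read": 0, "uncached": 0})
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        record = json.loads(line)
        usage = record.get("usage") or {}
        bucket = stats[prompt_hash(record.get("system", ""))]
        bucket["calls"] += 1
        # Missing or null fields count as 0, which is exactly the logging gap to watch for.
        bucket["created"] += usage.get("cache_creation_input_tokens") or 0
        bucket["read"] += usage.get("cache_read_input_tokens") or 0
        bucket["uncached"] += usage.get("input_tokens") or 0
    for bucket in stats.values():
        total = bucket["created"] + bucket["read"] + bucket["uncached"]
        bucket["hit_rate"] = bucket["read"] / total if total else 0.0
    return dict(stats)


if __name__ == "__main__":
    # Made-up sample records: same prompt twice, then the prompt with one character changed.
    sample = [
        {"system": "you are a helpful bot.",
         "usage": {"input_tokens": 20, "cache_creation_input_tokens": 1000, "cache_read_input_tokens": 0}},
        {"system": "you are a helpful bot.",
         "usage": {"input_tokens": 20, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1000}},
        {"system": "you are a helpful bot!",
         "usage": {"input_tokens": 20, "cache_creation_input_tokens": 1000, "cache_read_input_tokens": 0}},
    ]
    for h, s in summarize(json.dumps(r) for r in sample).items():
        print(h, s)
```

the one-character prompt change lands in its own bucket with nothing but cache writes, which is the "starting from zero" pattern. hashing only the system prompt is a simplification; if your tools or early messages sit inside the cached prefix you'd want to hash those too.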