Architecture-Aware Performance Optimization: From the Foundational Math Library to Cutting-Edge Applications