Background
The development team reported that the test environment page wasn't updating. Jenkins build logs showed successful packaging, but the browser still displayed the old version.
Troubleshooting Process
After receiving the task, the Agent followed the full-chain troubleshooting process to check each layer:
- Origin Verification: Connected to the EC2 origin over SSH, compared file MD5 checksums, and confirmed the origin files were already the latest version
- CloudFront Cache: Called the CloudFront API to create an invalidation and refreshed the page once it completed. The HTML was updated, but the CSS was still old
- Deep Analysis: Inspected the HTML source and discovered that the CSS references did not use the CloudFront domain but cdn-x.example.com, a third-party CDN
- Root Cause Identification: This CDN was originally integrated to accelerate image loading. The frontend team later put CSS there as well, but the operations team was not aware of it
- Cache Purge: Purged the cache for the affected paths through the third-party CDN's API
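The Deep Analysis step can be sketched roughly as follows. This is a minimal illustration, not the Agent's actual implementation: it parses a page's HTML, collects stylesheet, script, and image references, and groups any that are served from a host other than the known primary CDN. The CloudFront domain `d111111abcdef8.cloudfront.net`, the sample HTML, and the names `ResourceCollector` and `foreign_hosts` are assumptions made up for the example; only `cdn-x.example.com` comes from the incident.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical values for illustration only.
PRIMARY_CDN_HOST = "d111111abcdef8.cloudfront.net"
PAGE_URL = "https://d111111abcdef8.cloudfront.net/index.html"
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="https://cdn-x.example.com/css/app.css">
<script src="/js/app.js"></script>
</head><body>
<img src="https://cdn-x.example.com/img/logo.png">
</body></html>
"""


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of stylesheets, scripts, and images in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "stylesheet" in (attr.get("rel") or "").lower():
            ref = attr.get("href")
        elif tag in ("script", "img"):
            ref = attr.get("src")
        else:
            ref = None
        if ref:
            # Resolve relative references against the page URL.
            self.urls.append(urljoin(self.base_url, ref))


def foreign_hosts(html, page_url, known_hosts):
    """Return {host: [urls]} for resources not served from known_hosts."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    result = {}
    for url in collector.urls:
        host = urlparse(url).hostname
        if host and host not in known_hosts:
            result.setdefault(host, []).append(url)
    return result


if __name__ == "__main__":
    for host, urls in foreign_hosts(SAMPLE_HTML, PAGE_URL, {PRIMARY_CDN_HOST}).items():
        print(f"{host}: {len(urls)} resource(s) outside the primary CDN")
        for url in urls:
            print(f"  {url}")
```

Any host this reports is a separate cache layer that may need its own purge. A real check would also have to cover resources pulled in from CSS (`@import`, `url(...)`) and from scripts, which this sketch does not attempt.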
Results
The whole process, from receiving the task to resolving the issue, took 30 seconds. Done manually, just discovering that the CSS came from another CDN could have taken 20 minutes.
Technical Points
- Full-chain troubleshooting cannot stop at the primary CDN; it has to check the source of every external resource the page loads
- The Agent automatically parses resource references in HTML and compares cache states across all layers
- Recommend adding cache purge steps to CI/CD pipelines, covering all CDNs
ClawNOC Operations Agent Practice Notes