During a recent NerdWallet hackathon, WordPress plugin developer Mickey Kay and his colleague John Lee came up with the idea of a visual archive of the site’s content that would let them look back at previous versions and tie SEO and performance shifts to content changes. WordPress powers a large portion of NerdWallet, alongside a number of Node/React apps and various Python microservices.
Because WordPress’ revision system doesn’t create a visual archive, Kay and Lee looked outside the platform for a solution. They landed on the Wayback Machine, run by the Internet Archive, a non-profit dedicated to building a digital library of Internet sites and other cultural artifacts in digital form. The Wayback Machine provides an interface that makes it easy to browse previous versions of a site. Unfortunately, it crawls websites sporadically at best, and its calendar view shows the number of times a site was crawled, not the number of times it was updated.
Kay decided to build a solution that works with the Wayback Machine to create a more consistent, reliable archive that can be easily accessed from WordPress. His new Archiver plugin automatically generates Wayback Machine snapshots of the site whenever content changes.
Archiver does the following things:
- Automatically creates a Wayback Machine snapshot when you update your content
- Allows you to manually trigger a snapshot of any page on your site from the WordPress admin
- Allows you to easily view your site’s Wayback Machine archives (all snapshots) for any page on your site
- Adds an “Archives” metabox to the admin edit screen of specific content types that can be used to easily view existing snapshots
The plugin works by posting to the Wayback Machine’s publicly available save endpoint (https://web.archive.org/save/) and reading existing snapshots from its CDX endpoint (https://web.archive.org/cdx).
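To make the two endpoints more concrete, here is a minimal Python sketch of how a client might build a Save Page Now request and turn a CDX query into viewable snapshot links. This is not Archiver’s code (the plugin is built for WordPress), and the function names are hypothetical. It assumes the CDX server’s documented query path `/cdx/search/cdx` under the `/cdx` base mentioned above, the `output=json` format (a header row followed by one row per capture), and snapshot URLs of the form `https://web.archive.org/web/<timestamp>/<url>`. The sample CDX response in the usage section is made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints: the article names https://web.archive.org/save/ and
# https://web.archive.org/cdx; /cdx/search/cdx is the CDX server's query path.
SAVE_ENDPOINT = "https://web.archive.org/save/"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
SNAPSHOT_BASE = "https://web.archive.org/web/"


def build_save_url(page_url: str) -> str:
    """Return the Save Page Now URL that asks the Wayback Machine to capture page_url."""
    return SAVE_ENDPOINT + page_url


def build_cdx_query_url(page_url: str, limit: int = 50) -> str:
    """Return a CDX query URL that lists captures of page_url as JSON."""
    params = {
        "url": page_url,
        "output": "json",
        "fl": "timestamp,original,statuscode",  # only the fields we need
        "limit": limit,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body: str) -> list[str]:
    """Turn a CDX JSON response into a list of snapshot view URLs.

    The first row names the fields; every following row is one capture.
    An empty body or an empty array means there are no captures.
    """
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    ts_idx = header.index("timestamp")
    orig_idx = header.index("original")
    return [f"{SNAPSHOT_BASE}{row[ts_idx]}/{row[orig_idx]}" for row in captures]


def fetch_snapshot_urls(page_url: str) -> list[str]:
    """Query the CDX endpoint over the network (requires Internet access)."""
    with urlopen(build_cdx_query_url(page_url), timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical CDX response, used so the example runs offline.
    sample = ('[["timestamp","original","statuscode"],'
              '["20160301120000","https://example.com/","200"]]')
    print(build_save_url("https://example.com/"))
    print(parse_cdx_json(sample))
```

Running the script offline prints the save URL `https://web.archive.org/save/https://example.com/` followed by `['https://web.archive.org/web/20160301120000/https://example.com/']`. Keeping the parsing separate from the network call makes the CDX handling easy to check without hitting the Wayback Machine.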
Archiver works on posts, pages, custom post types, categories, tags, custom taxonomies, and users. Existing snapshots for each of these are listed in the Archives metabox on the item’s edit screen.
I tested the Archiver plugin and found that it works as expected. When content is updated, a new snapshot is generated automatically, and manually triggering a snapshot works instantly.
Kay said that the NerdWallet team is working to adopt the WP REST API to integrate across systems and surface WordPress content in its React-powered apps. Archiver is not yet used in production, but the team has it slated for an upcoming code sprint.
Archiver can be useful for understanding the impact of content changes on marketing, SEO, and e-commerce sales, and it also helps preserve the history of web pages as they evolve over time. The best part is that snapshots are created automatically and stored by the Wayback Machine, so they don’t use up space on your server. The only drawback is that the archive depends entirely on the Wayback Machine: if it were to disappear someday, the snapshots would no longer be available.
Archiver is available on WordPress.org, and contributions and suggestions are welcome on GitHub. The Wayback Machine is free to use, but its maintainers estimate that permanent storage costs them approximately $2.00 USD per gigabyte. If you depend heavily on the Wayback Machine’s snapshots, consider a donation to help keep the digital library up and running.