Over the last few weeks I've been thinking about a problem that is not at all new, not at all solved, and one that I am struggling to work through. 10Centuries is a platform that I've built over the last few years to solve some of the very important problems that we face when it comes to storing information online. Problems such as data degradation and link rot can be reduced, and digital permanence improved, because the people who host online services typically have a backup and restore strategy that is tested and proven to recover quickly after catastrophic failures in hardware or software. This is great, as the typical person rarely thinks about data loss until their data has been lost. At 10Centuries, automated processes are in place to make sure that information is very hard to lose. Backups are automatically tested moments after being created, and files are verified to ensure the copies are always exactly the same as the originals. This isn't the problem that I'm thinking about, though. The issue that I'm struggling with comes down to one word: trust.
You need to trust that my services are living up to your expectations. You need to trust that I'm not going to lose your information in the event of a server failure. You need to trust that the posts that you've deleted or expired are actually gone from the servers … but how?
Digital Receipts?
Trust is a very delicate thing. It can take a long time to earn, and just a split second to lose. When something is marked for deletion on the 10Centuries platform, a few things kick into action.
For text-based information, any content related to the post is immediately scrubbed from the server, including metadata such as post length and publication dates. Any files associated with the post are left intact in the account's storage in case they are used again in the future.
When files are deleted, they're immediately removed from the main server. The backup server, which keeps a "hot copy" of the files in case the main storage area fails, is sent a message to delete the files and the action is carried out almost immediately. I've been testing this quite a bit over the last few months, and the system appears to be pretty solid.
When entire sites or accounts are deleted, the first thing to go is the text, and then the files. The entire process, as of this writing, is typically completed in about 0.5 seconds across the entire 10Centuries platform.
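To make the order of operations above a little more concrete, here is a rough sketch in Python. It is not the actual 10Centuries code; the `Store` and `Platform` classes, their method names, and the in-memory dictionaries are all made up for illustration, and the "message" to the backup server is reduced to a direct call.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for one server's storage."""
    posts: dict = field(default_factory=dict)  # post_id -> text plus metadata
    files: dict = field(default_factory=dict)  # file_id -> file contents


@dataclass
class Platform:
    """Hypothetical model: a main server plus a hot copy of the files."""
    main: Store = field(default_factory=Store)
    hot_copy: Store = field(default_factory=Store)

    def delete_post(self, post_id):
        # Text and metadata are scrubbed together; attached files are left alone.
        self.main.posts.pop(post_id, None)

    def delete_file(self, file_id):
        # Remove from the main server first, then have the hot copy follow.
        self.main.files.pop(file_id, None)
        self.hot_copy.files.pop(file_id, None)

    def delete_account(self, post_ids, file_ids):
        # Text goes first, then the files, matching the order described above.
        for post_id in list(post_ids):
            self.delete_post(post_id)
        for file_id in list(file_ids):
            self.delete_file(file_id)
```

The point of the sketch is only the ordering: deleting a post never touches files, while deleting a file or an account reaches the hot copy as well.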
But what about backups?
10Centuries has a very strict backup regimen. The main servers run in Japan, with some in Tokyo and others in Osaka. A virtual server located in Vancouver has a copy of the database and the files in the event a huge earthquake turns the entire country of Japan into a modern-day Atlantis. Backups are made from the Vancouver server and stored in both Japan and Canada. The servers that keep the backups are used only for this purpose and test the data upon receipt. Backups that fail to open or restore information are discarded and a new backup is requested. These files are cycled every 7 days. This means that deleted data never remains in the 10Centuries system backups for more than 7 days, which is all well and good … but how do I prove this?
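As a rough illustration of the two checks described above, here is a small Python sketch. It assumes that "verified" means comparing SHA-256 checksums and that "cycled every 7 days" means any backup older than 7 days is discarded; both are my assumptions for the example, and the function names are made up rather than taken from the real system.

```python
import hashlib
from datetime import datetime, timedelta

# Assumed retention window, based on the 7-day cycle described above.
RETENTION = timedelta(days=7)


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def copy_matches(original: bytes, copy: bytes) -> bool:
    """True when the copy has the same checksum as the original."""
    return sha256_of(original) == sha256_of(copy)


def expired(backups: dict, now: datetime) -> list:
    """Return the names of backups that are older than the retention window.

    `backups` maps a backup name to the datetime it was created.
    """
    return sorted(
        name for name, created in backups.items()
        if now - created > RETENTION
    )
```

Under these assumptions, anything deleted from the live servers survives only in backups that `expired` has not yet flagged, which is where the 7-day upper bound comes from.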
As the service grows, this question will become more and more important, as people will rely on deleted content being gone just as much as they rely on existing content being available at a moment's notice. We know from past experience that some web services do not actually delete anything from their database when they claim to and, unless that service is Facebook, this can cause a painful backlash from people around the world. What we need is a verifiable way to show people when data is deleted from servers … but how?
If you have any suggestions or ideas, I'd love to hear them. Get in touch!