GitHub has unveiled the Arctic Code Vault, a new project aimed at archiving the world's open source software and keeping it usable in a future that may lack the machines or the knowledge to read it.
The code-sharing site will house the vault in the Arctic World Archive (AWA), a decommissioned coal mine in Norway’s Svalbard archipelago, near the North Pole and a mile from the Global Seed Vault.
The Microsoft-owned company plans to take its first snapshot of every active repository on February 2, 2020, and store it on 3,500-foot film reels made by Piql, a Norwegian long-term storage company.
Film usually has a lifespan of about 500 years, but Piql’s film is supposed to last 1,000 years.
The company also makes PiqlReader, which is used to read the offline data.
At its GitHub Universe conference in San Francisco, GitHub revealed the project.
For very long-term data archiving, the Svalbard mine offers a few key advantages: it lies in a demilitarized zone, and it is one of the most isolated and geopolitically secure inhabited places on Earth.
The GitHub snapshot will be held there alongside other historical data from countries such as Italy, Denmark, and the Vatican.
According to GitHub, the 2020 snapshot will include public code repositories as well as “significant dormant repositories as determined by stars, dependencies, and an advisory panel.”
“All data will be stored QR-encoded for greater information reliability and transparency. A human-readable index and guide will identify the location of each database and clarify how data can be retrieved.” The advisory panel will include experts from a variety of fields, including anthropology, archeology, history, linguistics, archival science, and futurism.
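To make the idea of encoded frames plus a human-readable index concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not GitHub's or Piql's actual format: fixed-size byte chunks stand in for QR-encoded film frames, a SHA-256 digest stands in for QR error correction, and the frame size, function names, and repository names are all hypothetical.

```python
import hashlib

FRAME_SIZE = 1024  # bytes per frame; hypothetical, not a real film-frame capacity


def archive(repos, frame_size=FRAME_SIZE):
    """Split each repository payload into frames and record where it landed.

    Returns (frames, index): frames is a list of (sha256_hex, chunk) pairs,
    index maps a repository name to its (first_frame, last_frame) range.
    """
    frames = []
    index = {}
    for name in sorted(repos):
        data = repos[name]
        first = len(frames)
        # max(..., 1) ensures an empty payload still occupies one (empty) frame.
        for start in range(0, max(len(data), 1), frame_size):
            chunk = data[start:start + frame_size]
            frames.append((hashlib.sha256(chunk).hexdigest(), chunk))
        index[name] = (first, len(frames) - 1)
    return frames, index


def render_index(index):
    """Produce the plain-text guide a future reader would consult first."""
    return "\n".join(
        f"{name}: frames {first}-{last}" for name, (first, last) in index.items()
    )


def restore(frames, first, last):
    """Reassemble one repository, rejecting any frame that fails its checksum."""
    out = bytearray()
    for digest, chunk in frames[first:last + 1]:
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("frame failed its checksum")
        out += chunk
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical repository names and contents, for illustration only.
    frames, index = archive({"octo/hello": b"a" * 2500, "octo/tiny": b"hi"})
    print(render_index(index))
    first, last = index["octo/hello"]
    assert restore(frames, first, last) == b"a" * 2500
```

The point of the design is that the index is readable without any software: someone who finds it learns which frames hold which repository before they have worked out how to decode a single frame.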
GitHub sees the Arctic World Archive as part of its cold storage policy, which also includes Microsoft Research's Project Silica, a recent effort that stored the Superman movie on a credit card-sized piece of quartz glass.
Data in cold storage is updated every five years or so.
Warm storage, by contrast, covers all Git, issue, and pull request data, which is backed up continuously.