
Two innovations built on immutable, connected data are spreading widely through our lives. What can we learn from them, and can we complement or improve on them?

One is Git, the most widely used version control system. Git's emphasis on data integrity builds on two underlying schemes: content-addressable storage and Merkle trees.

Content-addressable storage works like this: given a file, one computes a hash of its contents, which is unique for all practical purposes, then stores and retrieves the data using that hash alone. The address is therefore immune to renaming and moving, and any modification of the contents is detected, since it yields a different hash.
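The scheme above can be sketched in a few lines of Python. The `git_blob_id` function mirrors how Git names file contents (SHA-1 over a `blob <size>\0` header followed by the bytes); the `ContentStore` class is a hypothetical in-memory store for illustration, not Git's on-disk object database.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the ID Git assigns to a file's contents: SHA-1 of a 'blob' header plus the bytes."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


class ContentStore:
    """Hypothetical in-memory content-addressable store: data goes in, its hash comes out as the key."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = git_blob_id(data)
        self._objects[key] = data  # identical contents always land on the same key
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hash on read: any corruption of the stored bytes would change the ID.
        if git_blob_id(data) != key:
            raise ValueError(f"integrity check failed for {key}")
        return data


store = ContentStore()
key = store.put(b"hello world\n")
print(key)
print(store.get(key))
```

Note that the file name never enters the key: renaming or moving the file leaves its address unchanged, while editing a single byte produces a different one.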

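Merkle trees work like this: each node in the tree contains the hashes of its child nodes. Since the hash of a node covers those child hashes, the root hash locks the entire structure: changing any leaf changes every hash on the path up to the root. Merkle trees are central to Git: the contents of each file are stored as a content-addressed blob, each directory as a tree object listing the hashes of its entries, and each commit records the hash of its root tree and of its parent commit(s). (For storage efficiency, Git also compresses objects and packs many of them as deltas against similar objects, but this does not change how they are addressed.)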
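To see how a root hash "locks" everything below it, here is a minimal Merkle-tree sketch in Python. It is deliberately simplified: it uses SHA-256, made-up `leaf:`/`tree:` prefixes and a text encoding of directory entries, whereas Git's real tree objects use a binary entry format that also records file modes.

```python
import hashlib


def node_hash(node) -> str:
    """Hash a node of a toy Merkle tree.

    A leaf is a bytes object (file contents); an inner node is a dict mapping
    names to child nodes (a directory). An inner node's hash covers the names
    and hashes of its children, so it commits to the whole subtree.
    """
    if isinstance(node, bytes):
        return hashlib.sha256(b"leaf:" + node).hexdigest()
    entries = "".join(
        f"{name}:{node_hash(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha256(b"tree:" + entries.encode()).hexdigest()


project = {
    "README": b"hello\n",
    "src": {"main.c": b"int main(void) { return 0; }\n"},
}
root_before = node_hash(project)

# Change one byte deep in the tree: the change propagates up to the root.
project["src"]["main.c"] = b"int main(void) { return 1; }\n"
root_after = node_hash(project)

print(root_before != root_after)  # True
```

Comparing two root hashes is therefore enough to know whether anything in the tree differs, and untouched subtrees (here `README`) keep their hashes, which is what lets Git share them between versions.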

In Git, every object is immutable: it is addressed by the hash of its contents, so anyone can recompute that hash to verify that it is genuine. Because each node refers to the nodes below it by hash, the whole structure is also connected.

Another innovation is the Bitcoin blockchain (see also Wikipedia and bitcoin.org). In a blockchain (many exist besides Bitcoin's), every block has a hash and contains the hash of the previous block in the chain. [Producing a valid block is deliberately difficult: miners compete to find a block whose hash falls below a target value, a process called mining, and the protocol adjusts that target so that a new block is found every 10 minutes on average.]
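The chaining and mining described above can be illustrated with a toy blockchain in Python. The block layout (a JSON dict with `prev`, `data` and `nonce`) and the `DIFFICULTY` value are assumptions for illustration: real Bitcoin blocks use an 80-byte binary header and a far harder, periodically adjusted target. Only the double SHA-256 hashing follows Bitcoin.

```python
import hashlib
import json

DIFFICULTY = 12  # hypothetical: required number of leading zero bits (Bitcoin's target is vastly harder)


def block_hash(block: dict) -> str:
    # Serialize deterministically, then apply double SHA-256 as Bitcoin does for block headers.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()


def meets_target(h: str, bits: int = DIFFICULTY) -> bool:
    # True when the top `bits` bits of the 256-bit hash are all zero.
    return int(h, 16) >> (256 - bits) == 0


def mine(prev_hash: str, data: str) -> dict:
    """Search for a nonce that makes the block's hash fall below the target."""
    nonce = 0
    while True:
        block = {"prev": prev_hash, "data": data, "nonce": nonce}
        if meets_target(block_hash(block)):
            return block
        nonce += 1


def verify_chain(chain: list[dict]) -> bool:
    """Each block must meet the target and point to the hash of the block before it."""
    for prev, block in zip(chain, chain[1:]):
        if block["prev"] != block_hash(prev) or not meets_target(block_hash(block)):
            return False
    return meets_target(block_hash(chain[0]))


genesis = mine("0" * 64, "genesis")
chain = [genesis]
for payload in ["alice pays bob 1", "bob pays carol 2"]:
    chain.append(mine(block_hash(chain[-1]), payload))
print(verify_chain(chain))  # True

chain[1]["data"] = "alice pays bob 100"  # tamper with history
print(verify_chain(chain))  # False
```

Tampering with any past block breaks the `prev` link stored in the next one, so a forger would have to redo the proof of work for every subsequent block.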

In Bitcoin, every block is immutable and linked to the chain of previous blocks, all the way back to block #0, called the genesis block. Each block also carries a timestamp, validated by the network. Internally, the block header commits to the block's transactions through the root of a Merkle tree.
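The Merkle tree of transactions can be sketched as follows. The pairing rule (concatenate two hashes, apply double SHA-256, and duplicate the last hash when a level has an odd count) follows Bitcoin's scheme; the byte strings standing in for transactions are made up, and real implementations hash serialized transactions and usually display hashes in reversed byte order.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for transactions and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: list[bytes]) -> bytes:
    """Fold a list of transaction hashes into a single root, Bitcoin-style.

    Hashes are paired, and each pair is concatenated and double-SHA-256'd;
    when a level has an odd number of entries, the last one is paired with itself.
    """
    if not tx_hashes:
        raise ValueError("a block always contains at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical transactions, standing in for serialized Bitcoin transactions.
txs = [b"tx-a", b"tx-b", b"tx-c"]
tx_hashes = [dsha256(tx) for tx in txs]
root = merkle_root(tx_hashes)
print(root.hex())
```

Since only the root goes into the block header, changing any transaction changes the root, hence the block's hash, hence the link held by every later block.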

From there on, a question arises:

are documents more useful and/or valuable when they are mutable, or when they can be proven to be unmodified?

Immutability, it appears, gives a document a wealth of useful properties. An immutable file can be:

  • addressed by its content hash, hence becoming immune to renaming or moving
  • verified for integrity, so that any unwanted modification is detected
  • timestamped for inclusion in (smart) contracting processes
  • be included in complex chains or trees of documents (versions, references…)

So, can we benefit from these advantages within our own document organization?

KeeeX extends immutability and connectivity to your entire set of documents, and it goes a lot further. Besides computing an integrity identifier (idx) for every file, KeeeX also:

  • encodes this idx so that humans can use it as a name (even over the phone)
  • injects this idx into the file itself, so that search engines, on a computer or on the web, can find and index it. No specific software or infrastructure is required to discover the files!
  • injects references to other documents into your own documents, while ensuring that searches for these references do not collide with searches for the document itself
  • automatically chains successive versions of your files, allowing multiple parallel versions when required
  • injects the public key of your digital identity, so that your contribution is locked in forever
  • injects a signature of the idx made with your private key, so that nobody can impersonate you. Such a signature can also be verified without KeeeX.
  • injects additional metadata (up to 4K) stating copyright, confidentiality, instructions to collaborators or any other meaningful information
  • injects a specific hash suitable for insertion into the Bitcoin blockchain (for an example, search for 2HzFzM99P59PV4ssMhkC9bcZuzzjMdRNiYaqDTC at the bottom of the referenced page)
  • turns your folder structure into a global tag system
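Chaining versions, including parallel ones, can be sketched the same way Git chains commits: each version records its content hash and the IDs of its parents, and its own ID hashes both. The `Version` record and the `new_version` and `ancestry` helpers are hypothetical illustrations of the idea, not KeeeX's file format or idx computation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical version record: content hash plus the IDs of its parent versions."""

    content_hash: str
    parents: tuple[str, ...]

    @property
    def id(self) -> str:
        # The version ID covers both the content and its ancestry, like a Git commit.
        # Hex digests have a fixed length, so plain concatenation is unambiguous.
        material = self.content_hash + "".join(self.parents)
        return hashlib.sha256(material.encode()).hexdigest()


def new_version(content: bytes, *parents: Version) -> Version:
    return Version(hashlib.sha256(content).hexdigest(), tuple(p.id for p in parents))


def ancestry(v: Version, index: dict[str, Version]) -> list[str]:
    """Walk back through first parents to the original version."""
    ids = [v.id]
    while v.parents:
        v = index[v.parents[0]]
        ids.append(v.id)
    return ids


v1 = new_version(b"draft")
v2a = new_version(b"draft, edited by Alice", v1)  # two parallel versions
v2b = new_version(b"draft, edited by Bob", v1)    # branching from v1
v3 = new_version(b"merged draft", v2a, v2b)       # a version with two parents

index = {v.id: v for v in (v1, v2a, v2b, v3)}
print(ancestry(v3, index) == [v3.id, v2a.id, v1.id])  # True
```

Parallel versions share a parent but get distinct IDs, and a later version can name both as parents. Altering an ancestor changes its ID, so its descendants no longer point to it.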

So, with KeeeX, you build on your own disks, without any infrastructure, a provably immutable network of documents that can be:

  • trusted for their integrity and authorship: never wonder or reread
  • fully interconnected for a variety of needs, such as semantic tags, process marks, and links to original files, versions or references: search and organize at blazing speed
  • further connected to the Bitcoin blockchain or any other: prepare for ultimate dematerialization and notarization scenarios
  • addressed by their content at machine or web scale: instant and exact search with artificial words that, in practice, never collide
  • involved in arbitrarily complex automation schemes: build your digital heritage, for a future where semantics and trust will break down data silos and allow continuous process improvement
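The "artificial words" idea can be illustrated by encoding a hash as pronounceable syllables. This sketch uses a made-up 64-syllable alphabet and is not KeeeX's actual idx encoding, which is not specified here. Because the encoding is reversible for digests of a given length, two names collide only if the underlying hashes collide, which for SHA-256 is practically impossible.

```python
import hashlib

# Hypothetical alphabet: 16 consonants x 4 vowels = 64 syllables, i.e. 6 bits per syllable.
CONSONANTS = "bcdfghjklmnprstv"
VOWELS = "aiou"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def to_words(digest: bytes, syllables_per_word: int = 3) -> str:
    """Encode bytes as pronounceable pseudo-words (reversible for a known digest length)."""
    n = int.from_bytes(digest, "big")
    count = (len(digest) * 8 + 5) // 6  # ceil(bits / 6) syllables cover every bit
    sylls = []
    for _ in range(count):
        n, r = divmod(n, 64)
        sylls.append(SYLLABLES[r])
    sylls.reverse()  # most significant syllable first
    words = [
        "".join(sylls[i:i + syllables_per_word])
        for i in range(0, count, syllables_per_word)
    ]
    return "-".join(words)


def from_words(text: str, length: int) -> bytes:
    """Decode pseudo-words back into the original `length` bytes."""
    flat = text.replace("-", "")
    n = 0
    for i in range(0, len(flat), 2):  # every syllable is exactly two letters
        n = n * 64 + SYLLABLES.index(flat[i:i + 2])
    return n.to_bytes(length, "big")


digest = hashlib.sha256(b"my contract, final version").digest()
name = to_words(digest)
print(name)
print(from_words(name, len(digest)) == digest)  # True
```

Such a name can be read aloud and pasted into any search engine; since it is not an ordinary word, a search for it matches only documents that carry it.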

Start using KeeeX and invest in your very own digital heritage.