The Content Chaos

Increasingly many sources refer to ‘information chaos’ or ‘content chaos’ as the prevailing and worsening situation in industry, where ever-growing volumes of data and documents pile up in front of information workers. More precisely, the content chaos that stems from an inability to manage unstructured or standalone documents is an issue for many organizations.

The multiplication of storage and sharing options for data (e.g. clouds and backups, online or offline) does not ease the task. It can make it impossible to trace the flow of arguments and validations that led to a specific decision and, very often, simply to find the expected data.

A number of important figures gathered over the last years paint a rather bleak picture of the situation and of how it is perceived. For instance, AIIM’s 2011 ECM Survey revealed that:

  • 60% of new ECM users are not confident that “e-mails related to documenting commitments and obligations made by their staff are recorded, complete and retrievable.”
  • 41% cite content chaos as the trigger for adopting ECM.
  • 56% are not confident that their electronic information (excluding e-mails) is “accurate, accessible and trustworthy.”

Content chaos at the individual level

Content chaos impacts everyone at every level. Frequently cited are the increasing pains of:

  • finding a document known to exist
  • ensuring that a document is in the correct (latest, not accidentally modified) version
  • guaranteeing that one’s correspondent reads the same document
  • wading through the duplicates, archives and cloud systems where the data may reside or be duplicated, possibly with differing and inaccurate modification dates

The Knowledge Work Deficit

Companies and individuals pay the price of content chaos. The concept of ‘knowledge work deficit’ was coined by a 1999 IDC report, which revealed that the average yearly cost per employee of searching for data was 5000$ (3600€).

This report also considered an average-sized company with 1000 information workers, each paid an average of 80K$ (60K€) a year. For each worker:

  • the cost of not finding data was 6000$,
  • the cost of redoing work was 12000$.

This does not even count the missed opportunities resulting from low productivity and delayed time to market. Of course, some of these issues relate to paper management, but handling dematerialized documents and digital data in general has not proven to be a relief in recent years.

Hence, under these assumptions, a solution allowing perfectly accurate instant searches, and never requiring work to be redone, would save an average of 23000$ per worker per year (5000$ + 6000$ + 12000$, approximately 30% of an 80K$ salary). These figures echo a famous quote by former Hewlett-Packard CEO Lew Platt: “If only HP knew what HP knows, we would be three times more efficient.”
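The 23000$ figure is simply the sum of the three per-worker costs. The short Python sketch below makes that arithmetic explicit and scales it to the 1000-worker company, assuming (as the text implies but does not state outright) that the three IDC costs are yearly, per worker, and additive.

```python
# Per-worker yearly costs quoted from the 1999 IDC report (in $).
# Assumption: the three costs are yearly, per worker, and can simply be added.
SEARCH_COST = 5_000     # time spent searching for data
NOT_FOUND_COST = 6_000  # cost of not finding data
REDO_COST = 12_000      # cost of redoing work
SALARY = 80_000         # average yearly salary of an information worker
WORKERS = 1_000         # size of the "average" company in the report


def yearly_savings(workers: int = WORKERS) -> tuple[int, float, int]:
    """Return (savings per worker, share of salary, savings for the whole company)."""
    per_worker = SEARCH_COST + NOT_FOUND_COST + REDO_COST
    share = per_worker / SALARY
    return per_worker, share, per_worker * workers


per_worker, share, company = yearly_savings()
print(f"{per_worker}$ per worker ({share:.0%} of salary), {company}$ for the company")
# 23000$ per worker (29% of salary), 23000000$ for the company
```

At company scale, the same assumptions put the potential yearly savings at 23 million dollars.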

A wider set of figures has been collected more recently:

  • Fortune 500 companies lose an estimated 12B$ by not finding data.
  • 59% of middle managers say they miss important data every day because it exists in the company but they cannot find it (Accenture).
  • companies misfile up to 20% of their documents and thus lose that information forever (ARMA International).
  • companies pay six times the cost of the original document to search for a lost one (Coopers & Lybrand).
  • companies pay eleven times the cost of the original document to recreate a lost one (ibid.).
  • 90% of typical office tasks revolve around the gathering and distribution of paper documents; 15% of all papers are lost, and 30% of the time is spent trying to find these lost documents (Delphi Group, 1999).
  • executives spend an average of six weeks per year searching for lost documents, according to a survey of 2,600 executives by the worldwide office supply leader Esselte.

These figures strongly suggest that any lightweight, seamless solution to content chaos can yield an extremely high ROI. Let’s introduce KeeeX.

Knowing what you know: the KeeeX way

In essence, solving content chaos amounts to ensuring a few simple properties that follow well-known data governance guidelines:

  • existing documents can be easily found, then verified as uncorrupted
  • authors can be traced
  • decision and version history can be traced
  • references to other documents can be easily found
  • the context of a document (references, categories, tags, authors …) can be used to search for relevant items

KeeeX achieves this using simple means: every file contains a plain-text, indexable, pronounceable and verifiable identifier, as well as:

  • a plain-text indexable reference to an author profile,
  • a date,
  • possibly a PKI (public key infrastructure) signature,
  • plain-text indexable references to other documents’ identifiers.
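To make the idea of an embedded, verifiable identifier concrete, here is a minimal Python sketch under explicit assumptions: the document is plain text, the `ID:`/`AUTHOR:`/`DATE:`/`REF:` header is a hypothetical layout invented for this example, and the identifier is a SHA-256 digest computed while a fixed placeholder stands in for the ID itself. This is not KeeeX’s actual format or algorithm; it leaves out the pronounceable encoding and the optional signature.

```python
import hashlib
from datetime import date

# Hypothetical header layout invented for this sketch (not KeeeX's real format):
#   ID: <64 hex chars> / AUTHOR: <ref> / DATE: <ISO date> / REF: <id>, one per line
PLACEHOLDER = "0" * 64  # fixed-width stand-in for the ID while hashing


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stamp(body: str, author_ref: str, day: date, refs: list[str]) -> str:
    """Prepend a metadata header whose ID is derived from the whole document."""
    header = [f"ID: {PLACEHOLDER}", f"AUTHOR: {author_ref}", f"DATE: {day.isoformat()}"]
    header += [f"REF: {r}" for r in refs]
    draft = "\n".join(header) + "\n\n" + body
    # Hash the draft while the placeholder is in place, then write the real ID over it.
    return draft.replace(PLACEHOLDER, _digest(draft), 1)


def verify(document: str) -> bool:
    """Put the placeholder back, recompute the digest and compare it to the claimed ID."""
    first_line = document.split("\n", 1)[0]
    if not first_line.startswith("ID: "):
        return False
    claimed = first_line[len("ID: "):]
    if len(claimed) != len(PLACEHOLDER):
        return False
    return _digest(document.replace(claimed, PLACEHOLDER, 1)) == claimed


doc = stamp("Minutes of the board meeting.", "author-profile-id", date(2014, 1, 15), [])
print(verify(doc))                            # True
print(verify(doc.replace("board", "bored")))  # False: the content was altered
```

Because the identifier depends on every byte of the document, including the author reference, date and references, any of these fields being altered makes verification fail.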

Technically, references are written differently from identifiers, so that a content-based search, whether by a web search engine or by a machine-based search application, can separate the documents that define an identifier from those that merely reference it.
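A small Python sketch shows why distinct syntax matters. It assumes hypothetical `ID:` and `REF:` line markers (illustrative only, not KeeeX’s real idioms): a naive substring search for `abc123` would return all three documents below, whereas matching the markers separates the defining document from the referencing one and ignores a passing mention.

```python
import re

# Hypothetical markers for this sketch: a document *defines* an identifier on an
# "ID:" line and *references* one on a "REF:" line. Not KeeeX's actual syntax.
ID_LINE = re.compile(r"^ID: (\S+)$", re.MULTILINE)
REF_LINE = re.compile(r"^REF: (\S+)$", re.MULTILINE)


def classify(corpus: dict[str, str], identifier: str) -> tuple[list[str], list[str]]:
    """Split document names into those defining `identifier` and those referencing it."""
    defining, referencing = [], []
    for name, text in sorted(corpus.items()):
        if identifier in ID_LINE.findall(text):
            defining.append(name)
        if identifier in REF_LINE.findall(text):
            referencing.append(name)
    return defining, referencing


corpus = {
    "spec.txt": "ID: abc123\n\nProduct specification",
    "review.txt": "ID: def456\nREF: abc123\n\nReview of the specification",
    "notes.txt": "ID: 789aaa\n\nUnrelated notes mentioning abc123 in passing",
}
print(classify(corpus, "abc123"))  # (['spec.txt'], ['review.txt'])
```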

The fact that the identifier is pronounceable is extremely useful to humans: they can talk about file X, version Y, which refers to Z.
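One way to make a digest pronounceable is to map each byte to a syllable. The Python sketch below uses a hypothetical alphabet of 16 onsets × 4 vowels × 4 codas (exactly 256 consonant-vowel-consonant syllables, one per byte) and groups syllables into dash-separated words. It is only an illustration of the principle, not KeeeX’s encoding; established schemes such as Bubble Babble follow the same idea. Truncating the digest to 8 bytes, as done here for readability, weakens collision resistance, so a real scheme would keep more of it.

```python
import hashlib

# Hypothetical alphabet: 16 onsets x 4 vowels x 4 codas = 256 syllables, one per byte.
ONSETS = "bdfghjklmnprstvz"
VOWELS = "aiou"
CODAS = "lnrs"


def syllable(byte: int) -> str:
    """Map one byte (0-255) to a unique consonant-vowel-consonant syllable."""
    return ONSETS[byte >> 4] + VOWELS[(byte >> 2) & 0b11] + CODAS[byte & 0b11]


def pronounceable(data: bytes, n_words: int = 4, syllables_per_word: int = 2) -> str:
    """Render the first n_words * syllables_per_word bytes of a SHA-256 digest as words."""
    digest = hashlib.sha256(data).digest()
    sylls = [syllable(b) for b in digest[: n_words * syllables_per_word]]
    words = [
        "".join(sylls[i : i + syllables_per_word])
        for i in range(0, len(sylls), syllables_per_word)
    ]
    return "-".join(words)


print(syllable(0), syllable(255))  # bal zus
print(pronounceable(b"Minutes of the board meeting."))
```

Since every syllable has a fixed length of three letters, a spoken or typed word can be split back into syllables, and each syllable decoded back into its byte, without ambiguity.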

To summarize the innovation: KeeeX elevates file-checking mechanisms into a human-friendly, yet universe-scale, indexing scheme.

From raw data to linked documents

KeeeX thus embeds references to documents into other documents. These references are searchable as plain text and, once the target is found, they allow verifying that it was never modified. KeeeX hence transposes the “linked data” perspective known from the semantic web into a notion of linked documents.
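The following Python sketch illustrates checking a document’s outgoing links. It simplifies heavily: the identifier is taken to be the SHA-256 of the target’s bytes (no embedded header), and `found` stands for whatever copies a search turned up, a hypothetical input. Each reference is then reported as verified, modified or missing.

```python
import hashlib


def content_id(content: bytes) -> str:
    """Simplified identifier for this sketch: the SHA-256 of the document's bytes."""
    return hashlib.sha256(content).hexdigest()


def check_links(refs: list[str], found: dict[str, bytes]) -> dict[str, str]:
    """For each referenced ID, report whether the copy located by search is intact.

    `found` maps an ID to the bytes of a copy located by some search (hypothetical).
    """
    status = {}
    for ref in refs:
        if ref not in found:
            status[ref] = "missing"
        elif content_id(found[ref]) == ref:
            status[ref] = "verified"
        else:
            status[ref] = "modified"
    return status


spec = b"Product specification v2"
budget = b"Budget 2014"
refs = [content_id(spec), content_id(budget), content_id(b"Draft memo")]
found = {refs[0]: spec, refs[1]: b"Budget 2014 (edited)"}  # memo never found
status = check_links(refs, found)
print([status[r] for r in refs])  # ['verified', 'modified', 'missing']
```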

One difference is that, with KeeeX, the metadata are embedded inside the documents themselves. From a ‘semantic’ perspective, this means that the document IS the concept: there is no notion of attaching meaning to an item from outside it. As a result, no complex infrastructure is required to deal with the metadata.

Metadata enhanced organization

KeeeX provides a unique way to describe any semantic or tag-based document organization. Each concept or tag can be defined as a document. For instance, the concept ‘Data Governance’ can be implemented by a PDF print of the corresponding Wikipedia page. When future documents refer to that concept, their intent and meaning are unambiguous.

KeeeX thus allows a user or company to define concepts (or tags, semantic keywords, notions …) at exactly the granularity required to organize their data. This organization is written in the documents themselves and/or in other documents.
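As an illustration of tags-as-documents, the Python sketch below treats concept documents as tags: their identifiers (simplified here to a SHA-256 of the text) are what other documents reference. The concept texts and file names are hypothetical stand-ins, for example for a PDF print of the ‘Data Governance’ Wikipedia page. Building a concept index then only requires reading the references each document already carries.

```python
import hashlib
from collections import defaultdict


def content_id(content: str) -> str:
    """Simplified identifier for this sketch: SHA-256 of the text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Concepts are themselves documents; these texts are hypothetical stand-ins.
concepts = {
    "Data Governance": "Data governance: definition and scope",
    "ROI": "Return on investment: definition",
}
concept_ids = {content_id(text): name for name, text in concepts.items()}


def build_index(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each concept name to the documents whose embedded refs point at it."""
    index = defaultdict(list)
    for doc_name, refs in sorted(documents.items()):
        for ref in refs:
            if ref in concept_ids:  # ignore references to non-concept documents
                index[concept_ids[ref]].append(doc_name)
    return dict(index)


dg, roi = (content_id(concepts[n]) for n in ("Data Governance", "ROI"))
documents = {  # document name -> identifiers it references (hypothetical)
    "policy.pdf": [dg],
    "business-case.docx": [dg, roi],
    "photo.png": [],
}
print(build_index(documents))
# {'Data Governance': ['business-case.docx', 'policy.pdf'], 'ROI': ['business-case.docx']}
```

No separate tag database is involved: the index can be rebuilt at any time from the documents alone.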

Normally, one shares only a document. Using KeeeX, one also shares its organization and veracity, two notions that are normally provided by external infrastructures rather than carried by the documents themselves.

Process friendly

Because KeeeX does not encapsulate documents to inject metadata, but instead exploits the many options each file format offers for embedding plain-text metadata, the files retain their full original functionality.
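PNG images are one concrete example of a format with such an option: the PNG specification defines an ancillary `tEXt` chunk holding a keyword and Latin-1 text, which viewers that do not need it simply skip. The Python sketch below, which is not KeeeX’s own code, inserts such a chunk before `IEND` and leaves every other chunk byte-for-byte unchanged. The `REF: abc123` payload is a hypothetical placeholder.

```python
import struct
import zlib
from collections.abc import Iterator

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[tuple[bytes, bytes]]:
    """Yield (type, data) for each chunk following the PNG signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos : pos + 4])
        yield png[pos + 4 : pos + 8], png[pos + 8 : pos + 8 + length]
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


def add_text(png: bytes, keyword: str, text: str) -> bytes:
    """Insert a tEXt chunk just before IEND, leaving all other chunks untouched."""
    iend = len(png) - 12  # IEND is an empty chunk, so it occupies the last 12 bytes
    if not png.startswith(PNG_SIGNATURE) or png[iend + 4 : iend + 8] != b"IEND":
        raise ValueError("not a PNG file ending with IEND")
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:iend] + _chunk(b"tEXt", payload) + png[iend:]


def tiny_png() -> bytes:
    """Build a valid 1x1 8-bit grayscale PNG to experiment with."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, then one black pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


stamped = add_text(tiny_png(), "Comment", "REF: abc123")
print([kind for kind, _ in iter_chunks(stamped)])  # [b'IHDR', b'IDAT', b'tEXt', b'IEND']
```

The image stays an ordinary PNG, while the embedded text remains visible to any tool that reads the raw bytes, including content-based indexers.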

KeeeX therefore remains compatible with any kind of document process, infrastructure, management tool or sharing tool.

Cloud friendly

Because KeeeX can verify user files offline and locally at any time, it helps solve part of the cloud/archive chaos. Document clones can be backed up or shared using any number of cloud, sharing or backup systems: every copy can be verified to test whether it is a truly valid instance of the original.
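The Python sketch below shows offline verification of copies scattered across backups. It assumes a simplified identifier (the SHA-256 of the file bytes, not KeeeX’s real scheme) and uses hypothetical file names. Only the contents are checked, so a renamed or translated copy still verifies, while an altered clone does not.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(data: bytes) -> str:
    """Simplified identifier for this sketch: SHA-256 of the file bytes."""
    return hashlib.sha256(data).hexdigest()


def valid_copies(expected_id: str, candidates: list[Path]) -> list[Path]:
    """Return the candidate files whose contents match expected_id.

    Runs locally and offline; file names and locations play no role.
    """
    return [p for p in candidates if content_id(p.read_bytes()) == expected_id]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = b"Signed contract, final version"
    expected = content_id(original)
    (root / "contrat-final.pdf").write_bytes(original)             # translated name
    (root / "backup-2014.bin").write_bytes(original)               # backup copy
    (root / "contract.pdf").write_bytes(original + b" (edited)")   # altered clone
    names = [p.name for p in valid_copies(expected, sorted(root.iterdir()))]

print(names)  # ['backup-2014.bin', 'contrat-final.pdf']
```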

Knowing that she has several backups of a document may allow a user to safely delete a local copy: should the document be needed in the future, content-based file system or web search engines will find it in a snap.

Web friendly

KeeeX lets a user forget about file names. KeeeX does copy the first three words of a file’s identifier into the filename at creation time, but it relies on this name neither for searching (indexing is content-based) nor for verifying (only the contents matter).

Content indexing is good for the web: web crawlers may learn to look for KeeeX idioms even in places where they do not (officially) attempt indexing.

Filename independence is also good for the web. Very often, files uploaded to a service (e.g. photos on Flickr, documents on Box, etc.) will not be served under the same filename. A file name may also simply have been translated into another language.

Conclusion

KeeeX offers a new paradigm for addressing information chaos in the simplest possible way: by instrumenting files so that they carry the required organization, veracity and authorship.

Forever

(Note: this is a repost from http://laurent.henocque.com/ and LinkedIn.)

References