Summary
One of the greatest achievements of the Internet has been to
dramatically decrease the time it takes to find information. However, this
capacity has not greatly improved over the past decade, due in part to the explosion
of unstructured content. The problem is even worse on enterprise intranets,
where information is much more fragmented and lacks the interconnectivity
that improves navigation and search result ranking on the Web.
The situation can be improved if content actors (producers,
managers, and consumers) can combine efforts to better formalize information,
making it easier to process by computers. While technologies to support such
activity have been evolving, much still remains to be done. Lacking in
particular are systems that assist content consumers, by far the largest
segment of content actors.
Problem
Despite Apple’s Siri, we are still pretty far away from the
Star Trek computer, i.e., a computer that is capable of answering any question
based on the available information. While content search technology has
gradually improved over time, it has not been able to keep up with the information explosion
that we are experiencing (aka Big Data). In the end, you still get a (usually large)
collection of documents that may or may not contain what you are looking for.
Even when information is semantically structured and we can
ask a computer for a specific information object, there still remains a problem
of searching across multiple structured data sets with different data models. One
cannot specify query parameters if these parameters are not the same across
datasets.
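To make the mismatch concrete, here is a minimal Python sketch. The two data sets, their field names (`author` vs. `creator`, `topic` vs. `subject`), and the hand-written `FIELD_MAP` are all invented for illustration. It only shows that a query written against one data model silently misses records stored under another model until the field names are mapped onto a shared vocabulary.

```python
# Two hypothetical data sets that describe the same kind of object
# with different data models (all names and values are made up).
wiki_pages = [
    {"author": "alice", "topic": "search"},
    {"author": "bob", "topic": "tagging"},
]
file_shares = [
    {"creator": "alice", "subject": "search"},
    {"creator": "carol", "subject": "metadata"},
]

# Assumed, hand-written mapping from each local field name to a shared one.
FIELD_MAP = {"creator": "author", "subject": "topic"}


def normalize(record):
    """Rename a record's fields to the shared vocabulary."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


def query(records, **params):
    """Return the records whose fields equal every given query parameter."""
    return [r for r in records if all(r.get(k) == v for k, v in params.items())]


# Querying the raw data with one model's parameter misses the other data set.
raw_hits = query(wiki_pages + file_shares, author="alice")

# After normalization, the same query reaches both data sets.
unified = [normalize(r) for r in wiki_pages + file_shares]
unified_hits = query(unified, author="alice")

print(len(raw_hits), len(unified_hits))  # 1 2
```

A hand-written mapping like this does not scale beyond a few data sets, which is why the structure itself has to be formalized and shared.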
What’s needed is a way to formalize and merge semantic information
structure:
- Give semantic structure to content during its creation
- Structure existing content through
  - Adding external structured properties called metadata (e.g., topic, author, date, etc.)
  - Extracting structured facts from unstructured content as an alternative knowledge representation
- Unify and interlink the resultant structure across all data sets
The result would be much easier for computers to process and
search automatically, with powerful consequences for information consumers.
They would then have their Star Trek computer.
Unfortunately, this is a very complex undertaking that
requires a lot of investment on the part of
information creators, managers, and consumers. The supporting
technology therefore faces the major challenge of raising the ROI enough to reach the
tipping point of mass adoption.
So far, the technology has generally taken two opposing
approaches:
- The content formalization work is carried out by dedicated trained people who are referred to as Information Architects, Content Curators, etc. These people create domain-specific data models, interconnect different models, and use them to formalize new and existing content. Some of this work can be automated to a certain degree, but automation usually introduces a significant amount of noise.
- The content formalization is carried out by content consumers via so-called "free tagging", whereby users can add whatever metadata they wish to content. Free tags are just simple short phrases that add a bit of semantic structure. While content consumers can be motivated to improve semantic structure for better retrieval, the resultant degree of formalism is very weak.
Sophisticated software and algorithms now exist that help
experts create, manage, consolidate, and reuse metadata and data
models, such as linguistic rules and reference vocabularies. However, no tools
are available that empower content consumers and consolidate their
contributions with those made by experts.
Solution
What’s needed is a platform that makes it possible to effectively crowdsource
the content formalization task to those who use the content. After all, this formalization
is done for the benefit of content users so it makes perfect sense that they
should have a say in how the content is formalized. Just like one can
crowdsource production of data, one can crowdsource production of data models
and metadata.
Many research papers have been written on this subject (see an example), but strangely
no effective commercial tools exist that would support such a process. Yet, the
underlying concept is fairly simple:
- Allow users to create, define, and modify tags, split them into properties and values, and create semantic links between tags (e.g., synonyms, translations, sub-terms).
- Allow users to discuss and evaluate modifications proposed by others (e.g., using voting). Merge identical modification proposals and count them as votes.
- Allow automatic acceptance based on the number of votes, user profile, etc.
- Allow different moderation rights based on user profiles (e.g., certain users can have expert status and reject modifications made by others).
- Assist and guide metadata creation and consolidation by using search keywords and expert-generated data models.
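As a rough illustration of the first four points, the Python sketch below models tag modification proposals, merges identical proposals into votes, accepts a proposal automatically once enough vote weight accumulates, and restricts rejection to experts. Every name and policy value in it (`TagRegistry`, `ACCEPT_THRESHOLD`, `EXPERT_WEIGHT`, the sample users and tags) is an assumption made for this example, not part of any existing product. The guidance step from the last point is left out.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; a real system would tune these.
ACCEPT_THRESHOLD = 3   # vote weight needed for automatic acceptance
EXPERT_WEIGHT = 2      # assumption: an expert's vote counts double


@dataclass
class User:
    name: str
    is_expert: bool = False


@dataclass
class Proposal:
    tag: str
    relation: str          # e.g. "synonym", "translation", "sub-term"
    target: str
    voters: set = field(default_factory=set)
    weight: int = 0
    status: str = "open"   # "open", "accepted" or "rejected"


class TagRegistry:
    def __init__(self):
        self.proposals = {}   # (tag, relation, target) -> Proposal
        self.links = set()    # accepted semantic links between tags

    def propose(self, user, tag, relation, target):
        """Submit a modification; an identical proposal counts as a vote."""
        key = (tag.lower(), relation, target.lower())
        proposal = self.proposals.setdefault(key, Proposal(*key))
        self.vote(user, proposal)
        return proposal

    def vote(self, user, proposal):
        """Count one vote per user and accept once the threshold is met."""
        if proposal.status != "open" or user.name in proposal.voters:
            return
        proposal.voters.add(user.name)
        proposal.weight += EXPERT_WEIGHT if user.is_expert else 1
        if proposal.weight >= ACCEPT_THRESHOLD:
            proposal.status = "accepted"
            self.links.add((proposal.tag, proposal.relation, proposal.target))

    def reject(self, user, proposal):
        """Only users with expert status may reject a proposal."""
        if not user.is_expert:
            raise PermissionError("only experts can reject proposals")
        proposal.status = "rejected"
        self.links.discard((proposal.tag, proposal.relation, proposal.target))


registry = TagRegistry()
ann, ben, eve = User("ann"), User("ben"), User("eve", is_expert=True)

p = registry.propose(ann, "JS", "synonym", "JavaScript")
registry.propose(ben, "js", "synonym", "javascript")  # merged as a second vote
registry.vote(eve, p)                                 # expert vote tips it over
print(p.weight, p.status)  # 4 accepted
```

Normalizing tags to lower case is the simplest possible way to detect identical proposals; a real system would probably also need stemming or the expert-generated vocabularies mentioned above.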
As research papers indicate, there are a lot of details to
work out, but conceptually the main issues have already been resolved and modeled.
All that’s needed is to create a viable commercial product.
Market Overview
Content is consumed through a huge number of diverse software
systems. Many prominent systems already use free tagging:
- Enterprise collaboration platforms, such as SharePoint, Drupal, and Confluence.
- Public collaboration platforms, such as Twitter, Stack Exchange, and Delicious.
A few of these systems are starting to offer a basic level
of tag management functionality. For example, on Stack Overflow, users with enough
reputation can specify and edit tag definitions as well as suggest and validate
tag synonyms. While this is a move in the right direction, it is far from
sufficient.
At the other extreme of the spectrum, Google has recently
launched Knowledge Graph, which is the most complete public collection of
expert-structured knowledge. This collection can be used by external services via an API, thereby
providing a good basis for semi-automated enrichment of free tags.
Business model
The goal of the proposed solution is to enhance the functionality
of existing content platforms, which can be accomplished in two ways:
- Sale of a software component to content platform providers (OEM license)
- Sale of a platform plugin to content platform users (software license or SaaS subscription)
Go-to-market strategy
Many of the platforms using free tagging provide API access
and application marketplaces (see an example). The best starting strategy
would be to develop tag management plugins for such platforms.
Moreover, those platforms that chose not to implement free tagging
did so knowing its limitations, and so may change their minds
once a better system is in place.