Some months ago, Sxip took the rather
dramatic step of moving the Sxip protocols away from XML as the basic structure of
identity information exchange to simple name-value pairs. Ever since, I have been pondering
whether that radical simplification is on the right side or the wrong side of
"as simple
as possible, but not simpler".
As a regular reader of this blog, you probably know that I'm a big fan of simple,
and so questions like this one are rather important to me. If this could really
work, it would be a huge simplification and A Great Thing.
Unfortunately, I've convinced myself lately that this is one step too far, and that
this simplification won't work except in the simplest cases. While I'm a huge fan
of building something very simple for the simplest cases first, and then adding more
(optional) complexity as one goes towards more complex requirements, this approach
does not work with name-value pairs: once things do become more complex, one has to
break with name-value pairs and move to a different design. In other words, the approach
doesn't scale towards more complexity, and that makes the name-value pair design a dead end
for digital identity information. Unfortunately, because I do like simple.
Before anybody starts shouting, note that I'm not making this argument for
competitive reasons: both friends and competitors have been flirting with this direction.
Instead, I consider this a fundamental issue that we all should try to resolve
collectively, and this is why I'm blogging this.
So let me try to make my case. (I know from past experience that I won't make it
well in this first iteration, as hard as I may try. And I have been thinking about
this for some months already! But bear with me; as you argue
back, I will hopefully improve my explanation and my case ...)
Let's start with the big picture. I think nobody will really argue that in the general
case, all the world's information, regardless of purpose, is best represented
as name-value pairs. We have relational databases, for example, not name-value
databases, and for good reasons. We have object graphs, and hyperlinks, for other
good reasons, instead of just a two-column spreadsheet for names and values.
As a simple example for the general case, think of the
set of all your ancestors, descendants and their relationships. One could probably
devise a scheme by which all of this information could be packed into name-value
pairs, such as:
father: Joe
mother: Jane
father.father: Jim
father.mother: Jill
...
father.father.father: Jack
...
father.father.son[2]: My paternal uncle
This becomes ridiculous rather quickly: imagine you have to express that your
mother's aunt married your father's great-uncle. You'd have to create a construct
such as:
mother.father.mother.daughter[2].married.father.father.mother.father.son[5]: true
(maybe there is a simpler way to represent this with name-value pairs that
currently doesn't occur to me, but even if there is, I don't think it's going to be
much simpler than this because we are really trying to shoehorn something that is a
directed graph into name-value pairs.)
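To make the shoehorning concrete, here is a minimal Python sketch. All names in it (`EDGES`, `NAMES`, `paths_from`, the lowercase person identifiers) are made up for illustration and are not part of any Sxip or LID format. It keeps a small family as a directed graph, where a marriage is just one more edge, and then derives the flattened name-value form, where the same marriage has to carry two full paths inside its key.

```python
from collections import deque

# Hypothetical family graph: (source, relation, target) edges.
EDGES = [
    ("me", "father", "joe"),
    ("me", "mother", "jane"),
    ("joe", "father", "jim"),
    ("joe", "mother", "jill"),
    ("jim", "father", "jack"),
]
NAMES = {"joe": "Joe", "jane": "Jane", "jim": "Jim", "jill": "Jill", "jack": "Jack"}


def paths_from(root, edges):
    """Map each reachable person to its dotted path from root (breadth-first)."""
    paths = {root: ""}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for src, rel, dst in edges:
            if src == node and dst not in paths:
                paths[dst] = f"{paths[node]}.{rel}" if paths[node] else rel
                queue.append(dst)
    return paths


paths = paths_from("me", EDGES)

# Flattened name-value view: one pair per person, keyed by its path from "me".
pairs = {path: NAMES[person] for person, path in paths.items() if path}

# In the graph, a marriage is one more edge between two existing nodes ...
marriage = ("jill", "married", "jim")
# ... but the flat form has to spell out both paths from "me" inside the key.
marriage_key = f"{paths[marriage[0]]}.{marriage[1]}.{paths[marriage[2]]}"
pairs[marriage_key] = "true"

for key, value in pairs.items():
    print(f"{key}: {value}")
```

The last line printed is `father.mother.married.father.father: true`, and that is for two people who are only two generations away. The key grows with the distance of both people from "me", while the graph edge stays the same size.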
Note that while it is complex already, this example is rather academic; the world
is much more complex than this because so much more information is related to people than
just their ancestry information. If we tried to represent that additional information,
too, the syntax and conventions needed would have to be even more laborious...
Exercise for the unconvinced reader:
let's try to add their addresses and phone numbers, and past employers.
So if you agree with me that for the general (not specific to identity) case, name-value
pairs are not a feasible way of representing many kinds of information, the question
becomes:
While name-value pairs are not a feasible way of representing information in
the general case, is the subset of information that is of interest in digital identity use cases
simple enough that name-value pairs are sufficient?
This is the question I have been struggling with for some time ...
Well, of course I chose the ancestor-relatives example for a reason: it would
be entirely conceivable that information of that kind plays a rather large role
in many identity use cases. For example: "my great-uncle is a millionaire"
or "I am the beneficiary of 62% of the trust set up by ... upon the death of ...
who is currently 98 years old." or "I have not been fired for cause by
my last N employers" or "none of the members or visitors in the N
on-line communities that I frequent has ever made the statement that I spam".
You can make up more examples of this kind: what's central to all of them is that
we are talking about structured information.
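As a rough illustration of what "structured" means for one of these statements, here is a small Python sketch of the "not fired for cause by my last N employers" example. The `Employment` record, its field names, and the placeholder employers are assumptions made up for this sketch; no real identity schema is implied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Employment:
    # Hypothetical fields, chosen only for this illustration.
    employer: str
    end_year: int
    fired_for_cause: bool


def not_fired_by_last(history: List[Employment], n: int) -> bool:
    """True if none of the n most recent employments ended in a firing for cause."""
    recent = sorted(history, key=lambda e: e.end_year, reverse=True)[:n]
    return not any(e.fired_for_cause for e in recent)


# Placeholder data, not real employers.
history = [
    Employment("Employer A", 1998, True),
    Employment("Employer B", 2002, False),
    Employment("Employer C", 2005, False),
]

print(not_fired_by_last(history, 2))  # True
print(not_fired_by_last(history, 3))  # False
```

The statement only makes sense over a list of records, each with several related fields and an ordering. A single name and a single value cannot answer it for an arbitrary N.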
Granted, these examples are not the type of identity information that people typically
focus on today. But I'd argue that this is not because this kind of information isn't
important for digital identity use cases, but because digital identity is in its infancy
and we naturally deal with simple kinds of information first, such as street addresses,
credit card numbers, date of birth, that kind of thing. But I'm quite certain that
this kind of information is going to become relevant very quickly, and that it
will produce much higher business value than, say, knowing the zip code of a person.
For example: which one convinces you more that Joe has a good reputation:
a street address where Joe lives, or the knowledge that Joe comes from a family of
millionaires and industrialists that stretches back over four generations?
So complex, structured information that at its heart is best thought of as graphs is going to
become very important for many digital identity use cases in the future, and we need
to work under the assumption that such complex information is not going to be a weird
corner case, but a high-value case.
If you agree with me so far, let's assume for a second we'd start our quest for ubiquitous
identity protocols with the plan of first using name-value pairs (because it is simple),
and when we do need to represent more complex stuff, we move to XML, RDF, or some
other kind of mechanism that can represent more structured information. The trouble
is that we will end up with a fundamentally different representation of information
than name-value pairs, one that is not downward compatible, such as:
father: Joe
mother: Jane
..
marriages: <set>\
<marriage>\
<husband id="1234"/>\
<wife id="4567"/>\
</marriage>\
</set>
ancestry: <ancestor>\
<father>\
<person id="1234"><male/><first>Joe</first></person>\
<father>\
<person .../>\
</father>\
...\
</father>\
<mother>\
<person id="5678"><first>Jane</first></person>\
</mother>\
</ancestor>
Information representation doesn't get much uglier than this mixture of simple name-value
pairs and XML, in my experience, and I feel for any poor soul who has to debug or
support this in production deployments.
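To hint at what that poor soul is up against, here is a minimal Python sketch of a consumer for such a hybrid. The `message` dictionary and the `read_value` helper are hypothetical, and the "starts with `<`" test is an assumption of this sketch; the mixed format itself gives no rule for telling a plain value from an XML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical hybrid message: plain values next to embedded XML values.
message = {
    "father": "Joe",
    "mother": "Jane",
    "marriages": '<set><marriage><husband id="1234"/><wife id="4567"/></marriage></set>',
}


def read_value(raw):
    """Return a plain string, or a parsed XML element if the value looks like XML.

    The "starts with <" check is a guess; the format gives no way to be sure.
    """
    if raw.lstrip().startswith("<"):
        return ET.fromstring(raw)
    return raw


parsed = {name: read_value(raw) for name, raw in message.items()}

for name, value in parsed.items():
    if isinstance(value, ET.Element):
        husbands = [h.get("id") for h in value.iter("husband")]
        print(f"{name}: XML <{value.tag}> with husband ids {husbands}")
    else:
        print(f"{name}: {value}")
```

Every consumer now needs two code paths and a heuristic to choose between them, and a plain value that happens to begin with `<` would be sent to the XML parser and fail there.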
Ergo: let's please not do name-value pairs for new protocols, but let's assume that
identity information is going to be more structured than name-colon-value.
Coincidentally, there is good precedent
for that already: RSS, OPML, Atom etc. It would have been quite easy to design a
format with the same capabilities as RSS that was name-value-based (at least quite easy
compared to some of the contortions one would have to go through for the examples above).
But neither Netscape nor Dave Winer nor the Atom guys nor anybody else that I know of
seriously considered that. This should give us pause: if their information, which is
simpler in structure than much of what we need for identity, needs to be represented
in XML (or, some people would argue, RDF), chances are that simple name-value approaches,
although appealing at first sight, simply won't work as soon as people adopt them
for not-quite-trivial use cases.
I will have a constructive suggestion for what to do some time soon (in the meantime,
look at what we do with the LID Profile for Traversals and Selections), but for the
time being, I'd like to only focus on my case that name-value pairs won't work, and
I'm looking forward to your (dis)agreements.
Joaquin Miller saw a draft of this post yesterday, and commented:
You are definitely right. The point in brief is:
Data is not a collection of data elements; instead data has structure.
Yep. That is also true for identity data, and that's why we need something other than
name-value pairs as the foundation to represent it.
Update: Chuck Mortimore, who demonstrated his open-source version of
InfoCard-In-The-Browser at IIW, says he "completely agree[s]", as have several people
who have sent me e-mail privately so far.
Update: John Panzer points out that some people have argued that one could
do RSS without XML. Sure one can, just like one can do the contortions in my examples above ...
Certainly, the contortions for RSS would be much simpler than the ones that we would have to
go through for more complex identity.