|
Here are some of the proposals I have heard recently:
- name-value pairs
- name-value pairs in a hierarchical arrangement (think LDAP)
- the relational model (think SQL)
- XML
- RDF and/or RDF schema
- OWL
- UML and variants
and I'm sure I'm forgetting a few.
If this sounds like every possible model in the known universe of data representation,
that's probably because it is...
When faced with a situation like this (and this has nothing to do with digital identity
in particular; it is just good information modeling practice), I tend to first determine whether
some of the proposed approaches ("meta-models" would be the more correct term)
are inherently more powerful than others. Then I
use the most powerful one to express the information that needs expressing, and
after that, I check whether a simpler approach could have been used instead
of the more complex one. Let's see where that leads us.
Name-value pairs are clearly the simplest of the lot. Hierarchical name-value pairs
are a superset, as is XML. Both of those (without additional conventions) have an
inherently hierarchical structure, and it is hard to model general-purpose graphs
in them, as RDF/RDFS, OWL, UML and others can, so the latter are more powerful.
RDF Schema, OWL and UML in turn are more powerful than RDF by itself: not only can
they express graphs, as RDF can, but they also have
additional concepts such as inheritance, entities vs. attributes etc. So those three
are similarly powerful (and the distinctions between them are largely irrelevant
for our purposes here).
So let's take an example set of identity data and model it accordingly. Let's say
I want to model Joaquin's and my business card information. We need to capture name,
company, job title, and business and private phone numbers. With a UML-ish mindset,
we'd find the following concepts:
- Person, with attributes such as Given and Last Name
- Company (which is probably a special case of a more general concept called Organization)
with attributes such as LegalName etc.
- Role, with attributes such as Name
- VoiceCommunicationsEndPoint (or whatever we want to call this), with an attribute called
PhoneNumber
- Some others which we'll ignore for this example in the interest of brevity
- associations between them.
For the above example, we get two instances of Person (Joaquin and Johannes). We
get one instance of Company (NetMesh). There are two instances of Role (CEO, Architect),
and a number of instances of VoiceCommunicationsEndPoint (for the various phone numbers we
are using). There are relationships between Person Joaquin and Role Architect, between
Role Architect and Company NetMesh, between Person Johannes and Role CEO, and between
Role CEO and Company NetMesh.
The associations between the VoiceCommunicationsEndPoint and the other objects are
interesting. My private phone number is clearly an association between Person Johannes
and the VoiceCommunicationsEndPoint, but my business phone number is most likely best
represented as an association between VoiceCommunicationsEndPoint and Role. This is because
my use of this VoiceCommunicationsEndPoint is contingent upon my being in the role at
NetMesh that I'm in; if I were fired, my use of the VoiceCommunicationsEndPoint would
go away, too.
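
To make the shape of this model a bit more tangible, here is a minimal Python sketch of it. It is only an illustration, not how InfoGrid or LID represent anything: the class and field names follow the concepts listed above, only the given name is modeled for a Person, the phone numbers are made-up placeholders, and the helper `usable_numbers` is a hypothetical name introduced just for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceCommunicationsEndPoint:
    phone_number: str


@dataclass
class Company:
    legal_name: str


@dataclass
class Person:
    given_name: str
    # Private numbers are associated directly with the person.
    private_phones: List[VoiceCommunicationsEndPoint] = field(default_factory=list)


@dataclass
class Role:
    name: str
    person: Person    # association between Person and Role
    company: Company  # association between Role and Company
    # Business numbers are associated with the role, not the person.
    business_phones: List[VoiceCommunicationsEndPoint] = field(default_factory=list)


def usable_numbers(person: Person, roles: List[Role]) -> List[str]:
    """Private numbers plus the numbers of every role the person currently holds."""
    numbers = [p.phone_number for p in person.private_phones]
    for role in roles:
        if role.person is person:
            numbers.extend(p.phone_number for p in role.business_phones)
    return numbers


# Placeholder data: the phone numbers are invented for illustration.
netmesh = Company("NetMesh Inc.")
johannes = Person("Johannes", [VoiceCommunicationsEndPoint("555-0100")])
joaquin = Person("Joaquin")
ceo = Role("CEO", johannes, netmesh, [VoiceCommunicationsEndPoint("555-0199")])
architect = Role("Architect", joaquin, netmesh)

print(usable_numbers(johannes, [ceo, architect]))  # while Johannes holds the CEO role
print(usable_numbers(johannes, [architect]))       # once the CEO role has gone away
```

Because the business number hangs off the Role rather than the Person, removing the role removes access to that number without any extra bookkeeping, which is the point of the association described above.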
This is a very tiny example, but it serves as a good illustration of how quickly object
relationships can become complex in an identity scenario, and how rich a vocabulary
one would like to use.
So let's see how we would represent the same information using less powerful techniques
than UML / OWL etc.:
- It's clear that using RDF wouldn't be much of a problem. One can express this information as
a graph, constructed from a set of triples. RDF loses the distinction between attributes
and classes, but with a well-managed schema, that is not really much of a problem
(tool support, verbosity etc. are other issues, but we won't deal with those
right now).
- Plain XML is a bit harder, for two reasons: 1) In our model, we have one instance of
company, which is related to two people. In other words, we have something that can only
be expressed in the hierarchical XML structure if we put the company at the top (as
LDAP does). Unfortunately, this structure breaks down as soon as one person has a
relationship to more than one company: we'd have to construct a workaround
to have two places in two XML files be declared to be synonyms (either representing
the person, or the company, depending on whether we put the company or the person
at the root of the hierarchy). This is one of the reasons directories are usually
only applied to employees of a single organization, by the way.
Fortunately, there is a trick that can get us out of this particular problem: if we
use URLs/URIs (which could be identity-enabled, by the way) in place of in-lined
company or person information, it is easy to determine
that two nodes in two different XML files represent the same person, or company,
and it is easy to look up information about it simply by dereferencing that URL.
So we can probably make plain XML work.
- But name-value pairs become really, really ugly, just like in my previous example.
It's easy to come up with name-value
pairs for a Person, or a Company, or a Role, or any of the classes. But it is very
hard, if not impossible, to represent the associations. How do you say, for example,
using name-value pairs, that Johannes works in the role of CEO for NetMesh, and
as part of this, may use this phone number? At the very least, we'd have to have
the ability for properties to carry two values, not just one (so we could represent
an association between, say, Johannes the Person and CEO the Role).
If we don't have these two pointers, and short of, say, reconstructing RDF in
name-value syntax, we can't represent the information in the richness that is
inherent even in this very simple example. Instead, we'd have to fudge, such as:
we don't allow the person to work for multiple companies, or we just call it
"work phone" and don't relate it to the role, or we leave it as an
exercise to the reader to determine that if Joaquin and Johannes both work for
a company whose name is "NetMesh Inc.", it must be the same company. None of
which is a good foundation to build trustworthy stuff on ... which is why I'm
calling it a dead end.
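
The contrast between the graph approach and flat name-value pairs can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not real RDF: the identifiers (`person:johannes`, `role:ceo` and so on), the predicate names (`holds`, `at`, `uses`) and the helper functions are all invented for this example, and the "work phone" value is a placeholder.

```python
# Each triple is (subject, predicate, object). All identifiers are made up.
triples = {
    ("person:johannes", "holds", "role:ceo"),
    ("person:joaquin", "holds", "role:architect"),
    ("role:ceo", "at", "company:netmesh"),
    ("role:architect", "at", "company:netmesh"),
    ("role:ceo", "uses", "phone:business-1"),
    ("person:johannes", "uses", "phone:private-1"),
}


def objects(subject, predicate):
    """All objects reachable from a subject via one predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def employers(person):
    """Follow Person -> Role -> Company."""
    return {c for role in objects(person, "holds") for c in objects(role, "at")}


def numbers_tied_to_role(person, role):
    """Numbers the person may use only because they hold the given role."""
    return objects(role, "uses") if role in objects(person, "holds") else set()


# The same data flattened into name-value pairs: one value per name, no associations.
johannes_card = {"name": "Johannes", "company": "NetMesh Inc.",
                 "title": "CEO", "work phone": "phone:business-1"}
joaquin_card = {"name": "Joaquin", "company": "NetMesh Inc.", "title": "Architect"}

# With triples, both people demonstrably point at the same company node.
same_company = employers("person:johannes") & employers("person:joaquin")

# With name-value pairs, all we can do is compare strings and hope.
same_company_guess = johannes_card["company"] == joaquin_card["company"]
```

In the triple version, the link between the business number and the role is explicit data that can be queried. In the flat version, "work phone" is just a label, and the identity of the company rests on two strings happening to match.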
So my conclusion, which I'm sure you aren't surprised about given what we've done
in LID and InfoGrid, is that we can probably scrape by with XML, but not with
anything simpler. InfoGrid itself (the platform, of which identity is only one
component) is based on a richer approach that's closer to UML and OWL, and
with examples like this one,
it's easy to understand why. For InfoGrid LID, the identity layer, we chose
XML, and very consciously so, although this is the first time that the reasoning
has been written down ...
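
As a postscript, the URL/URI trick mentioned above for plain XML can be sketched like this. It is a hedged illustration only: the element and attribute names are invented for this example and are not the LID format, the `example.com` URLs are placeholders, and `company_uris` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two separate XML documents, one per person. Instead of in-lining the company,
# each role refers to it by URI. Element names and URLs are made up.
JOHANNES_XML = """<person id="http://example.com/people/johannes">
  <role name="CEO" company="http://example.com/companies/netmesh"/>
</person>"""

JOAQUIN_XML = """<person id="http://example.com/people/joaquin">
  <role name="Architect" company="http://example.com/companies/netmesh"/>
</person>"""


def company_uris(xml_text):
    """Collect the company URIs referenced by any role in one document."""
    root = ET.fromstring(xml_text)
    return {role.get("company") for role in root.iter("role")}


# Equal URIs mean the same company, even though the documents are separate.
shared = company_uris(JOHANNES_XML) & company_uris(JOAQUIN_XML)
print(shared)
```

A person with roles at two companies simply carries two `role` elements with different URIs, so the hierarchy no longer forces a single company to sit at the root.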
|