Johannes Ernst’s Blog

How does identity data relate to transactional and other kinds of data?

Is it non-overlapping? The same, or a subset? Is there overlap; if so, where, and under which circumstances?

These questions are at the heart of the thought process that needs to go into designing identity technologies for the era of pervasive identity. For example, if the answer were "non-overlapping", then we could merrily go ahead and design identity systems on the green field (at least with respect to non-identity systems), not worrying about what is in, say, the customer database. If the answer were "the same, or a subset", then we’d better not design any identity systems, but instead devise methods by which existing systems (such as transactional systems and other systems that were not built for identity per se) can be recruited to also meet the requirements of identity.

I’ve found that when I ask that question, which I occasionally do, the answers I get from people in the community often differ dramatically depending on how much of an "enterprise directory" background the person being asked has (and if you think about that for a minute, it makes some sense). To exaggerate, and to be not entirely fair for a minute, that opinion sometimes seems to be "what’s in the directory is the identity information, and all other systems aren’t directories, so they don’t have identity information," with the conclusion of "little or no overlap". If that sounds like a self-fulfilling prophecy to you, it certainly does to me.

In my experience, however, this traditional view is increasingly inconsistent with what is happening, in terms of user behavior, in terms of the shift of power and control from centralized organizations with a firm wall around them to individuals, and in terms of the new technologies and services that are springing up in response.

For example, the other day, I needed to get in touch with somebody whose phone number I had lost, or never had in the first place. I found it by Googling his name, finding his blog, and doing a whois lookup on the domain name of his blog. It could equally have been through a reverse phone number lookup by address at Google, or by going through LinkedIn. If we had happened to work for the same company with a well-maintained directory, I could have gotten the number from there; but we didn’t.

Of course, this use case is not an enterprise use case. But that is the whole point about pervasive, independent identity! It isn’t tied to any one organization or central repository of identity information. It is the non-enterprise use cases, the "open internet" use cases of identity technology, that need to be addressed today because, increasingly, the people we interact with and relate to are outside the confines of the same organization; certainly outside of 9-5, which isn’t what it used to be, either. There is also the convenience factor: Google is a lot closer to the fingertips of a lot of people than the enterprise directory application.

Note also that what is one person’s identity data is somebody else’s transaction data. We certainly don’t run MyLID.net (our hosted LID, OpenID, Yadis identity service) on top of a directory, and I’m pretty sure that is also true for many social web applications. Netflix’s social network functionality has a lot of identity-related data in it, but they probably (conjecture on my part, I don’t know) store it in the same database as all their other information: maintaining the relationships between people and their purchases would be rather difficult if one introduced the usual impedance mismatch between a directory and a database; the benefits would be rather marginal, and Netflix does use transaction data as a form of identity data in any case …

The conclusion: a separation between these different kinds of data, and their allocation to different kinds of information systems with strict boundaries between them, might have made sense in the past, and within a tightly structured IT environment (and even then, show me the enterprise application that does not have at least a bit of identity information in it). Today, on the open web, with social software being one of the primary areas of innovation, this separation is increasingly anachronistic if it is performed for the purpose of "separating identity data from transaction data". (There are other good reasons, such as differing performance profiles. But conceptually, we should be thinking about one tightly cross-referenced set of information, even if we decide that data item A should rather sit in system B than C because it’s faster, or cheaper, or …).

What we need, in the end, is an approach that treats the entire web and enterprise IT infrastructure, warts and all, as one giant, distributed, decentralized meta-directory (or meta-database, or …) that has parts that are optimized for different requirements, but that can be accessed uniformly so that application development "native to the web" is possible. Identity data elements are a subset of all of that information, and tightly related to other data elements, both identity and not. And that way, we don’t even need to draw an artificial line as to whether or not information item X (say, somebody’s presence or transaction record on eBay) is or isn’t a piece of identity information.
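To make the "accessed uniformly" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Fact` record, the `DirectorySource` and `OrderSource` classes, the `MetaDirectory` wrapper, and the sample data are all hypothetical, and a real system would have to deal with distribution, access control, and differing schemas. The only point is that an application asks one question of one interface and does not care whether an answer came from a directory or from a transactional store.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Fact:
    subject: str  # who or what the fact is about
    name: str     # attribute name
    value: str    # attribute value
    source: str   # which backing system supplied it


class Source(Protocol):
    def lookup(self, subject: str) -> Iterable[Fact]: ...


class DirectorySource:
    """Stands in for an enterprise directory (hypothetical data)."""

    def __init__(self, entries: dict[str, dict[str, str]]):
        self._entries = entries

    def lookup(self, subject: str) -> Iterable[Fact]:
        for name, value in self._entries.get(subject, {}).items():
            yield Fact(subject, name, value, "directory")


class OrderSource:
    """Stands in for a transactional database of (customer, item) rows."""

    def __init__(self, orders: list[tuple[str, str]]):
        self._orders = orders

    def lookup(self, subject: str) -> Iterable[Fact]:
        for customer, item in self._orders:
            if customer == subject:
                yield Fact(subject, "purchased", item, "orders")


class MetaDirectory:
    """Queries every source through the same interface and merges the results."""

    def __init__(self, sources: Iterable[Source]):
        self._sources = list(sources)

    def lookup(self, subject: str) -> list[Fact]:
        return [fact for src in self._sources for fact in src.lookup(subject)]


meta = MetaDirectory([
    DirectorySource({"alice": {"phone": "555-0100"}}),
    OrderSource([("alice", "DVD rental"), ("bob", "book")]),
])
facts = meta.lookup("alice")
for fact in facts:
    print(fact)
```

Note that the caller never has to decide whether "purchased" is identity data or transaction data; it is simply one more fact about the same subject.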

How will cyber criminals make a buck once we have pervasive digital identity on the internet?

I proposed the following topic:

Let’s role play as aspiring cyber criminals, 10 years from now, when there’s a ubiquitous digital identity layer that is part of all digital interactions over the internet. How do we make a buck or a billion?

for dinner conversation at the upcoming Harvard/Berkman conference on digital identity. Apparently they have a Berkman conference tradition called Food for Thought dinners. These dinners are informal gatherings of about 8 people who gather around discussion questions and are led by one of the conference panelists.

I wonder whether we’ll find a killer business plan in there ;-) and if so, what we can do about (ahem, against) it.

The Challenges of Open Data — example: Digital Identity

Last week, I posted about why forcing identity data into name-value pairs is an architectural dead end. Of the many comments that I received, those from Phil Hunt and Mark Wilcox, in particular, turned out to warrant a much more detailed response than I initially thought. I realized that they are raising a much broader topic that one could call "The Challenges of Open Data", applied to the example application domain of digital identity. I hopefully will get around to writing an article on the general case some time soon, but for now, I’ll focus on digital identity data. Because that is already complex enough, I have broken down my thoughts into multiple posts, which will be published over a few days, one at a time, and which, for convenience, will be linked from here.

I’m rephrasing the points that were made as questions (and hope I don’t miss anything really important), and I’m also adding a few related questions.

  • “How is identity data different from other kinds of data, such as transactional data? Where does one start and the other end? What overlaps are there?” (go to separate post)
  • “What is a good way of thinking about the (conceptual) structure of identity information? Is it RDF? Is it name-value pairs? Is it SQL? Is it … [long list of potential candidates].” (go to separate post)
  • “Given that LDAP seems to work for identity data in many use cases, where does the need for more complex structures for identity data arise, and what are those more complex structures?” (go to separate post)
  • “Applications written against LDAP directories from one vendor often do not work against LDAP directories of other vendors. If we want to build a ubiquitous identity layer on the internet, how are we going to solve this problem?” This question is really about how to deal with multiple ontologies of identity data. (go to separate post)
  • “What is the best way of representing identity information for the purpose of storage and retrieval?” (go to separate post)
  • “What is the best way of representing identity information during exchange on the open internet?” (go to separate post)
  • “Can name-value-pair-based representation of identity information be ‘fixed’ with a few additional conventions that add a lot of power? E.g. by extending the allowed values (in the name-value pairs) to be ‘pointers’ to other name-value pairs?” (go to separate post)
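
As a rough illustration of the last question, here is what "values as pointers to other name-value pairs" could look like. This is only a sketch under assumptions of my own choosing: the `ref:` prefix, the record identifiers, and the `resolve` helper are hypothetical conventions, not part of LDAP, LID, OpenID, or any other existing format.

```python
REF_PREFIX = "ref:"  # hypothetical marker meaning "this value points at another record"

# Each record is a flat set of name-value pairs; some values are pointers.
records = {
    "person/alice": {"name": "Alice Example", "employer": "ref:org/acme"},
    "org/acme": {"name": "Acme Corp", "contact": "ref:person/alice"},
}


def resolve(records, record_id, depth=1):
    """Return a copy of a record with pointer values expanded up to `depth` levels.

    The depth limit keeps circular pointers (alice -> acme -> alice)
    from recursing forever. Pointers to unknown records are left as-is.
    """
    expanded = {}
    for name, value in records[record_id].items():
        is_pointer = isinstance(value, str) and value.startswith(REF_PREFIX)
        target = value[len(REF_PREFIX):] if is_pointer else None
        if is_pointer and depth > 0 and target in records:
            expanded[name] = resolve(records, target, depth - 1)
        else:
            expanded[name] = value
    return expanded


alice = resolve(records, "person/alice", depth=1)
print(alice["employer"]["name"])
```

Even this small convention turns a flat list of pairs into a graph, which immediately raises the questions of cycles, dangling pointers, and who is allowed to follow which pointer.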

Guardian quotes Wendy Hall: URLs instead of National Insurance Number

According to Wendy Hall, a longstanding colleague of Berners-Lee.., the babies of the future, for example, will have a web address instead of a National Insurance Number. Hall said: ‘I have a vision that in the future when a baby is born you’ll get some sort of internet ID that is effectively your digital persona, and it will grow with you. It will actually represent you in some way - what you know, what you’ve done, your experiences. I guess you’d call it your URI [Uniform Resource Identity]. This is the thing that always identifies you. Every time you do something on the internet, it is effectively logged, building up this profile that is with you for your life. Then you have your life’s record, which can include any legal documents or photographs or videos that you might have, that you can pass on to your children. We will be able to build software that can interpret that profile to help get the answer that you need in the context that you’re in.’

Quoted from here.

If this isn’t an endorsement of URL-based identity then I haven’t seen one … great! Of course, I would want to suggest that we don’t want to have a single URL-based identity throughout our lives, and instead want to use several, or even a very large number of URIs that cannot be correlated directly. But having URLs instead of National Insurance Numbers, or US Social Security Numbers sounds rather straightforward.

The idea of re-interpreting URI to mean Uniform Resource Identity, coming from somebody involved in the W3C, is unexpected … I’ve got to think about that…

WEF Meeting protected by halted sea traffic, naval aircraft and ships

I won’t go this year, but invited I was. Today I’m reading that for the World Economic Forum meeting at Sharm el-Sheikh, which will open on Saturday,

…the Egyptians have deployed naval aircraft and ships from the Suez Canal to guard the Red Sea resort… Egyptian security has closed jetties and moved divers’ boats out for the next four days. Pleasure boats have been ordered to stay clear of Sharm and the ferry lines from Jordan and Saudi Arabia halted.

This followed a letter from Prof. Schwab, the principal behind the WEF and initiator of the famous Davos meetings, encouraging everybody to trust in the security that the Egyptians are putting in place. Unfortunately, just a few days after his letter, the second attack this year occurred. Looks like now they are bringing in the artillery, almost literally!

I can’t tell you what it feels like making it through security for a WEF meeting in an Arab country, in particular if you aren’t on the list of expected attendees — as in my case, when I went to a WEF in Jordan a few years back. The sad thing is that the security is entirely warranted; several of the hotels that we stayed at in Amman have been bombed since.

Sounds like one has to take on a substantial amount of personal risk these days to improve the state of the world.
