Someone asked me today where to find out more about how decentralised validation works. I was in way over my head regarding JSON blobs, etc. (Where can we point people?)
Though we did not hold a community call this week, I figured I’d try my hand at answering it with a lightly technical blog post and a few links. I’ll break it down and focus on a couple of points that I think are particularly important.
How are open badges validated?
- Reading data from the PNG file: Open badges data is stored as a text “chunk” in the PNG image. This post from Jon Buckley illustrates how chunks are arranged in a PNG and where the open badges information rests. The specification appears to be moving from a “tEXt” chunk to an “iTXt” chunk, as Buckley had suggested, to promote better international language support. The open badges GitHub wiki has a more recent example of what the badge data looks like in the PNG file for the “hosted assertion” style of badge (as opposed to the “signed” badge, where more information is “baked” in). However, we are warned that this information is currently out of sync with the production code in the “openbadges-bakery” repository, which still seems to use a “tEXt” chunk. Essentially, with a hosted badge, the goal is to pull a URL out of the PNG file. That URL allows retrieval of the badge assertion, in JSON format, hosted by the badge issuer.
- Checking the assertion URL: The next step is to determine that the issuer indeed asserts the existence of the badge and that it matches the user who is presenting it to you. In introducing what an assertion does, the Open Badges team says, “With Open Badges, we talk about assertions as files that describe three things: who a badge was awarded to, what that badge represents, and who issued the badge.” This information is stored in a predictable format (in JSON), so that badge validating applications can understand it. The technical Assertions Specification details exactly how that information is encoded, but essentially, the person or application reading the assertion file needs to compare the alleged badge recipient’s email address with the information in the assertion. The recipient data in the assertion is usually hashed to avoid broadcasting users’ email addresses to the world, so a validator needs to run the alleged email address through the indicated hash function and compare the result with the stored hashed value. For example, the JSON recipient information:
indicates an email address that has been “salted” by appending the string “deadsea” and run through the sha256 algorithm to yield the string after the $ in the “identity” field. If this process yields a match, the validator can then harvest information about the badge and what was done to earn it. Note: For “signed” badges, which are just entering play but will most likely become the dominant form of issued badges, the assertion is also baked into the badge in a signed format that can be verified by running it through a known algorithm (usually RSA-SHA256) with the published public key of the issuer. See the Assertions Spec.
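The hash comparison described above can be sketched in a few lines of Python. This is only an illustration, not the official validator: the field names "hashed" and "salt", the function name recipient_matches, and the sample email address are assumptions made for the example. Only the "identity" field, with the algorithm name before the $ and the digest after it, comes from the description above.

```python
import hashlib
import hmac


def recipient_matches(email, recipient):
    """Return True if `email` matches the recipient block of an assertion.

    `recipient` is a dict parsed from the assertion JSON. The keys
    "hashed" and "salt" are assumed names; check them against the
    Assertions Spec before relying on this.
    """
    identity = recipient["identity"]
    if not recipient.get("hashed", False):
        # Unhashed recipients store the plain email address.
        return identity == email

    # "sha256$<hex digest>" -> ("sha256", "<hex digest>")
    algorithm, _, expected = identity.partition("$")
    salt = recipient.get("salt", "")

    # The salt is appended to the email address before hashing.
    digest = hashlib.new(algorithm, (email + salt).encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, expected.lower())


# Build a hypothetical recipient block so the example is self-contained.
sample_email = "alice@example.org"  # made-up address
sample_recipient = {
    "type": "email",
    "hashed": True,
    "salt": "deadsea",
    "identity": "sha256$"
    + hashlib.sha256((sample_email + "deadsea").encode("utf-8")).hexdigest(),
}

print(recipient_matches(sample_email, sample_recipient))
print(recipient_matches("mallory@example.org", sample_recipient))
```

The first call reports a match and the second does not, which is all a validator needs before it goes on to read the rest of the assertion.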
What does it mean to have distributed validation?
I just described a process that may be undertaken by any person or application that understands open badges. Anybody connected to the Internet could dig the assertion URL out of a badge PNG image and check the results. The alternative would be validation by one centralized badge authority, in which case the whole ecosystem would depend on all issuers submitting badge information to a central repository, and on that repository being able to handle the load of every validation request for the long haul.
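To show how little is needed to "dig the assertion URL out of a badge PNG", here is a minimal Python sketch that walks the chunks of a PNG and returns the text stored in a “tEXt” or “iTXt” chunk. It is not the openbadges-bakery code. The keyword "openbadges", the function names, and the example.org URL are assumptions for illustration, and the sample bytes are a stripped-down stand-in for a real badge image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield ctype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length


def find_badge_text(png_bytes, keyword=b"openbadges"):
    """Return the text stored under `keyword`, or None if absent.

    The default keyword is an assumption, not taken from the spec.
    """
    for ctype, data in iter_chunks(png_bytes):
        if ctype not in (b"tEXt", b"iTXt"):
            continue
        key, _, rest = data.partition(b"\x00")
        if key != keyword:
            continue
        if ctype == b"tEXt":
            # tEXt: keyword, null byte, Latin-1 text.
            return rest.decode("latin-1")
        # iTXt: compression flag, compression method, language tag,
        # translated keyword, then UTF-8 text.
        compressed = rest[0]
        _lang, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None


def make_chunk(ctype, data):
    """Build one PNG chunk (used only to fabricate the sample below)."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


# Hypothetical stand-in for a hosted badge: signature, one text chunk, IEND.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"openbadges\x00https://example.org/assertion.json")
    + make_chunk(b"IEND", b"")
)
print(find_badge_text(sample))
```

With a real badge you would read the file's bytes instead of building a sample, then fetch the returned URL to get the assertion JSON from the issuer.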
Why distributed validation?