1729 Billion Users Table: A Review

Here is my review of 1729’s The Billion User Table.

I do like the unique take on measuring when a company is too big, with too much potential direct impact on individuals, based on the size of its Users table. One really positive benefit of this measure, even if it is never actually used for anti-monopoly legislation and only for tax or some other relatively minor regulation, is that it will force companies to ensure their Users table reflects real users and not bots and duplicates. It will force a lot of companies to keep their Users table clean and avoid misleading investors about their metrics and projections, most of which are extrapolated from the size of their Users table.

One caveat, obvious but worth stating for those with an enterprise architecture background: when the article mentions the Users table, it is talking about it from a logical architecture perspective and not from a technical / implementation architecture perspective. The actual technical implementation of the logical Users table can be anything that meets the design goals and expected throughput needs: a relational table, columnar storage, graph-based storage, JSON objects, or even byte streams or blobs.

I agree more with the article’s localism argument for moving the Users table to the block chain than with the anti-monopoly one. While we all agree monopoly is bad and anti-monopoly legislation is in general good for consumers, the rapid expansion of technology companies has resulted in more innovation, such as big data technologies, horizontally scalable architecture and cloud computing. If there had been regulation, say in the mid-2000s, to break up companies with a Users table larger than, say, 100 million rows, the big data and cloud computing innovations would likely not have occurred. That would have directly inhibited world GDP, as cloud computing has ensured the ease and ubiquity of deep learning, machine learning, and the crunching of vast data sets such as those from the Large Hadron Collider, astrophysical data (Hubble Space Imagery), etc. Another drawback of anti-monopoly legislation is that all legislation is fraught with loopholes, which in turn makes it more difficult for smaller companies to compete legally and easier for companies with large pools of free cash flow to keep growing in other jurisdictions.

I’ll elaborate on the localism argument for the Users table shortly, but before that I want to propose an extension to the article: emphasizing the security issues that come with companies holding centralized repositories of user information, and how moving user data to the block chain might, just might, make it safer and enable users to literally own their data, not just on paper but also physically.

Security

One of the things that we implicitly understood and were explicitly told by elders while growing up was to never disclose our full identity, address or any such details to any stranger that we come across. Most people follow this principle IRL, but most of us are extremely lax about it when it comes to our online footprints.

Part of it isn’t really our fault. Most sites and platforms that we interact with online, be it businesses, banks, telecom, eCommerce, internet companies, etc., require detailed information about our identity: name, email, address, date of birth, cell number, etc. We provide this information by rationalizing to ourselves that these businesses could not operate without it. After all, how will a bank ascertain the identity of an applicant digitally and issue a product (e.g. chequing account, credit cards, etc.) in response to an online application?

Unfortunately, headlines about large data breaches are routine now, and hardly a day goes by without another extremely disturbing one. It impacts all of us: our personally identifiable data keeps getting compromised and ends up in the hands of entities that can use it for innumerable nefarious purposes.


 

Everyone understands that the main reason behind these kinds of hacks is that most businesses are sitting on a treasure trove of data. We implicitly assume that these companies implement extremely diligent data governance and cybersecurity practices, which are routinely audited. However, massive amounts of personal information have been leaked and hacked even with these stringent cybersecurity governance practices in place. Hence, the problem needs to be solved at its root: businesses should not be allowed to store any customer’s personally identifiable information, because a centralized repository attracts internal and external bad actors.


1729 A Billion Users Table: An Implementation Approach

One thing we can expect all critically minded readers to call out is that storing user information on the block chain can open a Pandora’s box when it comes to security, even if the data is encrypted. In defense of the article, I propose below a potential implementation approach which might assuage some concerns around user data on the block chain. The solution isn’t perfect, but it is an attempt to convince some critics that it might actually be more secure.

Let’s start in true architect fashion with the goals we want this system to accomplish

System Goals/Objectives

  1.  Business should be able to know you, without requiring the storage of your personally identifiable information in a centralized repository
  2. A centralized identity service is a single point of vulnerability and failure
  3. A decentralized identity service with on-chain data is also a point of vulnerability
  4. If you want your data to be safe, some amount of accountability and responsibility lies with the individual user too
  5. All data will be divided into two parts, enforced by a smart contract:
    1. Non-critical data stored encrypted on-chain and based on user authorization to be shared across entities
    2. All critical data that belongs to the user should be stored at a location of user’s choosing
      1. This could be on their computer, app, some custom storage solution (private cloud or on-prem service)
      2.  This could also be a custom solution, where the user provides a webhook, for entities to REST POST any critical data points for storage 
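A minimal sketch of the two-part split in goal 5, written in Python for illustration. The set of "critical" field names and the function name are assumptions made for this example; the article does not prescribe which fields fall on which side, and a real system would also encrypt the on-chain part.

```python
from typing import Any, Dict, Tuple

# Hypothetical classification: which fields count as "critical" is an
# assumption for this sketch, not something the article defines.
CRITICAL_FIELDS = {"name", "email", "phone", "address", "credit_file"}


def split_user_data(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a record into (on_chain, off_chain) parts by field criticality."""
    on_chain = {k: v for k, v in record.items() if k not in CRITICAL_FIELDS}
    off_chain = {k: v for k, v in record.items() if k in CRITICAL_FIELDS}
    return on_chain, off_chain


record = {"email": "a@example.com", "language": "en", "theme": "dark"}
on_chain, off_chain = split_user_data(record)
# on_chain would go to the block chain (encrypted);
# off_chain would go to the storage location the user chose.
```

Keeping the classification in one place makes it easy to see, and audit, exactly which fields are ever allowed to leave the user's own storage.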

End Result

There is more to the end result that this system will accomplish but one of the things it will enable is a secure sign in mechanism using the block chain.


Registration

Before the user can sign in using the block chain based Users table, the identity needs to be registered. However, we never want the user’s real identity to end up in the User table.

One option, in line with the design goal of the user taking some accountability and responsibility for managing their data, would be to use a locally installed wallet app, either on their desktop or phone, to generate an encrypted digital identity.

Here’s how it may look:

  1. User installs a wallet app. “Wallet” is a loaded term in this context: it has no payment or money aspects and is more of an identity wallet
  2. User clicks a button called “Create Identity”
  3. In response, the wallet asks the user to input (not all mandatory):
    1. Username
    2. Password
    3. Name
    4. Email
    5. Phone number
    6. Address
  4. Wallet performs two-factor authentication to ensure the email and phone provided indeed belong to the user
  5. Wallet creates a SHA-256 hash based on the provided data. Creation of the ID could also be combined with a timestamp to allow the user to generate multiple IDs.
    1. Multiple IDs could be useful if the user wants to keep their data segregated in the Users table for different entities. For example, a user may not want to use the same generated ID for a bank that they use for social media
  6. In this sense, the wallet serves as a “Certificate Authority”, but for the individual user instead of a business organization or entity
  7. This generated SHA-256 ID becomes the user’s ID in the Users table and uniquely identifies the user’s “row” in that table
  8. On confirmation from the user, the wallet publishes the SHA-256 ID to the block chain, where participating nodes, on validating the wallet’s and wallet provider’s signature (see Footnote), make the entry into the block chain
  9. The wallet will expose a locally callable API that can be called by SDKs supporting the sign on process, just like Facebook Login or Google Login SDKs
  10. The wallet will serve one additional purpose, and this is where we can really leverage the block chain to safeguard user data. In addition to generating the user’s digital ID, it will also allow the user to configure a storage location that entities will use to store the user’s critical data (such as profile pictures, credit file details, etc.).
    1. All critical data that belongs to the user should be stored at a location that the user provides
    2. All non-critical data, which can potentially also be shared across entities, will end up on-chain
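Steps 5 to 7 above can be sketched with Python's standard `hashlib`. The function name, the JSON serialization and the example field values are assumptions for illustration; the article only says that the ID is a SHA-256 hash of the provided data, optionally combined with a timestamp.

```python
import hashlib
import json
import time
from typing import Dict, Optional


def generate_identity_id(fields: Dict[str, str], timestamp: Optional[float] = None) -> str:
    """Return a hex SHA-256 digest of the user's fields plus a timestamp.

    Hypothetical helper: serializing with sorted keys makes the digest
    independent of the order in which the fields were entered.
    """
    if timestamp is None:
        timestamp = time.time()
    payload = json.dumps({"fields": fields, "timestamp": timestamp}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


profile = {"username": "alice", "email": "alice@example.com"}

# Different timestamps yield different IDs for the same person, which is
# what allows one ID for the bank and another for social media.
bank_id = generate_identity_id(profile, timestamp=1_700_000_000.0)
social_id = generate_identity_id(profile, timestamp=1_700_000_060.0)
```

Only the digest would be published on-chain; the input fields stay in the wallet. Since names and emails are guessable, a production design would probably also mix in a secret random value so that the ID cannot be recomputed from public information.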

Data Model

The unencrypted version of on chain data model could be something like below with access enforced by smart contracts executed by validating block chain nodes:
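As a rough illustration only, one row of such a table might be modelled as the Python dataclass below. Every field name here is an assumption pieced together from the registration and permission steps described in this post, not a definitive schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UserRow:
    """Hypothetical shape of one unencrypted row in the on-chain Users table."""

    user_id: str                     # SHA-256 ID generated by the wallet
    critical_data_location: str      # user-chosen storage location or webhook URL
    non_critical: Dict[str, str] = field(default_factory=dict)     # shareable attributes
    permissions: Dict[str, Set[str]] = field(default_factory=dict)  # domain -> readable fields


row = UserRow(
    user_id="<sha-256 hex digest>",  # placeholder, not a real digest
    critical_data_location="https://storage.example.com/hooks/user-data",
    non_critical={"language": "en", "theme": "dark"},
    permissions={"bank.ca": {"language"}, "reddit.com": {"language", "theme"}},
)
```

Note that no personally identifiable information appears in the row itself: the critical data is only referenced by location.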


Permissions

  1. Either through the wallet or during the sign-in process, the user can list the domains (bank.ca, reddit.com, twitter.com, etc.) that are allowed to access their data. Validating nodes within the block chain check this list before granting any particular domain or company access to the user’s data in the table.
  2. The user can grant granular level permissions that allow some domains access to some fields and not others.
  3. All businesses that use this mechanism have to adhere to the following protocols:
    1. Save any user-specific data back into the Users table on the block chain
    2. Not make any backups in their central repositories. One might think there is no way to prevent this, but the consequence could be permanent debarment of the business from the block chain and this very rich data source
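The granular, per-domain permission check described above could look roughly like the following. The function name and the shape of the permission map (domain to set of readable field names) are assumptions for illustration; in the proposal this logic would live in a smart contract run by validating nodes rather than in Python.

```python
from typing import Dict, Set


def readable_fields(
    row_data: Dict[str, str],
    permissions: Dict[str, Set[str]],
    domain: str,
) -> Dict[str, str]:
    """Return only the fields the user has granted to the requesting domain."""
    allowed = permissions.get(domain, set())  # unknown domains get nothing
    return {k: v for k, v in row_data.items() if k in allowed}


row_data = {"language": "en", "theme": "dark", "timezone": "America/Toronto"}
permissions = {
    "bank.ca": {"language", "timezone"},
    "reddit.com": {"theme"},
}

bank_view = readable_fields(row_data, permissions, "bank.ca")
stranger_view = readable_fields(row_data, permissions, "unknown.example")
```

Defaulting to an empty set means a domain the user never listed sees no fields at all, which matches the deny-by-default stance of the proposal.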

Data Control

  1. The user, through the wallet, should be able to directly access all the data that belongs to them and delete / modify any element as they see fit. The data belongs to the user and the user should have full control over it
  2. There might be some proprietary data that businesses store that the user should be able to view but never edit or update (e.g., bank account balance).
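The distinction between user-owned fields and view-only proprietary fields can be sketched as a simple guard. The field names, the exception class and the function name are hypothetical, chosen only to illustrate the two rules above.

```python
from typing import Dict, Set


class ReadOnlyFieldError(Exception):
    """Raised when the user tries to change business-owned data."""


def user_update(
    record: Dict[str, str],
    read_only: Set[str],
    field_name: str,
    new_value: str,
) -> Dict[str, str]:
    """Return a copy of the record with one user-owned field changed."""
    if field_name in read_only:
        raise ReadOnlyFieldError(f"{field_name!r} can be viewed but not edited")
    updated = dict(record)  # copy so the caller's record is left untouched
    updated[field_name] = new_value
    return updated


READ_ONLY_FIELDS = {"account_balance"}  # assumption: proprietary to the bank
record = {"email": "old@example.com", "account_balance": "100.00"}

updated = user_update(record, READ_ONLY_FIELDS, "email", "new@example.com")
```

An attempt such as `user_update(record, READ_ONLY_FIELDS, "account_balance", "1000000")` raises `ReadOnlyFieldError`, while the balance itself remains visible to the user in the record.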

Open Questions

  1. What prevents a rogue wallet provider from breaking the standard of sending only the generated SHA-256 ID across the wire, and instead also sending PII to a central repository managed by the wallet provider?
    1.  Seems like some kind of protocol enforcement is needed – not a big fan of this but something like how Google / Apple enforce rules for apps in their app store?
    2.  Protocol enforcement and wallet certification itself could be a block chain solution
  2. Businesses can save any non-proprietary data in the block chain, but what about proprietary data?
    1.  In this case, I think user’s data will get distributed across 3 storage locations: the private storage location for user specific critical data, on chain for non-critical / shareable data and the business stores any proprietary data associated with the business on its own storage.
    2. Even though the “read” and “write” operations for such a distributed storage system will take more operations, it will enable not only sign on using block chain but it will also make it harder for hackers and bad actors to grab millions of rows of data as now it will require at least three times the effort to read one row of user data.
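To make the three-location idea concrete, here is a toy read path in Python. The three dictionaries are stand-ins (assumptions) for the user's private storage, the on-chain table and the business's own database; real backends would each have their own access control.

```python
from typing import Dict

# Stand-ins for three independent storage systems, keyed by the user's ID.
USER_PRIVATE_STORE: Dict[str, Dict[str, str]] = {"id-123": {"address": "1 Example St"}}
ON_CHAIN_STORE: Dict[str, Dict[str, str]] = {"id-123": {"language": "en"}}
BUSINESS_STORE: Dict[str, Dict[str, str]] = {"id-123": {"account_balance": "100.00"}}


def read_full_row(user_id: str) -> Dict[str, str]:
    """Assemble one logical user row; note that it takes three separate reads."""
    row: Dict[str, str] = {}
    for store in (USER_PRIVATE_STORE, ON_CHAIN_STORE, BUSINESS_STORE):
        row.update(store.get(user_id, {}))
    return row


full_row = read_full_row("id-123")
```

An attacker who compromises only one of the three stores gets only that store's slice of each row, which is the property the paragraph above relies on.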

Footnotes

  1. This does imply that wallets and wallet providers might need to be pre-registered in the block chain, just like browsers come pre-installed with CA certificates to enable HTTPS communication. However, like everything else, a “wallet provider” would also be a user in the on-chain Users table. This does create a “what was before the big bang” scenario, and a special case of initiating the block chain with native wallet providers, at least to start off, might be needed.

