Hope you had as much fun changing passwords over the last few days as I have. If you have not gotten to it yet, the best set of tools I found for determining whether a site is ready for a password switch post-Heartbleed was in this Forbes article.
Just like Heartbleed has been a major distraction for every security and IT organization, it’s also gotten me off track in my thoughts about “big data”. Although it’s not totally off topic. You may recall a few weeks ago when the White House Office of Science and Technology Policy (OSTP) had its big deadline for the “big data” RFI it issued. I decided to submit some thoughts on some of their questions after wading through some of the really excellent content from a summit of sorts they held in the run-up to the deadline. What became clear was that better access management policy and technology need to be applied to achieve good privacy results. Part of that means making sure we keep our eye on who should have control over personal data. But another part that’s almost as important is keeping our eye on the technologies used to enforce that access, which boils down to the various security mechanisms powering all manner of access controls. If there’s one thing Heartbleed has shown, it’s that we collectively don’t have a grasp on the complicated infrastructure that enforces the controls over our collective data. We all need to demand that data be better locked down in every part of our lives. That’s going to call for a level of discipline in testing and modeling around security and privacy that simply has not existed before. Once the sum total of our lives is out there in “big data” repositories in the cloud, we won’t be able to afford another slip-up like Heartbleed.
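It’s worth remembering just how small the failure behind all that chaos was. Heartbleed boiled down to a single missing bounds check: the OpenSSL heartbeat handler echoed back as many bytes as the peer *claimed* to have sent, never checking that claim against the bytes actually received. Here’s a minimal Python sketch of the flawed pattern and its fix — the message layout and names are simplified stand-ins for illustration, not the real TLS record format:

```python
import struct

# Stand-in for whatever happens to sit next to the payload in process
# memory (keys, passwords, session cookies...).
ADJACENT_MEMORY = b"secret-key-material-and-other-process-memory"

def handle_heartbeat_vulnerable(request):
    # First 2 bytes: the payload length the peer *claims* it sent.
    (claimed_len,) = struct.unpack(">H", request[:2])
    payload = request[2:]
    # Bug: echo back `claimed_len` bytes without checking how many bytes
    # actually arrived -- the copy runs past the payload into adjacent
    # memory (simulated here by concatenating ADJACENT_MEMORY).
    buffer = payload + ADJACENT_MEMORY
    return buffer[:claimed_len]

def handle_heartbeat_fixed(request):
    (claimed_len,) = struct.unpack(">H", request[:2])
    payload = request[2:]
    # Fix: discard any request whose claimed length exceeds what was
    # actually received.
    if claimed_len > len(payload):
        return None
    return payload[:claimed_len]

# A peer sends a 4-byte payload but claims it is 100 bytes long.
evil = struct.pack(">H", 100) + b"ping"
leaked = handle_heartbeat_vulnerable(evil)  # echoes "ping" plus memory
safe = handle_heartbeat_fixed(evil)         # request rejected (None)
```

The fix is one comparison. That a check this small guarded this much data is exactly why the testing and modeling discipline above matters.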
Building new discipline around access is going to start with the basics for most. Where is the data I need to lock down? What are the current means used to secure that data? Who currently has access to that data, and what can they do with that access? These are basic questions that most organizations struggle to answer. And that struggle spans all forms of data: not just the formal notion of distributed databases codified in the geek world as “big data”; not just the gigantic amounts of structured and unstructured data being produced by the normal operations of so many businesses today and loosely called “big data”; but also the yet-to-come, even larger sets of data that will result from using the first generation of “big data” and analysis to create the next generation.
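Those three questions can be made concrete even with a toy inventory. A hypothetical sketch — the record fields, dataset names, and helper functions below are all invented for illustration, not any particular product’s schema — of answering “where is it, how is it secured, and who can do what”:

```python
from collections import defaultdict

# Hypothetical inventory of access grants: where each dataset lives,
# how it is secured, and which principals can do what with it.
GRANTS = [
    {"dataset": "customer_pii", "store": "warehouse/prod",
     "control": "AES-256 at rest", "principal": "analytics_team",
     "actions": {"read"}},
    {"dataset": "customer_pii", "store": "warehouse/prod",
     "control": "AES-256 at rest", "principal": "billing_svc",
     "actions": {"read", "write"}},
    {"dataset": "clickstream", "store": "hdfs/logs",
     "control": "none", "principal": "analytics_team",
     "actions": {"read", "write"}},
]

def where_is(dataset):
    """Where is the data I need to lock down?"""
    return sorted({g["store"] for g in GRANTS if g["dataset"] == dataset})

def controls_on(dataset):
    """What are the current means used to secure that data?"""
    return sorted({g["control"] for g in GRANTS if g["dataset"] == dataset})

def who_can(dataset):
    """Who has access to that data, and what can they do with it?"""
    access = defaultdict(set)
    for g in GRANTS:
        if g["dataset"] == dataset:
            access[g["principal"]] |= g["actions"]
    return dict(access)
```

With even this much in hand, the audit questions become one-liners — `who_can("customer_pii")` shows billing_svc can write, and a scan for `"none"` in `controls_on(...)` immediately flags the unprotected clickstream store. Most organizations can’t produce this table at all, which is the point.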
As a final thought, I’d like to say how much I liked the OSTP’s and associated organizations’ collective choice to always refer to the problem space as a problem with “big data”. “Big data” is a much-abused term these days. Marketers use it to refer to any data that is somewhat large (and around which their products have some value to offer). Technologists get red in the face reminding marketers that “big data” is only about distributed data storage and query engines like Hadoop. Then there are people in the science, medical, insurance, and government sectors producing gigantic datasets that defy explanation, saying that they are dealing with “big data”. Using quotes around “big data” in all the communications really seemed to capture that spirit: it signals that “big data” does not yet have a complete definition. Of course, how can you secure something you don’t really understand? That kind of question won’t make your heart bleed, but it’s sure to give many heartburn in the near future.