Saturday, 29 November 2014


What is big data?  Wikipedia defines big data as an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications.   Big data is the machine-based collection and analysis of astronomical quantities of information.  Vast quantities of data are analyzed to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.  Algorithms are used to search for statistical correlations between one kind of behavior and another.  The data is typically derived from the internet and includes web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records and machine data.

Essentially, big data is all about following and then analyzing our digital footprints on the internet and elsewhere.  Every aspect of our lives, including location records via mobile phones, purchases via credit cards and interests via web-surfing behavior, has been recorded and potentially shared by some entity somewhere.  All of this data is then analyzed to provide fundamental insights into both our personal and collective behavior.  The unpleasant reality is that these insights can be surprisingly revealing.

Just how powerfully intrusive is big data?  A recent study published in the journal Science found that just four pieces of information mined from a shopper's credit card could be used to uniquely identify ninety per cent of individuals.  The credit card information was made up of ordinary, everyday expenditures such as where the individual bought coffee or purchased a new sweater or pair of shoes.

Credit card use was just as reliable at identifying someone as mobile phone records.  An individual could be re-identified with "just a few more additional data points" even if the credit card data was made less specific, for example by recording only the general area where a purchase was made instead of the specific shop, or by widening the time range from one day to 15.  Interestingly, women were more identifiable from metadata than men.  Similarly, people with higher incomes were also easier to identify.
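
For readers who want to see how this kind of re-identification can be measured, here is a minimal sketch in Python. It is not the study's code. The function names (unicity, coarsen), the shop names and the three toy shoppers are all invented for illustration. The idea is to pick a few known purchases from one person's record and count how many people in the whole data set match all of them. If only one person matches, that person has been singled out.

```python
import random

def unicity(traces, k, trials=100, seed=0):
    """Estimate the share of cases where k known purchases match exactly one person.

    traces maps a (hypothetical) user id to a set of (place, day) purchases.
    """
    rng = random.Random(seed)
    unique = 0
    total = 0
    for user, pts in traces.items():
        if len(pts) < k:
            continue  # not enough purchases to draw k known points
        points = sorted(pts)
        for _ in range(trials):
            known = set(rng.sample(points, k))
            # Count every record that contains all of the known purchases.
            matches = sum(1 for other in traces.values() if known <= other)
            unique += (matches == 1)
            total += 1
    return unique / total if total else 0.0

def coarsen(traces, area_of, window_days):
    """Blur the data: replace each shop by its area and each day by a time window."""
    return {
        user: {(area_of[shop], day // window_days) for shop, day in pts}
        for user, pts in traces.items()
    }

# Toy data, invented for this example only.
traces = {
    "alice": {("cafe_a", 1), ("shoes_b", 3), ("bakery_c", 8)},
    "bob":   {("cafe_a", 1), ("shoes_b", 4), ("bakery_c", 9)},
    "carol": {("cafe_d", 2), ("shoes_b", 3), ("bakery_c", 8)},
}
area_of = {"cafe_a": "north", "cafe_d": "north",
           "shoes_b": "south", "bakery_c": "south"}

print(unicity(traces, k=2))   # share singled out by two known purchases
print(unicity(traces, k=3))   # share singled out by three known purchases
print(unicity(coarsen(traces, area_of, window_days=15), k=2))
```

With only three shoppers, blurring the data makes everyone look alike. The point of the study is that in a real data set with many purchases per person, a few extra known purchases are enough to make up for the blurring.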

Data brokers collect consumer information and then sell it to other companies.  They are not hindered by the "do not track" options available on many browsers, since these are not legally binding.  Giant data-collection firms sort details of online and offline purchases to help categorize people as runners or hikers, spenders or savers, conservatives or liberals, main streamers or counter-culturalists and so on.

A bank will use the data to assess a mortgage applicant’s creditworthiness.  Big data’s supposed predictive qualities go far beyond the traditional elements which determine whether a person has good credit, i.e. whether the person has paid all of his or her loans and debts in a timely fashion.  For example, a person’s contacts on Facebook or Twitter or LinkedIn can be used to assess his or her "character and capacity" when it comes to loans.  This might be done by "analyzing" friends.  Are they rich?  Are they poor?  If a number of associations with losers are detected by algorithms, a person can be labeled a poor credit risk and disqualified from a loan.  The individual may very well never find out about the true cause of the loan rejection - the lender will likely avoid being up front about the reason.

It has been suggested that even the repayment history of the other customers of stores where a person shops can result in a negative "behavioral scoring".  Although this method of credit rating (guilt-by-association) is something that is totally beyond an individual’s control, lenders will use it because it provides a reliable statistical inference.   The consequences, however, for the individual will likely be horrendous.

Insurers are particularly excited by big data.  It is costly to use blood, urine and other physical tests to assess a person’s health.  Big data, however, may be able to reveal as much about a person as a lab analysis of bodily fluids.  The individual leaves an information trail that shows such lifestyle factors as exercise habits and fast-food diets, which can be used to estimate risk for illnesses such as high blood pressure and depression.  Similarly, the individual might reveal concern about injuries and illnesses through the websites that he or she visits.

The contemporary notion of privacy is based on two premises: 1) individual choice, in the sense that there is informed consent to disclosure, and 2) anonymization, meaning that data is decoupled from an identified individual.  With big data, however, the information is used and reused in a manner separate from the purpose for which it was originally provided.  Similarly, even if the information is devoid of identifying data, the relationships between the individual pieces will identify the individual.

A person can avoid creating bad data simply by cutting himself or herself off from the internet.  We can’t help but suspect that "no data" will have the same negative consequence as having no credit cards and no mortgages.

The negative impact of big data is not limited to a form of economic blackballing.  Edward Snowden revealed that the National Security Agency in the U.S. has full access to internet information and all the analytic tools of big data.  Combine this access to an individual’s digital footprint (which, of course, is mostly gratuitously created by the individual) with all the information that government already has on individuals through income tax and social program records, financial disclosure requirements, travel records and the like, and you have the blueprint for surveillance at a level never previously approached even in highly totalitarian societies.

Assurances are given that these super powers of surveillance will only be used to counter terrorism or the drug trade or other nefarious activities ... but once a "power" is there, does it ever really remain benign for very long?  There will be an inevitable expansion in the ways that super surveillance is used to "regulate" the entire population.


  1. If you are engaging in activities that you don't want the NSA to know about, perhaps you shouldn't be engaging in those activities in the first place. If you are not guilty, then you have nothing to worry about.

  2. "If you are not guilty, then you have nothing to worry about" seems to be valid in the U.S. of A.
    But in civilized constitutional states, the legal dictum of the presumption of innocence is still valid.