Algorithm bias and pitfalls: How big data can benefit the economy without imperiling vulnerable groups


A few weeks ago, stakeholders gave their views on the upcoming Data Protection Bill that is headed to Parliament for debate. There was obvious tension between some of the data protection principles and the big data practices that private enterprises commonly carry out.

In particular, the principles of fairness, transparency, purpose limitation and data minimisation seem to go against the normal practices that big data companies take for granted.

Fairness and transparency require that those collecting personal data seek consent from the data subjects, or citizens, and explain to them how their data will be used. Purpose limitation ensures that data is processed only for the purpose that was defined before the data was collected, while data minimisation limits what is collected to what that purpose requires.

In other words, if you are registering at a hospital as a patient, the hospital has no business asking for your tribe, county or your academic qualifications since such data may not add value for the purpose at hand – getting medical care.
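The hospital example can be sketched in a few lines of Python. This is only an illustration of data minimisation, not a reading of what the Bill requires: the purpose name, the allowlist and the form fields are all hypothetical.

```python
# Hypothetical allowlist: the fields each declared purpose may collect.
ALLOWED_FIELDS = {
    "medical_care": {"name", "date_of_birth", "symptoms", "allergies"},
}


def minimise(record: dict, purpose: str) -> dict:
    """Keep only the fields permitted for the declared purpose."""
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {field: value for field, value in record.items() if field in allowed}


# A made-up registration form that asks for more than the purpose needs.
registration_form = {
    "name": "Jane Doe",
    "date_of_birth": "1990-01-01",
    "symptoms": "fever",
    "tribe": "example",
    "county": "example",
    "academic_qualifications": "example",
}

minimised = minimise(registration_form, "medical_care")
print(sorted(minimised))
```

Under these assumptions, tribe, county and academic qualifications are dropped before the record is stored, and a purpose that has not been declared in the allowlist collects nothing at all.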

However, from a big-data perspective, it is such “unnecessary” data that enables data analytics companies to discover useful patterns between different datasets that were previously hidden.

Big data is often described in terms of its four dimensions, the four V’s – volume, variety, velocity and veracity.

Data volume implies that the more data collected the better, preferably from a variety of sources, including social media or CCTV cameras. Velocity means that data is always changing or arrives in real time, while veracity implies that some of the data may be unreliable or inaccurate.

With all these factors in mind, data companies collect and curate data points about many subjects and draw insights that can be either beneficial or detrimental.

In the hypothetical example of patient data, tribal or county-related data about a group of patients may be mined to reveal insights such as a particular population segment being more prone to malaria, cancer, fluorosis or HIV.


Such insights would obviously lead to better targeting of medical interventions for each of the vulnerable population segments.

However, this very insight could be abused, for example by deliberately exposing the vulnerable population to conditions that would worsen their ailments.

In other words, data collectors or controllers have the power to make positive or negative impacts on society based on their data analytics capabilities.

Sometimes the negative impacts are not deliberate but accidental, since the analytics algorithms tend to reinforce inherent biases that already exist in society.


For example, employee recruitment algorithms may prefer male CEOs based on big-data sources that indicate that a majority of successful CEOs are male.

The algorithm would not be sensitive to the fact that, although the data may be true, it reflects not male superiority but centuries of discriminatory policies against women that saw them enter the formal workforce just seventy or so years ago.
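A toy sketch shows how such a preference can arise without anyone intending it. The records below are made up for illustration and are not real statistics, and the scorer is a deliberately naive stand-in, not how any particular recruitment product works.

```python
from collections import Counter

# Made-up historical records for illustration only, not real statistics.
past_ceos = ["male"] * 9 + ["female"] * 1


def train_naive_scorer(history):
    """Learn each group's share of past CEOs and treat it as a score."""
    counts = Counter(history)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def score(candidate, model):
    """Score a candidate purely by how common their group was in the past."""
    return model.get(candidate["gender"], 0.0)


model = train_naive_scorer(past_ceos)

# Two hypothetical candidates with identical experience.
candidate_a = {"gender": "male", "years_experience": 15}
candidate_b = {"gender": "female", "years_experience": 15}

score_a = score(candidate_a, model)
score_b = score(candidate_b, model)
print(score_a > score_b)
```

The two candidates differ only in gender, yet the scorer ranks the male candidate higher because it has learned nothing except the historical imbalance. The data is accurate; the conclusion drawn from it is not.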

In one country, women were finally allowed to drive just a year ago. Data on female drivers in such a population would be so limited that algorithms could conclude that women are not proficient drivers or good candidates to drive vehicles.

These are some of the challenges and blind corners of big data analytics. It is for these reasons that there must be frameworks in place to ensure that the economy does benefit from big data while not discriminating against segments of the population.

It is indeed possible and desirable to have both big data analytics and data protection principles working hand in hand.

Mr Walubengo is a lecturer at Multimedia University of Kenya, Faculty of Computing and IT. Twitter: @Jwalu

This article was originally published at the Nation Media Group

N.B: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official position of the African Academic Network on Internet Policy.
