Billionaires Decision Tree Model with IBM Cognos Analytics

Vanessa Fotso
5 min read · Dec 27, 2020


Introduction and Data Description

The number of billionaires has increased tremendously over the years. Forbes’ 2019 World’s Billionaires list records 2,153 billionaires worldwide. This paper presents an overview of the billionaires dataset retrieved from https://think.cs.vt.edu/corgis/csv/billionaires/billionaires.html. The dataset has 22 variables and 2,614 records. It lists the names of the world’s billionaires from 1996 to 2014, as well as their age, sex, country of origin and net worth. The data also classifies billionaires as self-made or inherited and provides information on the origin of their wealth, the sector and company associated with it, and their political affiliation. The following table provides a brief description of the variables.

Billionaires Dataset Snapshot

We will develop decision tree models to predict a billionaire’s net worth, in order to better understand the characteristics of the wealthiest people on the planet.

Data Preparation and Exploration

A brief analysis shows that both the number of billionaires and their total net worth have grown drastically over the past 20 years, alongside the emergence of self-made billionaires.

Rank and Worth in Billions for Years
Worth in Billions by Wealth Type and Year

The data exploration shows that North America has the most billionaires, with a total net worth of $3,821.4 billion, followed by Europe and East Asia.

Worth in Billions by Regions

Additionally, 2014 has the most data points, with 1,653 billionaires recorded and a total net worth of $6.5 trillion.

Rank and Worth in Billions by Year
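
The per-year counts and totals quoted above are simple group-by aggregations. As a minimal sketch of the same idea outside Cognos Analytics, assuming each record is a dictionary with hypothetical `year` and `worth_billions` keys (the sample rows are invented for illustration and are not taken from the dataset):

```python
from collections import defaultdict


def summarize_by_year(records):
    """Return {year: (count, total_worth)} for a list of record dicts."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in records:
        counts[row["year"]] += 1
        totals[row["year"]] += row["worth_billions"]
    return {year: (counts[year], totals[year]) for year in sorted(counts)}


# Invented sample rows, for illustration only.
sample = [
    {"year": 1996, "worth_billions": 2.0},
    {"year": 2014, "worth_billions": 1.5},
    {"year": 2014, "worth_billions": 4.0},
]
summary = summarize_by_year(sample)
print(summary)
```

The year with the largest count in such a summary is the one with the most data points.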

The median wealth for that year is $2.1 billion, but the mean is $3.9 billion. The gap between the mean and the median indicates a right-skewed distribution: a small number of very large fortunes pull the mean well above the typical billionaire’s net worth. Billionaires have acquired their wealth across several industries, most notably the consumer industry, which accounted for 19.1% of the total wealth of billionaires in 2014.

Worth in Billions by Year and Industry
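
To see why a mean well above the median points to a few very large fortunes, here is a small sketch using Python's standard `statistics` module. The net worth values are invented for illustration and do not come from the dataset:

```python
from statistics import mean, median

# Invented right-skewed sample of net worths in billions (not real data):
# most values are small and one is very large.
worths = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 40.0]

sample_mean = mean(worths)
sample_median = median(worths)
print(f"mean={sample_mean:.2f}, median={sample_median:.2f}")
```

The single large value raises the mean considerably while leaving the median unchanged, which is the same pattern as the $3.9 billion mean against the $2.1 billion median.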

Finally, our exploration reveals that several variables have missing values, including age, gender, GDP, industry, sector and wealth type. Records with missing or null values in those variables were filtered out in Cognos Analytics during our analysis.
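
The filtering itself was done in Cognos Analytics. For readers who want to reproduce the step in code, the following is a rough sketch only. The field names are hypothetical, the rows are invented, and it assumes missing values appear as `None` or blank strings, which may not match how the source CSV encodes them:

```python
# Hypothetical field names for the variables reported as having gaps.
REQUIRED_FIELDS = ("age", "gender", "gdp", "industry", "sector", "wealth_type")


def is_complete(row, required=REQUIRED_FIELDS):
    """True when every required field is present and neither None nor blank."""
    for field in required:
        value = row.get(field)
        if value is None:
            return False
        if isinstance(value, str) and value.strip() == "":
            return False
    return True


def drop_incomplete(rows, required=REQUIRED_FIELDS):
    """Keep only the rows that pass is_complete."""
    return [row for row in rows if is_complete(row, required)]


# Invented rows, for illustration only.
rows = [
    {"name": "A", "age": 60, "gender": "male", "gdp": 1.0e12,
     "industry": "Consumer", "sector": "retail", "wealth_type": "inherited"},
    {"name": "B", "age": None, "gender": "", "gdp": 2.0e12,
     "industry": "Energy", "sector": "oil", "wealth_type": "self-made"},
]
clean = drop_incomplete(rows)
print([row["name"] for row in clean])
```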

Decision Tree Predictive Models

Using Cognos Analytics, we developed a complete decision tree to predict a billionaire’s wealth (the worth in billions variable), that is, to identify which characteristics make a billionaire. When all variables are offered to the tree, the resulting model shows that only 5 of the 21 candidate variables in our dataset help predict a billionaire’s wealth: Rank, Year, GDP, Country code and Industry.

Decision Tree Diagram 1
Decision Tree Rules
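
For intuition about how a tree for a numeric target chooses its splits, the sketch below finds the single threshold on one feature that most reduces squared error and reports the share of variance that split explains. This illustrates the general regression-tree idea only. It is not the algorithm Cognos Analytics uses, and the share of variance computed here is not claimed to equal the predictive strength Cognos reports. The data is invented:

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, variance_explained) for the best split 'x < threshold'."""
    total = sse(ys)
    best_t, best_sse = None, total
    # Skip the smallest x so the left side of the split is never empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        split_sse = sse(left) + sse(right)
        if split_sse < best_sse:
            best_t, best_sse = t, split_sse
    explained = 0.0 if total == 0 else 1 - best_sse / total
    return best_t, explained


# Invented ranks and net worths in billions, for illustration only.
ranks = [1, 2, 3, 4, 5, 6]
worths = [30.0, 28.0, 10.0, 9.0, 8.0, 7.0]
threshold, explained = best_split(ranks, worths)
print(f"split at rank < {threshold}, variance explained = {explained:.3f}")
```

A full tree repeats this search inside each resulting group and across all candidate variables, which is how a few variables can end up carrying most of the predictive weight.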

The single best predictor of wealth is the billionaire’s rank, with a predictive strength of 35%. This is expected because rank is a direct ordering of net worth: the smaller the rank, the greater the net worth. Additionally, the combinations of rank and year, or rank and GDP, drive the wealth value with a predictive strength of about 63%. This could be explained by the fact that the rank, year and GDP variables are strongly correlated (97%). Lastly, the four variables that together best predict a billionaire’s net worth in the generated tree are rank, year, country code and industry, with a predictive strength of 68%.
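
One way to check how closely rank tracks net worth is to compare the Pearson correlation, which measures linear association, with the Spearman correlation, which measures agreement in ordering. The sketch below implements both with the standard library. The values are invented, and the rank helper assumes there are no ties:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ordinal_ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ordinal ranks."""
    return pearson(ordinal_ranks(xs), ordinal_ranks(ys))


# Invented list positions and net worths in billions, for illustration only.
ranks = [1, 2, 3, 4, 5]
worths = [76.0, 40.0, 20.0, 10.0, 5.0]
r_pearson = pearson(ranks, worths)
r_spearman = spearman(ranks, worths)
print(f"pearson={r_pearson:.3f}, spearman={r_spearman:.3f}")
```

In this invented sample the ordering agrees perfectly, so the Spearman value is -1, while the Pearson value is weaker because net worth falls off faster than a straight line would.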

Our data exploration has shown that the number of billionaires and their global net worth have grown over time. Additionally, some industries generate more revenue than others, and most billionaires reside in specific regions of the planet. This could explain why the four variables above are the best combined predictors of wealth creation. The top five rules generated predict an average total net worth between 5.64 billion and 21.82 billion. For the top four rules, the billionaire’s rank is less than 174 in the year 2014. This again supports the growth of wealth over time. The rules also show which regions and industries generate the richest billionaires; this can also reflect a country’s economic status.
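
A decision rule of this kind is a set of conditions plus the average outcome of the records that satisfy them. The sketch below encodes only the two conditions reported above (rank below 174 and year 2014) and averages the net worth of matching rows. The field names are hypothetical and the rows are invented, so the resulting average is not one of the values Cognos produced:

```python
def top_rule(row):
    """Conditions reported in the text: rank below 174 in the year 2014."""
    return row["rank"] < 174 and row["year"] == 2014


def average_worth(rows, rule):
    """Average net worth of the rows matching a rule, or None if none match."""
    matched = [row["worth_billions"] for row in rows if rule(row)]
    return sum(matched) / len(matched) if matched else None


# Invented rows, for illustration only.
rows = [
    {"rank": 10, "year": 2014, "worth_billions": 20.0},
    {"rank": 100, "year": 2014, "worth_billions": 10.0},
    {"rank": 500, "year": 2014, "worth_billions": 2.0},
    {"rank": 10, "year": 2001, "worth_billions": 30.0},
]
rule_average = average_worth(rows, top_rule)
print(rule_average)
```

The actual rules also include region and industry conditions, which would be added to the predicate in the same way.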

Another decision tree was generated from the previous one by editing the drivers of the net worth value and selecting only five variables. In this second tree model, the combination of the five selected variables (rank, year, region, industry and age) predicts a billionaire’s net worth with a predictive strength of 68.1%. Furthermore, the new model’s rules shifted the range of the top five predicted average total net worths (in billions) from [5.64, 21.82] to [6.59, 22.52]. The sunburst chart also identified industry and region as the strongest predictors; however, age is also an important factor.

Decision Tree Diagram 2
Decision Tree rules 2
Tree Sunburst

Conclusion

The billionaires dataset provides key information on the wealthiest people on the planet. The data gives detail on how wealth is created and displays the trends in wealth growth. The decision trees generated in this paper show how variables such as region, country, year, industry and age contribute to wealth growth. It is important to note that rank is not an indispensable variable for constructing a decision tree that predicts a billionaire’s net worth: because net worth and rank are strongly (100%) correlated, rank largely restates the target rather than explaining it. Additionally, the missing values of the industry and age variables might have compromised our decision tree models.

Decision tree modeling helps to better understand the factors influencing these patterns by quantifying the importance of each factor.

Written by Vanessa Fotso

Health IT Software Engineer with broad technical exposure and passion for learning. https://www.buymeacoffee.com/vanessuniq
