By: John Salter-Cid

Published: 12/6/2019
Last Edited: 12/6/2019

View site on GitHub or GitHub Pages

Table of Contents:

Introduction

Part 1a: Load & Wrangle Arms Transfer Dataset

Part 1b: Initial Exploration into Arms Transfer Database

Part 2a: Load & Wrangle Battle-Related Deaths Dataset

Part 2b: Initial Exploration into UCDP Battle-Related Deaths Data

Part 3: Merging the Datasets

Part 4: K-Nearest Neighbors Regression

Part 5: Conclusion

Introduction:

If you are like me, then you may feel overwhelmed at times by the onslaught of news coverage surrounding wars, or the threat thereof, around the world. By simply going to your preferred news source, you can surely find a news anchor outlining the rising military tensions in countries like North Korea and Iran or bemoaning atrocities in war-torn countries like Syria and Yemen. Alongside news stories like these, you will also see discussions between experts about the militarization or demilitarization of these regions that may point to the escalation or de-escalation of the conflict being discussed.

Thus, if you ARE like me, you may also be asking yourself, what does militarization mean?

I decided to use this curiosity as the basis for this project. Here, I will use two datasets, the SIPRI Arms Transfers Database and the UCDP Battle-Related Deaths Database, to try to figure out which countries are buying massive amounts of armaments, who is selling to them, and whether there is a correlation between these armament purchases and the number of fatalities in the wars these governments are involved in. This notebook will take data from these datasets and, walking through the data life cycle, will ultimately aim to answer these questions.

Dataset #1: SIPRI Arms Transfers Database

"The Arms Transfer Database tracks the international flow of major weapons — artillery, missiles, military aircraft, tanks, and the like. Maintained by the Stockholm International Peace Research Institute (SIPRI), the database contains documented sales since 1950 and is updated annually". This dataset, and similar datasets also provided by SIPRI are rich in information concerning everything you could possibly need to know about weapons sales across the world. It should be noted that the SIPRI dataset does not use convential currency to value the arms transfers. Instead, SIPRI has developed a unique pricing system to measure the volume of deliveries of major conventional weapons and components using a common unit the "SIPRI trend-indicator value" (TIV). The TIV of an item being delivered is intended to reflect its military capability rather than its financial value. This common unit can be used to measure trends in the flow of arms between particular countries and regions over time—in effect, a military capability price index.

Dataset #2: UCDP Battle-Related Deaths Data

"This dataset contains information on the number of battle-related deaths in the conflicts around the world from the years 1989 to 2018. The Uppsala Conflict Data Program (UCDP) is the world’s main provider of data on organized violence and the oldest ongoing data collection project for civil war, with a history of almost 40 years". This dataset has everything needed for deep looks into every armed conflict since 1989 and will prove usefule in our comparison with arms sales from the SIPRI dataset.

Part 1a: Load & Wrangle Arms Transfer Dataset

In [91]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats
%matplotlib inline

# Widen the notebook and relax the Pandas display limits so dataframes display in full.
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

pd.set_option('display.max_rows', 10000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
In [92]:
#These files can be found on the GitHub page linked at the top of the notebook.
arms_50_69 = pd.read_csv('arms_transfer_1950-1969.txt', sep=';')
arms_70_84 = pd.read_csv('arms_transfer_1970-1984.txt', sep=';')
arms_85_99 = pd.read_csv('arms_transfer_1985-1999.txt', sep=';')
arms_00_18 = pd.read_csv('arms_transfer_2000-2018.txt', sep=';')
In [93]:
#combine the files into one
dataframes = [arms_50_69, arms_70_84, arms_85_99, arms_00_18]
arms = pd.concat(dataframes)
# arms = arms.sort_values(by= ['Buyer','Delivery year'])
In [94]:
arms.head()
Out[94]:
Deal ID Seller Buyer Designation Description Armament category Order date Order date is estimate Numbers delivered Numbers delivered is estimate Delivery year Delivery year is estimate Status SIPRI estimate TIV deal unit TIV delivery values Local production
0 1929 United States Mexico Cessna-150 Trainer/light ac Aircraft 1967 Yes 2 No 1968 No New 0.10 0.10 0.20 No
1 1943 Soviet Union Algeria T-54 Tank Armoured vehicles 1965 Yes 25 No 1966 No Second hand 1.10 0.44 11.00 No
2 1944 Soviet Union Algeria T-55 Tank Armoured vehicles 1965 Yes 25 No 1966 No New 1.25 1.25 31.25 No
3 1957 United States Morocco DC-3/C-47 Skytrain Transport aircraft Aircraft 1964 Yes 5 No 1966 No Second hand 2.50 1.00 5.00 No
4 1968 Soviet Union Morocco Il-28 Bomber aircraft Aircraft 1960 Yes 2 No 1961 No Second hand 7.20 2.88 5.76 No
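
As a quick sanity check on the TIV bookkeeping (my own check, not something SIPRI states here), the "TIV delivery values" column appears to equal "TIV deal unit" times "Numbers delivered" for the rows shown above. A minimal sketch that verifies this across the whole dataframe:

#my own sanity check: delivery value should equal per-unit TIV times quantity delivered
implied_tiv = arms["TIV deal unit"] * arms["Numbers delivered"]
mismatch = (implied_tiv - arms["TIV delivery values"]).abs() > 0.01
print(mismatch.sum(), "of", len(arms), "rows deviate from unit * quantity")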

As we can see, SIPRI does a very good job of curating and maintaining their data. It doesn't appear that a lot of wrangling will be needed to get this dataset ready for analysis. However, we should still look for null values and make sure the data types are how we want them.

Below you can see in the count row that there are no nulls in this dataset! SIPRI helps us out yet again.

In [95]:
arms.describe(include='all')
Out[95]:
Deal ID Seller Buyer Designation Description Armament category Order date Order date is estimate Numbers delivered Numbers delivered is estimate Delivery year Delivery year is estimate Status SIPRI estimate TIV deal unit TIV delivery values Local production
count 54321.000000 54321 54321 54321 54321 54321 54321.000000 54321 54321.000000 54321 54321.000000 54321 54321 54321.000000 54321.000000 54321.000000 54321
unique NaN 132 257 3827 248 11 NaN 2 NaN 2 NaN 2 3 NaN NaN NaN 2
top NaN United States India M-113 APC Aircraft NaN Yes NaN Yes NaN No New NaN NaN NaN No
freq NaN 16834 2252 439 2891 15991 NaN 31031 NaN 37156 NaN 33436 44682 NaN NaN NaN 46079
mean 25898.554040 NaN NaN NaN NaN NaN 1983.341249 NaN 59.816020 NaN 1986.905285 NaN NaN 7.812927 6.569592 35.408285 NaN
std 17247.522631 NaN NaN NaN NaN NaN 17.974236 NaN 216.611329 NaN 18.349691 NaN NaN 28.270133 23.833243 85.932376 NaN
min 2.000000 NaN NaN NaN NaN NaN 1940.000000 NaN 1.000000 NaN 1950.000000 NaN NaN 0.010000 0.004000 0.020000 NaN
25% 14184.000000 NaN NaN NaN NaN NaN 1970.000000 NaN 2.000000 NaN 1973.000000 NaN NaN 0.250000 0.200000 4.000000 NaN
50% 24108.000000 NaN NaN NaN NaN NaN 1982.000000 NaN 10.000000 NaN 1986.000000 NaN NaN 1.100000 1.000000 10.000000 NaN
75% 36902.000000 NaN NaN NaN NaN NaN 1999.000000 NaN 38.000000 NaN 2003.000000 NaN NaN 5.100000 4.750000 30.000000 NaN
max 62537.000000 NaN NaN NaN NaN NaN 2018.000000 NaN 7500.000000 NaN 2018.000000 NaN NaN 1579.000000 1250.000000 2150.860000 NaN
In [96]:
arms.dtypes
Out[96]:
Deal ID                            int64
Seller                            object
Buyer                             object
Designation                       object
Description                       object
Armament category                 object
Order date                         int64
Order date is estimate            object
Numbers delivered                  int64
Numbers delivered is estimate     object
Delivery year                      int64
Delivery year is estimate         object
Status                            object
SIPRI estimate                   float64
TIV deal unit                    float64
TIV delivery values              float64
Local production                  object
dtype: object

Looking at the data types listed above, we can see that they are almost all where we want them. The only change we need is to "Armament category". It is currently an object and it should be a category data type. A simple "astype" command is all we need to fix that.

In [97]:
arms['Armament category'] = arms['Armament category'].astype('category')

To demonstrate what type of information this dataset can show us, look at the command below. Here we can see the arms deals between Germany and Singapore in the year 2009. It shows that Germany delivered 2 new turbofan engines, 8 new diesel engines, and 12 used tanks, totaling 68.68 million in TIV delivery values (the TIV values are in millions).

In [98]:
arms.loc[(arms["Delivery year"] == 2009) & (arms["Seller"] == 'Germany') & (arms["Buyer"] == 'Singapore')]
Out[98]:
Deal ID Seller Buyer Designation Description Armament category Order date Order date is estimate Numbers delivered Numbers delivered is estimate Delivery year Delivery year is estimate Status SIPRI estimate TIV deal unit TIV delivery values Local production
783 38809 Germany Singapore BR-710 Turbofan Engines 2007 No 2 Yes 2009 No New 2.5 2.50 5.00 No
11319 36197 Germany Singapore MTU-8000 Diesel engine Engines 2000 Yes 8 No 2009 No New 4.0 4.00 32.00 No
11597 31953 Germany Singapore Leopard-2A4 Tank Armoured vehicles 2007 No 12 No 2009 No Second hand but modernized 4.0 2.64 31.68 No
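
We can confirm that total by summing the delivery values of this slice directly (a quick check):

#quick check: total TIV delivered from Germany to Singapore in 2009
de_sg_2009 = arms.loc[(arms["Delivery year"] == 2009) & (arms["Seller"] == 'Germany') & (arms["Buyer"] == 'Singapore')]
print(de_sg_2009["TIV delivery values"].sum())  # 5.00 + 32.00 + 31.68 = 68.68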

Now, the arms dataset is ready to be analyzed! Before we load the other datasets, let's do some exploratory analysis to see what the arms dataframe has to tell us.

Part 1b: Initial Exploration into Arms Transfer Database

Let's start simple. Let's make a lineplot showing the TIV delivery values across time. I prefer Seaborn, so I will be using it for many of the plots in this article.

In [99]:
#TIV delivery values across the globe since 1950 (Seaborn's lineplot plots the yearly mean by default)
sns.lineplot(x='Delivery year', y="TIV delivery values", data=arms)
Out[99]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c2049448>

After a spike in the early 1950s, the plotted values have fluctuated roughly between 25 and 50. Keep in mind that Seaborn's lineplot aggregates repeated years with the mean by default, so this figure shows the average TIV delivery value per deal in each year rather than the world total.
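
If we want the world total delivered per year instead, a simple groupby gives it; a minimal sketch:

#world total TIV delivered per year (sum over all deals in that year)
world_total_per_year = arms.groupby("Delivery year")["TIV delivery values"].sum()
world_total_per_year.plot.line(title="Total TIV delivery values per year, worldwide")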

To align with our original inquiries about who the top buyers and sellers of armaments across the world are, let's look at the top 10 buyers and sellers as defined by the greatest total TIV delivery values.

Let's do the top 10 sellers first.

In [100]:
#filtering data for the top 10 sellers by total sales since 1950

arms = arms.replace({"Soviet Union" : "Soviet Union/Russia", "Russia" : "Soviet Union/Russia"}) #noticing that the Soviet Union and Russia are separated, I combine them here

sum_seller = arms.groupby(['Seller']).sum() #groupby seller and aggregate by sum
sum_seller = sum_seller.sort_values(by=["TIV delivery values"], ascending=False) #sort the result by TIV delivery values to get the top sellers on top
sum_seller["rank"]= sum_seller["TIV delivery values"].rank(ascending= False) #create a rank column to make it easy to pull a certain slice of the data
top10_seller = sum_seller.loc[sum_seller["rank"]<=10] #pull just the top 10 sellers

top10_seller = top10_seller.reset_index()

top10_seller_list = top10_seller['Seller'].tolist() 
arms_seller = arms.groupby(['Seller',"Delivery year"]).sum()

arms_seller = arms_seller.reset_index()

arms_seller = arms_seller.loc[arms_seller["Seller"].isin(top10_seller_list)]

#plot showing the top 10 sellers and how much they have sold since 1950
sns.set_style("whitegrid")

f, ax = plt.subplots(figsize=(20, 10))
ax.set(xlim=(1950, 2018))
ax.set_title('Top 10 sellers by total TIV delivery values since 1950')
sns.lineplot(x='Delivery year', y="TIV delivery values", hue='Seller', ax=ax, data=arms_seller)
Out[100]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c212f548>

Not surprisingly, we can see that the United States and Russia are far and away the leading sellers of armaments in the world. To demonstrate that further, we can plot the percentage of world sales attributed to just the US and Russia each year.

In [101]:
#take USA and Russia per year and then divide total per year and get new column of percentage per year
usa_russia = arms.loc[(arms["Seller"]=="United States")|(arms["Seller"]=="Soviet Union/Russia")]
usa_russia = usa_russia.groupby(["Delivery year"]).sum()
usa_russia = usa_russia["TIV delivery values"]
usa_russia = usa_russia.reset_index()
world = arms.groupby(["Delivery year"]).sum()
world = world["TIV delivery values"]
world = world.reset_index()
In [102]:
usa_russia= pd.merge(usa_russia, world, left_on='Delivery year', right_on='Delivery year')
In [103]:
usa_russia["(USA+Russia) / World"] = (usa_russia["TIV delivery values_x"])/(usa_russia["TIV delivery values_y"])
In [104]:
usa_russia.head(2)
Out[104]:
Delivery year TIV delivery values_x TIV delivery values_y (USA+Russia) / World
0 1950 5762.5456 8467.3282 0.680562
1 1951 8577.0240 12442.6120 0.689327
In [105]:
fig, ax = plt.subplots(figsize=(13,8))

usa_russia.plot(x="Delivery year", y="(USA+Russia) / World", ax=ax)

ax.set_title('Proportion of world armament sales by USA and Russia alone')
ax.set_xlabel("Year")
ax.set_ylabel("Ratio of (USA+Russia) sales to total world sales")
Out[105]:
Text(0, 0.5, 'Ratio of (USA+Russia) sales to total world sales')

At one point (around 1963-64), you can see that the US and Russia alone accounted for over 80% of world armament sales! With the industrialization of other nations having occurred since then, it is not surprising to see a general decrease in this proportion over time.
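
We can pull the exact peak year and share out of the usa_russia dataframe (a quick check):

#year in which the combined US/Russia share of world deliveries peaked
peak_row = usa_russia.loc[usa_russia["(USA+Russia) / World"].idxmax()]
print(int(peak_row["Delivery year"]), round(peak_row["(USA+Russia) / World"], 3))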

Though seeing just how much the US and Russia dominate world arms sales is interesting, to answer our question of how the buying of armaments affects the number of casualties in war, let's look at the top 10 buyers.

In [106]:
#filtering data for the top 10 buyers by total sales since 1950

sum_buyer = arms.groupby(['Buyer']).sum()
sum_buyer = sum_buyer.sort_values(by=["TIV delivery values"], ascending=False)
sum_buyer["rank"] = sum_buyer["TIV delivery values"].rank(ascending= False)
top10_buyer = sum_buyer.loc[sum_buyer["rank"]<=10]

top10_buyer = top10_buyer.reset_index()

top10_buyer_list = top10_buyer['Buyer'].tolist()

arms_buyer = arms.groupby(['Buyer',"Delivery year"]).sum()

arms_buyer = arms_buyer.reset_index()

arms_buyer = arms_buyer.loc[arms_buyer["Buyer"].isin(top10_buyer_list)]

arms_buyer.head()
Out[106]:
Buyer Delivery year Deal ID Order date Numbers delivered SIPRI estimate TIV deal unit TIV delivery values
1237 China 1950 269649 33139 2150 27.27 19.200 2608.368
1238 China 1951 283683 35090 1650 37.09 30.736 2272.350
1239 China 1952 200077 25347 1807 30.99 29.676 3504.600
1240 China 1953 182350 25357 1655 34.41 33.096 3219.100
1241 China 1954 304845 37092 1553 209.47 122.128 3500.444
In [107]:
#plot showing the top 10 buyers and how much they have bought since 1950.
sns.set_style("whitegrid")

f, ax = plt.subplots(figsize=(20, 10))
ax.set_title('Top 10 buyers by total TIV delivery values since 1950')

sns.lineplot(x='Delivery year', y="TIV delivery values", hue='Buyer', palette=sns.color_palette('Paired', n_colors=10), ax=ax, data=arms_buyer)
Out[107]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c2439208>

Below you can see the top 10 lists of both buyers and sellers in tabular form.

In [111]:
top10_buyer #top 10 buyers all-time (1950-2018)
Out[111]:
Buyer Deal ID Order date Numbers delivered Delivery year SIPRI estimate TIV deal unit TIV delivery values rank
0 India 57887627 4468473 187608 4483608 26054.56 25031.9922 121103.7664 1.0
1 China 30574155 2031451 49224 2039464 12618.17 11023.3800 78690.3560 2.0
2 Saudi Arabia 33467199 2215622 137250 2219039 9032.50 8925.7442 65447.7042 3.0
3 Egypt 31782730 2636362 113208 2640598 11080.65 9076.3044 65303.4236 4.0
4 Japan 44979667 3235545 37685 3248327 12850.80 12542.8300 64616.6400 5.0
5 Turkey 33063476 2634744 89147 2640060 16281.01 12331.0512 56468.6808 6.0
6 Germany 22384407 1774168 111758 1778040 4956.51 4377.0516 55870.5744 7.0
7 Iraq 25869597 2202360 121918 2205350 4591.87 4203.1516 54047.2000 8.0
8 South Korea 32065888 2423808 47588 2429460 10840.70 9403.7300 53230.0040 9.0
9 Iran 19612021 1812797 106839 1816485 5864.09 5382.2480 46671.1400 10.0
In [112]:
top10_seller #top 10 sellers all-time (1950-2018)
Out[112]:
Seller Deal ID Order date Numbers delivered Delivery year SIPRI estimate TIV deal unit TIV delivery values rank
0 United States 419502244 33358573 1095372 33415508 140837.60 111734.3926 682606.7928 1.0
1 Soviet Union/Russia 246221641 22883356 1045335 22922271 81985.82 73158.9280 593358.3480 2.0
2 United Kingdom 85694167 7602670 111228 7617535 39548.38 31624.8866 141384.8036 3.0
3 France 115408766 10100470 390869 10123095 30082.55 28199.4768 122557.5860 4.0
4 Germany 80282433 5376753 186632 5390688 34997.34 32559.2224 86740.1198 5.0
5 China 59288476 3725235 87779 3732447 11183.03 10897.0980 54394.8700 6.0
6 Italy 52317122 3630452 20109 3638111 11759.66 11177.4400 32854.5300 7.0
7 Czechoslovakia 6165839 671462 21344 672885 587.30 565.5860 29328.9300 8.0
8 Netherlands 34332690 2207456 8802 2212278 17475.41 12420.5500 24302.4430 9.0
9 Israel 44022856 2418445 50978 2422255 4811.15 4188.0110 17456.6030 10.0

Before we move on to the other dataset, let's look at something other than TIV delivery values. Personally, I was interested to see how the mix of armament types purchased changed from 1950 to 2018.

In [113]:
fig, ax = plt.subplots(figsize=(14, 8))

armament_count_1950 = arms.loc[arms['Delivery year']==1950]['Armament category'].value_counts()
armament_count_2018 = arms.loc[arms['Delivery year']==2018]['Armament category'].value_counts()

ax.set_title("Armament Categories in 1950")
armament_count_1950.plot.pie(ax=ax, legend=False)
Out[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c38686c8>
In [114]:
fig, ax = plt.subplots(figsize=(14, 8))

ax.set_title("Armament Categories in 2018")

armament_count_2018.plot.pie(ax=ax)
Out[114]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c45cc108>
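
Pie charts make precise comparison difficult, so as a supplement we can put the 1950 and 2018 category shares side by side in one table (a small sketch reusing the counts computed above):

#share of deliveries by armament category, 1950 vs 2018 (fractions of that year's deliveries)
category_shares = pd.DataFrame({
    "1950": armament_count_1950 / armament_count_1950.sum(),
    "2018": armament_count_2018 / armament_count_2018.sum(),
}).fillna(0)
print(category_shares.sort_values("2018", ascending=False).round(3))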

Introducing the UCDP Battle-Related Deaths Data

Now that we have taken a look at the arms dataset, let's load and look at the UCDP Battle-Related Deaths Data.

Part 2a: Load & Wrangle Battle-Related Deaths Dataset

In [115]:
deaths = pd.read_csv('BattleDeaths_v19_1.csv')
deaths.head()
Out[115]:
conflict_id dyad_id location_inc side_a side_a_id side_a_2nd side_b side_b_id side_b_2nd incompatibility territory_name year bd_best bd_low bd_high type_of_conflict battle_location gwno_a gwno_a_2nd gwno_b gwno_b_2nd gwno_loc gwno_battle region version
0 205 406 Iran Government of Iran 114 NaN KDPI 164 NaN 1 Kurdistan 1990 31 31 335 3 Iran 630 NaN NaN NaN 630 630 2 19.1
1 205 406 Iran Government of Iran 114 NaN KDPI 164 NaN 1 Kurdistan 1993 110 110 110 3 Iran, Iraq 630 NaN NaN NaN 630 630, 645 2 19.1
2 205 406 Iran Government of Iran 114 NaN KDPI 164 NaN 1 Kurdistan 1996 27 27 29 3 Iran, Iraq 630 NaN NaN NaN 630 630, 645 2 19.1
3 205 406 Iran Government of Iran 114 NaN KDPI 164 NaN 1 Kurdistan 2016 30 30 137 3 Iran, Iraq 630 NaN NaN NaN 630 630, 645 2 19.1
4 205 406 Iran Government of Iran 114 NaN KDPI 164 NaN 1 Kurdistan 2018 44 33 78 3 Iran, Iraq 630 NaN NaN NaN 630 630, 645 2 19.1

There are a lot of columns we don't need; let's drop them now.

In [116]:
deaths = deaths.drop(columns=['dyad_id', 'location_inc', 'side_a_id', 'side_a_2nd',
                            'side_b_id', 'side_b_2nd', 'territory_name', 'battle_location',
                            'gwno_a', 'gwno_a_2nd', 'gwno_b', 'gwno_b_2nd', 'gwno_loc', 
                            'gwno_battle', 'version'])
deaths.head()
Out[116]:
conflict_id side_a side_b incompatibility year bd_best bd_low bd_high type_of_conflict region
0 205 Government of Iran KDPI 1 1990 31 31 335 3 2
1 205 Government of Iran KDPI 1 1993 110 110 110 3 2
2 205 Government of Iran KDPI 1 1996 27 27 29 3 2
3 205 Government of Iran KDPI 1 2016 30 30 137 3 2
4 205 Government of Iran KDPI 1 2018 44 33 78 3 2
In [117]:
deaths.describe(include='all')
Out[117]:
conflict_id side_a side_b incompatibility year bd_best bd_low bd_high type_of_conflict region
count 1565.000000 1565 1565 1565.000000 1565.000000 1565.000000 1565.000000 1565.000000 1565.000000 1565
unique NaN 86 349 NaN NaN NaN NaN NaN NaN 7
top NaN Government of India IS NaN NaN NaN NaN NaN NaN 3
freq NaN 161 71 NaN NaN NaN NaN NaN NaN 564
mean 1441.824281 NaN NaN 1.549521 2003.270927 900.244089 786.077955 1156.854952 3.150799 NaN
std 3582.614254 NaN NaN 0.500263 9.082850 3722.826610 2862.735593 4064.040425 0.411170 NaN
min 205.000000 NaN NaN 1.000000 1989.000000 25.000000 1.000000 24.000000 2.000000 NaN
25% 288.000000 NaN NaN 1.000000 1995.000000 43.000000 42.000000 60.000000 3.000000 NaN
50% 333.000000 NaN NaN 2.000000 2003.000000 118.000000 112.000000 173.000000 3.000000 NaN
75% 388.000000 NaN NaN 2.000000 2012.000000 489.000000 454.000000 728.000000 3.000000 NaN
max 14609.000000 NaN NaN 3.000000 2018.000000 68614.000000 50000.000000 68627.000000 4.000000 NaN
In [118]:
deaths.dtypes
Out[118]:
conflict_id          int64
side_a              object
side_b              object
incompatibility      int64
year                 int64
bd_best              int64
bd_low               int64
bd_high              int64
type_of_conflict     int64
region              object
dtype: object

As shown above, this data has no nulls and all the correct data types. However, we will still have to do some wrangling to get this dataset ready for analysis. First, the "side_a" column, which identifies the government side of a given conflict, needs to be trimmed down: every cell starts with "Government of" followed by the country name, and we just want the country name. Also, some conflicts have more than one country cited in the "side_a" column; we just want the first country listed (the main one).

In [119]:
#remove the 'Government of..' from the beginning of every side_a value
deaths['side_a'] = deaths.side_a.apply(lambda x: x[14:])

#the 'side_a' column in a handful of instances has more than one country/government. I split the column and
#just keep the first (main) country
deaths[['side_a','others']] = pd.DataFrame(deaths['side_a'].str.split(',',1).tolist(), 
                                                                    columns = ['side_a','others'])
deaths = deaths.drop(columns=['others'])
In [120]:
deaths.head()
Out[120]:
conflict_id side_a side_b incompatibility year bd_best bd_low bd_high type_of_conflict region
0 205 Iran KDPI 1 1990 31 31 335 3 2
1 205 Iran KDPI 1 1993 110 110 110 3 2
2 205 Iran KDPI 1 1996 27 27 29 3 2
3 205 Iran KDPI 1 2016 30 30 137 3 2
4 205 Iran KDPI 1 2018 44 33 78 3 2

A similar problem occurs in the "region" column: some rows cite more than one region. Let's split the column and do two things: keep the main region (the first one cited) and create a dataframe with the counts of conflicts per region per year (to be used in some interesting exploratory analysis).

In [121]:
#I want to break up the region of the conflict for later analysis (some of the rows have multiple regions divided by commas)
#the first region cited is the main region, so I will keep it in the dataframe and pull the rest out into a separate dataframe.
deaths[['region_main','region2','region3','region4',]] = pd.DataFrame(deaths['region'].str.split(',',4).tolist(), 
                                                                    columns = ['region_main','region2','region3','region4'])

deaths["region1_bool"] = (deaths["region_main"] == "1") | (deaths["region2"] == "1") | (deaths["region3"] == "1") | (deaths["region4"] == "1")
deaths["region2_bool"] = (deaths["region_main"] == "2") | (deaths["region2"] == "2") | (deaths["region3"] == "2") | (deaths["region4"] == "2")
deaths["region3_bool"] = (deaths["region_main"] == "3") | (deaths["region2"] == "3") | (deaths["region3"] == "3") | (deaths["region4"] == "3")
deaths["region4_bool"] = (deaths["region_main"] == "4") | (deaths["region2"] == "4") | (deaths["region3"] == "4") | (deaths["region4"] == "4")
deaths["region5_bool"] = (deaths["region_main"] == "5") | (deaths["region2"] == "5") | (deaths["region3"] == "5") | (deaths["region4"] == "5")

regions =deaths.copy()
deaths = deaths.drop(columns=["region","region2","region3","region4", "region1_bool", "region2_bool", "region3_bool", "region4_bool", "region5_bool"])
deaths.head()
Out[121]:
conflict_id side_a side_b incompatibility year bd_best bd_low bd_high type_of_conflict region_main
0 205 Iran KDPI 1 1990 31 31 335 3 2
1 205 Iran KDPI 1 1993 110 110 110 3 2
2 205 Iran KDPI 1 1996 27 27 29 3 2
3 205 Iran KDPI 1 2016 30 30 137 3 2
4 205 Iran KDPI 1 2018 44 33 78 3 2
In [122]:
#To create our separate dataframe with the counts of conflicts per region, we need to pull out only the "True" values from the dataframe,
#as False is not very useful.

from functools import reduce

regions =regions[["year", "region1_bool", "region2_bool", "region3_bool", "region4_bool", "region5_bool"]]

regions1 = regions.groupby(["year","region1_bool"])
one = regions1.size().unstack().add_prefix('region1')
regions2 = regions.groupby(["year","region2_bool"])
two = regions2.size().unstack().add_prefix('region2')
regions3 = regions.groupby(["year","region3_bool"])
three = regions3.size().unstack().add_prefix('region3')
regions4 = regions.groupby(["year","region4_bool"])
four = regions4.size().unstack().add_prefix('region4')
regions5 = regions.groupby(["year","region5_bool"])
five = regions5.size().unstack().add_prefix('region5')

data_frames = [one,two,three,four,five]


regions = reduce(lambda  left,right: pd.merge(left,right,on=['year'],
                                            how='outer'), data_frames).fillna(0)
In [123]:
#The region codes are laid out in the codebook on my github.
columns = ["region1True","region5True", "region2True","region4True","region3True"]
regions = regions[columns]
regions = regions.rename (columns={
    "region1True" : "Europe",
    "region2True" : "Middle East",
    "region3True" : "Asia",
    "region4True" : "Africa",
    "region5True" : "Americas"
})
regions.head()
Out[123]:
Europe Americas Middle East Africa Asia
year
1989 3 12 7 17 21
1990 3 9 8 18 27
1991 8 7 9 23 20
1992 10 6 6 17 20
1993 11 5 7 15 16

Now we are ready for some exploratory data analysis!

Part 2b: Initial Exploration into UCDP Battle-Related Deaths Data

Let's start with the 'regions' dataframe we just created and plot a stacked area chart.

We see that Africa and Asia are usually the regions with the most conflicts!

In [124]:
fig, ax = plt.subplots(figsize=(12,7))

regions.plot(kind='area', figsize=[16,6], stacked=True, colormap='rainbow', ax=ax)

ax.set_title('Conflicts by region from 1989-2018')
ax.set_ylabel('Number of conflicts')
Out[124]:
Text(0, 0.5, 'Number of conflicts')
In [125]:
deaths['region_main'] = deaths['region_main'].map({
    "1" : "Europe",
    "2" : "Middle East",
    "3" : "Asia",
    "4" : "Africa",
    "5" : "Americas"
})

plot = deaths.groupby(["region_main","year"]).sum().reset_index()

fig, ax = plt.subplots(figsize=(12,7))

sns.lineplot(x='year', y="bd_best", hue='region_main', palette=sns.color_palette('Paired', n_colors=5), ax=ax, data=plot)
Out[125]:
<matplotlib.axes._subplots.AxesSubplot at 0x284c5dfb148>

Interestingly, despite having fewer conflicts in recent years than Africa and Asia, the Middle East clearly has far deadlier battles.
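
To put a rough number on that, we can compare battle deaths per conflict-year by main region over the full 1989-2018 period (a quick sketch):

#average and total best-estimate battle deaths per conflict-year, by main region
deadliness = deaths.groupby("region_main")["bd_best"].agg(["count", "sum", "mean"])
print(deadliness.sort_values("mean", ascending=False).round(1))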

Part 3: Merging the two datasets ('deaths','arms')

To merge the datasets, we will need to edit the spellings of the country names to make sure they match.

In [126]:
# list the country names in the deaths dataset so we can match them against the arms dataset
side_a_list = deaths['side_a'].unique()
np.sort(side_a_list, axis=None)
Out[126]:
array(['Afghanistan', 'Algeria', 'Angola', 'Australia', 'Azerbaijan',
       'Bangladesh', 'Bosnia-Herzegovina', 'Burkina Faso', 'Burundi',
       'Cambodia (Kampuchea)', 'Cameroon', 'Central African Republic',
       'Chad', 'China', 'Colombia', 'Comoros', 'Congo', 'Croatia',
       'DR Congo (Zaire)', 'Djibouti', 'Ecuador', 'Egypt', 'El Salvador',
       'Eritrea', 'Ethiopia', 'Georgia', 'Guatemala', 'Guinea',
       'Guinea-Bissau', 'Haiti', 'India', 'Indonesia', 'Iran', 'Iraq',
       'Israel', 'Ivory Coast', 'Jordan', 'Kenya', 'Laos', 'Lebanon',
       'Lesotho', 'Liberia', 'Libya', 'Macedonia', 'Malaysia', 'Mali',
       'Mauritania', 'Mexico', 'Moldova', 'Morocco', 'Mozambique',
       'Myanmar (Burma)', 'Nepal', 'Nicaragua', 'Niger', 'Nigeria',
       'Pakistan', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
       'Philippines', 'Rumania', 'Russia (Soviet Union)', 'Rwanda',
       'Senegal', 'Serbia (Yugoslavia)', 'Sierra Leone', 'Somalia',
       'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Syria',
       'Tajikistan', 'Thailand', 'Trinidad and Tobago', 'Tunisia',
       'Turkey', 'Uganda', 'Ukraine', 'United Kingdom',
       'United States of America', 'Uzbekistan', 'Venezuela',
       'Yemen (North Yemen)'], dtype=object)
In [127]:
#edit the deaths dataframe's spellings to match the names used in the arms dataframe
deaths['side_a']= deaths['side_a'].replace({
    "Yemen (North Yemen)" : "Yemen",
    "United States of America" : "United States",
    "Serbia (Yugoslavia)" : "Serbia",
    "Russia (Soviet Union)" : "Soviet Union/Russia",
    "Myanmar (Burma)" : "Myanmar",
    "DR Congo (Zaire)" : "DR Congo",
    "Cambodia (Kampuchea)" : "Cambodia"
})
In [128]:
#the deaths df starts in 1989 so we can just pull the arms transfer data from those years before merging. Then merge!
arms_merge = arms.loc[arms['Delivery year']>=1989]
arms_merge = arms_merge.groupby(['Buyer','Delivery year']).sum().reset_index()

merged = pd.merge(arms_merge, deaths, how='outer', left_on=['Buyer','Delivery year'],
                  right_on=['side_a','year'])
merged.head(2)
Out[128]:
Buyer Delivery year Deal ID Order date Numbers delivered SIPRI estimate TIV deal unit TIV delivery values conflict_id side_a side_b incompatibility year bd_best bd_low bd_high type_of_conflict region_main
0 Afghanistan 1989.0 488573.0 43675.0 2427.0 98.56 69.85 2175.52 333.0 Afghanistan Jam'iyyat-i Islami-yi Afghanistan 2.0 1989.0 1545.0 1374.0 1745.0 4.0 Asia
1 Afghanistan 1989.0 488573.0 43675.0 2427.0 98.56 69.85 2175.52 333.0 Afghanistan Hizb-i Islami-yi Afghanistan 2.0 1989.0 181.0 1.0 181.0 4.0 Asia
In [129]:
#notice the thousands of nulls in the merged dataframe. Time to eliminate all nulls!
merged.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4248 entries, 0 to 4247
Data columns (total 18 columns):
Buyer                  3921 non-null object
Delivery year          3921 non-null float64
Deal ID                3921 non-null float64
Order date             3921 non-null float64
Numbers delivered      3921 non-null float64
SIPRI estimate         3921 non-null float64
TIV deal unit          3921 non-null float64
TIV delivery values    3921 non-null float64
conflict_id            1565 non-null float64
side_a                 1565 non-null object
side_b                 1565 non-null object
incompatibility        1565 non-null float64
year                   1565 non-null float64
bd_best                1565 non-null float64
bd_low                 1565 non-null float64
bd_high                1565 non-null float64
type_of_conflict       1565 non-null float64
region_main            1565 non-null object
dtypes: float64(14), object(4)
memory usage: 630.6+ KB
In [130]:
#Fill all NaNs in the new merged dataset. 

merged['Buyer'] = merged['Buyer'].fillna(merged['side_a']) #Buyer and Side_a are the same 
merged['year'] = merged['year'].fillna(merged['Delivery year']) #Delivery year and year of conflict are the same

#All columns in 'cols' are columns that are zero when null (either no purchase or no conflict).
#Note, '0' in the columns 'incompatibility' and 'type_of_conflict' will become new categories that represent
#"no conflict"
cols = ['Numbers delivered','TIV deal unit','TIV delivery values', 'incompatibility',
       'bd_best', 'bd_low', 'bd_high', 'type_of_conflict']
merged[cols] = merged[cols].replace({np.nan:0})

#For year of no conflict, set 'conflict_id' and 'side_b' to 'no conflict'
merged[['conflict_id','side_b', 'region_main']] = merged[['conflict_id','side_b', 'region_main']].replace({np.nan:"no conflict"})
In [131]:
#drop unnecessary columns
merged = merged.drop(columns=['Delivery year', 'Order date', 'side_a', 'Deal ID', 'SIPRI estimate'])
In [132]:
#No more NaNs! We just have to change 'incompatibility', 'region_main' and 'type_of_conflict' to category data types.

for col in ['incompatibility', "type_of_conflict", 'region_main']:
    merged[col] = merged[col].astype('category')
    
print(merged.info())
    
#Now we're ready to analyze the data.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4248 entries, 0 to 4247
Data columns (total 13 columns):
Buyer                  4248 non-null object
Numbers delivered      4248 non-null float64
TIV deal unit          4248 non-null float64
TIV delivery values    4248 non-null float64
conflict_id            4248 non-null object
side_b                 4248 non-null object
incompatibility        4248 non-null category
year                   4248 non-null float64
bd_best                4248 non-null float64
bd_low                 4248 non-null float64
bd_high                4248 non-null float64
type_of_conflict       4248 non-null category
region_main            4248 non-null category
dtypes: category(3), float64(7), object(3)
memory usage: 378.1+ KB
None
In [133]:
merged.head(2)
Out[133]:
Buyer Numbers delivered TIV deal unit TIV delivery values conflict_id side_b incompatibility year bd_best bd_low bd_high type_of_conflict region_main
0 Afghanistan 2427.0 69.85 2175.52 333 Jam'iyyat-i Islami-yi Afghanistan 2.0 1989.0 1545.0 1374.0 1745.0 4.0 Asia
1 Afghanistan 2427.0 69.85 2175.52 333 Hizb-i Islami-yi Afghanistan 2.0 1989.0 181.0 1.0 181.0 4.0 Asia
In [134]:
#The dataframe has yet to be grouped by buyer/year/conflict
#we can do that now by pulling out the columns that identify conflicts (Buyer->year->type_of_conflict->incompatibility->region_main) and summing the rest
merged = merged.groupby(['Buyer','year','type_of_conflict','incompatibility','region_main']).sum()
merged = merged.loc[merged['bd_best'].notnull()].reset_index()
In [137]:
#drop the null rows and we're done!
merged = merged.dropna()
merged.head()
Out[137]:
Buyer year type_of_conflict incompatibility region_main Numbers delivered TIV deal unit TIV delivery values bd_best bd_low bd_high
0 Afghanistan 1989.0 4.0 2.0 Asia 12135.0 349.25 10877.60 5174.0 1904.0 17051.0
1 Afghanistan 1990.0 3.0 2.0 Asia 11980.0 304.70 9498.35 1478.0 1323.0 2275.0
2 Afghanistan 1991.0 3.0 2.0 Asia 5780.0 91.88 5751.84 3302.0 3156.0 3802.0
3 Afghanistan 1992.0 3.0 2.0 Asia 0.0 0.00 0.00 4276.0 4259.0 5460.0
4 Afghanistan 1993.0 3.0 2.0 Asia 0.0 0.00 0.00 3721.0 3713.0 7805.0

Above is the final dataframe that we will now use for some more advanced analysis.

Part 4: K-Nearest Neighbors Regression

Now that we have successfully loaded, wrangled, and merged our dataframes, we can try to answer the question that brought us here in the first place: is there a correlation between how much a country spends on armaments and the number of casualties in the wars it is fighting in the year the weapons are delivered? In other words, if a country purchased, for example, 20 fighter jets in 2015, do we see an uptick in the number of deaths in the war(s) that the country is involved in?

To accomplish this, we will run a K-Nearest Neighbors regression using 10, 50, and 100 nearest neighbors.

In [147]:
my_dpi = 150

X_train = merged[["TIV delivery values"]]
y_train = merged["bd_best"]

X_new = pd.DataFrame()
X_new["TIV delivery values"] = np.arange(0, 15000, 100)

def get_NN_prediction(x_new, n):
    """Given new observation, returns n-nearest neighbors prediction
    """
    dists = ((X_train - x_new) ** 2).sum(axis=1)
    inds_sorted = dists.sort_values().index[:n]
    return y_train.loc[inds_sorted].mean()

fig = plt.figure(figsize=(680/my_dpi, 480/my_dpi), dpi=my_dpi)

plt.scatter(merged['TIV delivery values'], merged['bd_best'], c="black", alpha=.3)
# plt.xscale('log')
# plt.yscale('log')
plt.ylim(0,25000)
plt.ylabel('deaths from conflict')
plt.xlabel('TIV Delivery Values')
plt.title('TIV Delivery Values Per Year Per Country vs Death Toll From Conflict (1989-2018)')

colors=['blue','green','red']

for i,k in enumerate ([10,50,100]):
    y_new_pred = X_new.apply(get_NN_prediction, axis=1, args=(k,))
    y_new_pred.index = X_new["TIV delivery values"]  # index predictions by the TIV values they were made for
    y_new_pred.plot.line(color=colors[i], label=str(k), legend=True)
In [153]:
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# get the features (in dict format) and the labels
# (do not split into training and validation sets)
features = ["TIV delivery values"]
X_dict = merged[features].to_dict(orient="records")
y = merged["bd_best"]

# specify the pipeline
vec = DictVectorizer(sparse=False)
scaler = StandardScaler()

def get_cv_error(k):
    model = KNeighborsRegressor(n_neighbors=k)
    pipeline = Pipeline([("vectorizer", vec), ("scaler", scaler), ("fit", model)])
    rmse = np.sqrt((-cross_val_score(
        pipeline, X_dict, y, 
        cv=5, scoring="neg_mean_squared_error")).mean())
    return rmse
    
ks = pd.Series(range(1, 100))
ks.index = range(1, 100)
test_errs = ks.apply(get_cv_error)

test_errs.plot.line()
test_errs.sort_values()
print(test_errs.min())
print(test_errs.idxmin())
2476.3939796958857
99

From the KNN fit and the cross-validation error above, it is clear that there is no meaningful correlation between TIV delivery values and the number of deaths that occur in conflicts!
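
As a complementary check on that conclusion, we can also compute simple correlation coefficients between TIV delivery values and battle deaths using scipy.stats (imported at the top); a minimal sketch:

#linear (Pearson) and rank (Spearman) correlations between deliveries and battle deaths
pearson_r, pearson_p = stats.pearsonr(merged["TIV delivery values"], merged["bd_best"])
spearman_rho, spearman_p = stats.spearmanr(merged["TIV delivery values"], merged["bd_best"])
print("Pearson r:", round(pearson_r, 3), "p-value:", round(pearson_p, 4))
print("Spearman rho:", round(spearman_rho, 3), "p-value:", round(spearman_p, 4))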

Part 5: Conclusion

We began this project with the question, "Is increased military spending (increased TIV delivery values) correlated with increased casualties in war?" Using the skills I have acquired in this course, I could not find a correlation between these two variables. Interestingly, it appeared that some of the bloodiest conflicts involved countries that spent the least on weapons. My rationalization for this phenomenon is that the countries that spend the most on weapons are actually using them as war deterrents rather than actually using them in war. Much like how countries will produce nuclear weapons to deter invaders, countries that spend the most on weapons may be looking to avoid war. I will be taking the machine learning course next semester and look forward to revisiting this project with more data analysis skills!