

By Robert F. Chestnutt, Dublin City University
1.0 Introduction
Vladimir Putin has, arguably, been the most scrutinised global leader since his unexpected ascension to the Russian presidency in 1999. Since the election of Donald Trump in November 2016, analysis and discussion of Putin have only intensified.
Putin succeeded Boris Yeltsin as Russian president almost two decades ago, and only Stalin has ruled longer. Few would dispute accusations that he has flagrantly feathered his nest from Russia’s vast resource riches, terrorised his own people to protect his position and been a sporadic threat to global security, among a range of other incriminations. We can be certain about a couple of extremes: Putin has been known to be openly mean-spirited and spiteful, and a small window into his penchant for intimidation and domination was his manipulation of German Chancellor Angela Merkel’s well-known dog phobia at his Sochi residence in 2007. Photos recording the event suggest he was quite entertained by her clear discomfort. Conversely, there has been much less frequent commentary on how Putin has effectively contained the numerous undesirable domestic groups in the Russian ‘wild west’, and we may only appreciate the significance of this in the post-Putin era. As such, he is understandably one of the most polarising political figures in recent history. He has also been called a tragic figure, and the ideal ruler for the current period of Russia1. Long-term foe Boris Berezovsky called him a traditionalist who believes that the only way to sustain order and protect the state is through authoritarianism.
Yet, whether one loves or hates him, whether one feels threatened or more secure at the thought of him (given his substantial domestic popularity), he will go down as one of the most interesting political figures in history. Vladimir Putin will be remembered and analysed long after his departure from Russian and global politics. Given this fascination, what else can we learn about him from the vast number of books, journal articles, news sources and documentaries about him? How will history remember him? Over time, what have been the most common topics and themes commentators have chosen to discuss? What has been the dominant sentiment expressed towards Putin by commentators, and how has it fluctuated over time? How do the negative (and positive) individual themes rank, and how has this changed over time? And can we identify camps of authors whose commentaries are significantly different or similar?
2.0 Data, tools and method
2.1 Data
This piece dissects a corpus of books about Vladimir Putin (or in which he is the principal focus). They are written by a range of academics, authors, former political figures, journalists and experts from international organisations. Thus, there is variation in the range of themes surveyed and emphasised, and in the levels of sentiment expressed. The corpus comprises 31 books published over a 14-year span, from 2004, a few years into his presidency, up to the present day, 2018. A chronology of the books used is listed in Appendix 1.
2.2 Method
The piece uses a range of data science methods, including topic modelling and sentiment analysis, as well as a range of graphics to illustrate significant findings.
2.2.1 The Descriptives
Section 3 outlines some basic descriptive statistics, surveying the most common terms: overall, by book and by year. In addition, significant terms by book and by year will be analysed using the ‘TF-IDF’ statistic. Instead of simple term frequency, the TF-IDF value increases with a term’s frequency in a document but is offset by how common the word is across the whole corpus. This down-weights commonly occurring terms that may not yield much information. From a common-sense perspective, if a term appears often, it must be important. However, if it appears in all documents, it is unlikely to be especially insightful or informative. This post analyses Vladimir Putin, and as such ‘Vladimir’, ‘Putin’ and ‘Russia’ are mentioned very often across all books. In reality, their value to the analysis is comparatively low, and they are therefore omitted from the output of the respective machine learning models.
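As an illustration, a minimal sketch of the calculation with the tidytext package, assuming a tidy data frame of per-book word counts (the object and column names here are hypothetical):

```r
library(dplyr)
library(tidytext)

# book_words: one row per (book, word) pair with a count column n (assumed to exist)
book_tf_idf <- book_words %>%
  bind_tf_idf(word, book, n) %>%   # adds tf, idf and tf_idf columns
  arrange(desc(tf_idf))

# Ten highest tf-idf terms for each book
book_tf_idf %>%
  group_by(book) %>%
  top_n(10, tf_idf) %>%
  ungroup()
```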
2.2.2 Topic Modelling
Section 4 employs Topic modelling to estimate the themes that are most frequently addressed in analysis of Vladimir Putin.
Topic modelling is a machine learning method used for unsupervised classification of text documents such as books, web text, journal articles and news articles, similar to clustering on numeric data2. While clustering seeks to establish groups of documents within a corpus, topic modelling aims to isolate core themes from a set of texts3. Clustering is deductive, while topic modelling is inductive. As such, it is exploratory in nature: it discovers natural groups of items even when the investigator is not totally sure what they are looking for.
Latent Dirichlet allocation (LDA) is a method for fitting a topic model. In this case it treats each book as a mixture of topics, and each topic as a mixture of words. This allows the books to overlap in content, rather than being separated into discrete groups; theoretically, it seeks to mirror the typical use of language. This post applies the method to books in which Vladimir Putin is the predominant theme; the process generates a model that ‘learns’ to tell the books apart based on their text content. This section also demonstrates a number of options for fine-tuning a topic model in order to secure the most coherent and usable output.
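Although the final models in Section 4 are fitted with gensim in Python, the same idea can be sketched in R with the topicmodels package, assuming a document-term matrix of the corpus has already been built (object names are hypothetical):

```r
library(topicmodels)
library(tidytext)

# book_dtm: a document-term matrix for the corpus (hypothetical; e.g. built with tidytext::cast_dtm)
lda_fit <- LDA(book_dtm, k = 20, control = list(seed = 1234))

# Each topic as a mixture of words: per-topic word probabilities ("beta")
topic_words <- tidy(lda_fit, matrix = "beta")

# Each book as a mixture of topics: per-document topic proportions ("gamma")
book_topics <- tidy(lda_fit, matrix = "gamma")
```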
2.2.3 Sentiment Analysis
Sentiment analysis is a method of opinion mining, which analyses the emotional content of text programmatically. Essentially, it distils an author’s emotional intent into distinct categories. It borrows from a number of disciplines, including linguistics, psychology and natural language processing (NLP)4. Section 5 uses Sentiment analysis to gauge feeling towards President Putin, surveying variation by publication and over time.
There are two approaches to sentiment analysis: the machine learning classification approach and the lexicon-based approach. The machine learning classification method involves training a model on existing pre-labelled data and then using it to determine polarity on unlabelled data. This method is very useful and efficient when dissecting mammoth datasets, but offers minimal texture beyond binary polarity. As such, it may be more useful in scenarios where there are voluminous data to interpret but the output does not require anything beyond a binary positive or negative, such as gauging the sentiment trajectory around a share price or summarising many millions of reviews.
The lexicon-based approach offers more texture in two ways, and perhaps offers more fruitful output options for political analysis. As with machine learning binary classification, polarity can be estimated using the ‘Bing’ lexicon5. In addition, the depth of polarity can be estimated using the ‘AFINN’ lexicon6, which measures the magnitude of polarity on a scale from -5 to +5. Lastly, the ‘NRC’ lexicon7 offers further texture by grouping emotive words into eight distinct categories (as well as negative-positive polarity): sadness, joy, trust, fear, anger, surprise, disgust and anticipation.
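All three lexicons are available in R through tidytext’s get_sentiments(); a quick sketch of loading them (recent tidytext versions may prompt for a one-off download of the AFINN and NRC lexicons via the textdata package):

```r
library(tidytext)

get_sentiments("bing")    # word + binary positive/negative polarity
get_sentiments("afinn")   # word + integer score from -5 to +5
get_sentiments("nrc")     # word + emotion category (eight emotions plus positive/negative)
```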
As such, this section will seek to estimate depths of polarity measured by book and also by year of publishing. What can changes in magnitude of polarity tell us about attitudes of political commentators to Putin over time?
2.3 Tools
Machine learning methods offer a whole new world of analytical options, but putting them into operation requires specialised tools. This post takes a mixed programming-language approach to answering the array of questions outlined above. Although it predominantly uses the ‘R’ statistical programming language, it also embeds Python code chunks depending on the specific task, using the ‘reticulate’ package8 in R. Although it is unusual to mix programming languages within a single project, each language has its own strengths and drawbacks, and at times, as in this piece, the choice comes down to personal preference. ‘R’ can be slow and a memory glutton. Python can also be sluggish when dealing with exceptionally large amounts of data, but does not rely on RAM to the same extent as ‘R’. Ultimately, some tasks are simply more lucid and straightforward in one language than in the other.
The descriptives and lexicon-based sentiment analysis are executed in ‘R’, and the topic modelling and cosine similarity tasks are performed using the GenSim9 library in Python. Everything is woven together in R Markdown through the RStudio integrated development environment (IDE).
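A minimal sketch of how reticulate exposes Python from an R session (the interpreter path below is an assumption and is machine-specific); within R Markdown, Python chunks can also be written directly once reticulate is loaded:

```r
library(reticulate)

# Point reticulate at a Python interpreter (the path here is an assumption)
use_python("/usr/bin/python3", required = FALSE)

# Import Python libraries as R objects; their functions are then callable with `$`
gensim <- import("gensim")
nltk   <- import("nltk")
```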
3.0 The Descriptives
3.1 Most common words by frequency (overall)
Section 3 presents some basic descriptives using the ‘R’ statistical programming language. It graphs an overall term count by frequency, with common ‘stopwords’ such as ‘the’, ‘is’, ‘a’ and a host of other non-contributing non-emotive words already removed from the corpus.
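A sketch of that counting step with tidytext, assuming a data frame with one row per line of text and a text column (object and column names are hypothetical):

```r
library(dplyr)
library(tidytext)

# books: a data frame with one row per line of text and columns `book` and `text` (assumed)
word_counts <- books %>%
  unnest_tokens(word, text) %>%           # one token (word) per row
  anti_join(stop_words, by = "word") %>%  # drop 'the', 'is', 'a' and other stopwords
  count(word, sort = TRUE)                # overall frequency, most common first
```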
Even surveying the most frequent terms of the corpus using basic word counting, the tone is quite clear. Both ‘power’ and ‘war’ are in the top 5 terms and, fascinatingly, Khodorkovsky and Yukos make it into the top 10 most frequent terms. Other interesting terms such as ‘Chechnya’, ‘KGB’, ‘FSB’, ‘control’ and ‘security’ are among the top 50, all provocative terms within the context of Putin’s reign.
3.2 Most significant terms by Book (tf-idf)
3.3 Most significant terms by Year (tf-idf)
4.0 Topic modelling Putin: A hint to his legacy?
4.1 Introduction
Section 4 uses topic modelling, through the Python programming language, to estimate the most dominant themes about Russian President Vladimir Putin in this corpus. The process was also completed using the ‘R’ programming language, but the Python output was ultimately used as its final model was slightly more coherent. The core library in this piece is ‘GenSim’, used to compile and refine the topic models. In addition, Python’s ‘Natural Language Toolkit’ (NLTK) and ‘SpaCy’ libraries are used to process and clean the text data. Some regular expression (RegEx) gymnastics deal with formatting issues that manifest during the scraping process. Lastly, the ‘Matplotlib’ library is used for data visualisation.
4.3 What does LDA do?
The Latent Dirichlet Allocation (LDA) algorithm’s approach to topic modelling considers each document as a collection of topics in certain proportions. Similarly, it treats each topic as a collection of keywords, again in certain proportions. A topic is therefore a set of dominant, representative keywords; in theory, by surveying the keywords one should be able to work out what the topic is about.
Key factors in securing coherent topics include the quality of text pre-processing and cleaning; the variety of topics in the corpus; the choice of topic modelling algorithm and the number of topics fed to the algorithm.
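One concrete pre-processing issue in this corpus is hyphenation left over from scraping (the fragment ‘tion’ that surfaces in one of the Mallet topics below is likely such a remnant); a hedged sketch of the kind of regular-expression clean-up involved, using stringr on a hypothetical character vector:

```r
library(stringr)

# raw_text: a character vector of scraped book text (hypothetical)
raw_text <- str_replace_all(raw_text, "-\\s*\\n", "")           # re-join words split across lines ("administra-\ntion")
raw_text <- str_replace_all(raw_text, "[^[:alpha:]'\\s]", " ")  # strip digits and stray punctuation
raw_text <- str_squish(str_to_lower(raw_text))                  # lower-case and collapse whitespace
```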
The years of publication for the books in the corpus:
## [2014 2016 2018 2012 2015 2004 2008 2017 2009 2010]
4.4 Building the Topic Model
After importing the data and completing the various clean-up processes, the model(s) can be constructed. For the initial model, the number of topics must be chosen arbitrarily. The base LDA model for this piece is built with 20 different topics, where each topic is a combination of keywords and each keyword contributes a weight to the topic.
The base model is built using GenSim’s inbuilt version of the LDA algorithm and is not particularly coherent or helpful; a quick survey of the topic output, and its coherence score of 0.54, suggests as much. Yet there are a number of terms that look appropriate and are readily identifiable with Putin and Russia in the contemporary period. Topic 14, in particular, appears to be the most coherently estimated topic and is represented as:
0.063*"kremlin" + 0.062*"people" + 0.057*"ukraine" + 0.034*"yanukovych" + 0.025*"surkov" + 0.020*"decide" + 0.019*"speak" + 0.018*"donetsk" + 0.018*"event"
These are the top terms contributing to the topic, each with its corresponding weight, which indicates how important that keyword is to the topic.
##
## Coherence Score: 0.5439246334382405
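For comparison, an equivalent per-topic representation can be pulled from the R topicmodels fit sketched earlier; this is illustrative only, since the weights printed above come from the gensim model and the topic numbering of the two fits will not match:

```r
library(dplyr)
library(tidytext)

# Top ten terms and their weights (beta) for one topic of the R fit,
# analogous to the gensim representation printed above
tidy(lda_fit, matrix = "beta") %>%
  filter(topic == 14) %>%
  top_n(10, beta) %>%
  arrange(desc(beta))
```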
4.5 Increasing the coherence score of the model
The Mallet implementation of the LDA algorithm, developed at UMass Amherst, can sometimes offer better-quality topics. Gensim provides a wrapper for Mallet’s LDA implementation. The Mallet topics are outlined below:
## [(16,
## [('russia', 0.040677385796494554),
## ('world', 0.02700851999631283),
## ('american', 0.018659713717589118),
## ('russian', 0.01709266648230817),
## ('war', 0.013365991124455155),
## ('united_state', 0.011733101568364082),
## ('international', 0.011351216107665364),
## ('nation', 0.010245065118055282),
## ('west', 0.010166054333083133),
## ('georgia', 0.010008032763138835)]),
## (13,
## [('putin', 0.26757780129275727),
## ('president', 0.05223300228208626),
## ('yeltsin', 0.043703864247740226),
## ('kremlin', 0.02799699121588026),
## ('medvedev', 0.020526027257544273),
## ('prime_minister', 0.016293330953504086),
## ('vladimir', 0.012583347144842357),
## ('term', 0.011410431301554114),
## ('presidency', 0.009791297474406211),
## ('begin', 0.007815189260170583)]),
## (11,
## [('year', 0.029341526429718594),
## ('time', 0.019956706933873334),
## ('life', 0.0192863626841701),
## ('live', 0.01890929404371203),
## ('play', 0.01444033237902381),
## ('child', 0.013909643181342085),
## ('work', 0.013309126457649605),
## ('family', 0.012806368270372182),
## ('friend', 0.012052230989456044),
## ('great', 0.011270162698135605)]),
## (0,
## [('moscow', 0.02935525192143467),
## ('day', 0.021215841161400514),
## ('murder', 0.016438941076003417),
## ('litvinenko', 0.013850341588385995),
## ('meet', 0.012996370623398805),
## ('death', 0.01259607173356106),
## ('call', 0.01194225021349274),
## ('fsb', 0.010941502988898377),
## ('find', 0.010421114432109309),
## ('die', 0.010127561912894961)]),
## (2,
## [('party', 0.047414735461303016),
## ('election', 0.042495627459554),
## ('support', 0.023065150852645388),
## ('campaign', 0.021042850896370792),
## ('vote', 0.0207559029296021),
## ('duma', 0.018514975951027548),
## ('result', 0.0157548097944906),
## ('presidential', 0.013636860515959772),
## ('candidate', 0.01344556187144731),
## ('region', 0.013144949715784872)]),
## (19,
## [('soviet', 0.058932804680243436),
## ('year', 0.022805096141849085),
## ('soviet_union', 0.017315237131410033),
## ('history', 0.01502779587706043),
## ('end', 0.013627604563791884),
## ('stalin', 0.011991737484923683),
## ('collapse', 0.010854948497913576),
## ('time', 0.010660862573302095),
## ('country', 0.010549956330666962),
## ('era', 0.010452913368361221)]),
## (15,
## [('head', 0.03739505219362715),
## ('work', 0.027579162410623085),
## ('kgb', 0.0267196492189641),
## ('city', 0.018398564986671317),
## ('office', 0.01812451730237425),
## ('security', 0.01806223373776128),
## ('putin', 0.01746431151747677),
## ('deputy', 0.01619372679937218),
## ('foreign', 0.015184733052642068),
## ('service', 0.014935598794190189)]),
## (7,
## [('khodorkovsky', 0.046778029116640564),
## ('case', 0.03857084855866829),
## ('court', 0.021315468940316686),
## ('year', 0.021228467026274577),
## ('arrest', 0.020459950118902615),
## ('charge', 0.01832840322487095),
## ('prison', 0.01075923670320747),
## ('human_right', 0.010541731918102199),
## ('trial', 0.010048721071863581),
## ('lawyer', 0.00997621947682849)]),
## (14,
## [('political', 0.022395571212883745),
## ('policy', 0.01986348766985405),
## ('project', 0.016309134373427278),
## ('freedom', 0.015947408152994464),
## ('politic', 0.015711499748364367),
## ('idea', 0.014233140412682435),
## ('liberal', 0.012487418218419729),
## ('argue', 0.011496602918973326),
## ('approach', 0.010647332662304982),
## ('tion', 0.009373427277302466)]),
## (18,
## [('moscow', 0.032439729464383116),
## ('people', 0.024244572924620924),
## ('opposition', 0.020535616886658666),
## ('protest', 0.018081160685065996),
## ('group', 0.01744027489909458),
## ('kremlin', 0.01643122068288426),
## ('activist', 0.013363150430893422),
## ('leader', 0.01232682447910985),
## ('street', 0.01209501472673721),
## ('call', 0.011426857205192539)])]
##
## Coherence Score: 0.7158272132296972
By switching to the Mallet implementation, the coherence score increased considerably, from 0.54 to 0.72.
4.6 Developing the model further by finding the optimal number of topics for the LDA
To produce the most coherent output possible, the next step estimates the optimal number of topics. To do this, one can train multiple models, varying the topic number setting, and then graph each model’s coherence score against its topic count, as sketched below.
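A hedged sketch of that loop in R: held-out perplexity from topicmodels stands in here for the c_v coherence score computed with gensim in the post, and book_dtm is the hypothetical document-term matrix used earlier.

```r
library(topicmodels)

k_values <- seq(2, 38, by = 6)

scores <- sapply(k_values, function(k) {
  fit <- LDA(book_dtm, k = k, control = list(seed = 1234))
  perplexity(fit, book_dtm)   # lower is better, unlike coherence, where higher is better
})

plot(k_values, scores, type = "b",
     xlab = "Number of topics (k)", ylab = "Perplexity")
```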
In this part, the graph illustrates how the coherence score rises sharply as the number of topics increases, before levelling off. The coherence value reaches its peak at a k-value (number of topics) of 20, but a k-value of 8 is chosen, as this is where the curve starts to level out. The first model built with Mallet represented a considerable improvement in performance over the generic GenSim model; however, several keywords appeared across multiple topics, suggesting that the k-value was too high.
The coherence scores by k-value (topic number setting) are graphed and printed below:
## Num Topics = 2 has Coherence Value of 0.5581
## Num Topics = 8 has Coherence Value of 0.6994
## Num Topics = 14 has Coherence Value of 0.7073
## Num Topics = 20 has Coherence Value of 0.7159
## Num Topics = 26 has Coherence Value of 0.7046
## Num Topics = 32 has Coherence Value of 0.7157
## Num Topics = 38 has Coherence Value of 0.6977
Selecting the optimal model (k = 8) and printing its topics:
## [(0,
## '0.018*"moscow" + 0.012*"day" + 0.011*"man" + 0.010*"woman" + '
## '0.009*"chechnya" + 0.009*"chechen" + 0.007*"people" + 0.007*"police" + '
## '0.006*"leave" + 0.006*"begin"'),
## (1,
## '0.023*"soviet" + 0.020*"year" + 0.019*"time" + 0.015*"people" + '
## '0.012*"good" + 0.010*"life" + 0.010*"thing" + 0.009*"man" + 0.009*"make" + '
## '0.007*"work"'),
## (2,
## '0.018*"party" + 0.017*"election" + 0.015*"kremlin" + 0.012*"support" + '
## '0.012*"leader" + 0.010*"make" + 0.010*"opposition" + 0.010*"liberal" + '
## '0.010*"russia" + 0.009*"law"'),
## (3,
## '0.019*"state" + 0.019*"company" + 0.018*"khodorkovsky" + 0.013*"business" + '
## '0.013*"yukos" + 0.011*"oil" + 0.010*"russian" + 0.009*"money" + '
## '0.009*"oligarch" + 0.008*"share"'),
## (4,
## '0.079*"russia" + 0.054*"russian" + 0.024*"country" + 0.017*"world" + '
## '0.015*"war" + 0.010*"great" + 0.010*"ukraine" + 0.010*"west" + '
## '0.009*"western" + 0.008*"american"'),
## (5,
## '0.119*"putin" + 0.023*"president" + 0.018*"yeltsin" + 0.016*"head" + '
## '0.013*"government" + 0.012*"work" + 0.011*"office" + 0.011*"security" + '
## '0.010*"kgb" + 0.010*"vladimir"'),
## (6,
## '0.038*"political" + 0.022*"state" + 0.021*"power" + 0.017*"people" + '
## '0.014*"percent" + 0.013*"authority" + 0.010*"system" + 0.010*"image" + '
## '0.009*"change" + 0.008*"view"'),
## (7,
## '0.015*"case" + 0.012*"give" + 0.010*"russian" + 0.008*"report" + '
## '0.008*"court" + 0.008*"fsb" + 0.008*"arrest" + 0.008*"find" + '
## '0.007*"charge" + 0.007*"berezovsky"')]
Those were the topics for the chosen LDA model
4.7 To sum up….
Working through a number of stages demonstrates the refinement potential: from the base topic model using Gensim’s generic LDA algorithm, to Mallet’s LDA implementation, to calculating the optimal topic count. There were substantial jumps in coherence score between stages, and the optimal model shows indisputably clear themes. Topic 0 appears to deal with Chechnya. Topic 1 suggests Putin’s past and everyday Soviet-era life. Topic 3 describes the Khodorkovsky and Yukos affair and the wider oligarch issue. Topic 4 may suggest tensions with the West. Topic 6 outlines the nature of state power in Putin’s Russia.
There are also potential flaws: there is little acknowledgement of Mikheil Saakashvili and the August 2008 war with Georgia. Similarly, although it is still quite recent, there is little acknowledgement of Trump and the alleged US election interference. Perhaps if the k-value were increased, more on the significance of the 2004 Orange Revolution in Ukraine would emerge, especially as it appears to have marked the start of the decline in Putin’s relationship with the West.
In sum, the model has performed remarkably well. Much of this self-reflective criticism should not be blamed on the model or process, but rather on the relatively small size of the corpus of 31 books. A larger study should include not only books but also academic journal articles, IO/think-tank reports and news media articles.
5.0 Sentiment Analysis: more troughs than peaks?
5.1 Introduction
The previous sections approached questions about word frequency, exploring the terms most frequently used within the corpus of books about Vladimir Putin, and then estimated the key topics he may be remembered for. This section addresses opinion mining, or sentiment analysis, which involves gauging the emotional intent of words: inferring whether a section of text is positive or negative, measuring the magnitude of polarity, and estimating the dominant emotional categories by author and by time period. Instead of machine learning classification techniques, this section employs a lexicon-based approach, tokenising the text to address the emotional content programmatically. As mentioned in Section 2, three established lexicons are employed depending on the task: ‘Bing’ for polarity, ‘AFINN’ for magnitude of polarity and ‘NRC’ for emotional categorisation.
5.2 Sentiment by year (Bing and AFINN)
The graphics above, representing the distribution of negative sentiment over time, use the ‘Bing’ and ‘AFINN’ lexicons. Bing is a binary positive/negative classification, while AFINN adds texture by measuring the magnitude of polarity on a per-term scale of -5 to +5. In programming the polarity for the Bing graphic, negative sentiment is calculated as a percentage of total terms rather than as a raw volume. Similarly, the AFINN aggregation of sentiment is the average AFINN score over time, giving a range of -0.46 to +0.04.
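A sketch of the two aggregations, assuming a tokenised data frame with word and year columns (names hypothetical):

```r
library(dplyr)
library(tidytext)

# tidy_books: one token per row with `word` and `year` columns (assumed)

# Bing: negative terms as a share of all sentiment-matched terms, per year
bing_by_year <- tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(year, sentiment) %>%
  group_by(year) %>%
  mutate(share = n / sum(n)) %>%
  filter(sentiment == "negative")

# AFINN: mean score per year on the -5 to +5 scale
afinn_by_year <- tidy_books %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(year) %>%
  summarise(mean_score = mean(value))   # the score column is `value` in recent tidytext releases
```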
Although it is more subtle in the Bing graphic, both diagrams show the start of a more negative trend from 2014, which accelerates in 2018. The small sample of literature (31 books) is a potential contributory factor in the choppiness of the trend. A larger sample including more books, newspaper articles (perhaps by scraping newsgroups and LexisNexis) and journal articles would yield a more conclusive trend of sentiment. As it stands, the trend looks quite representative and shows spikes around the time of the Crimea crisis and as the events surrounding the US elections became apparent.
5.3 Sentiment by book (Bing and AFINN)
Cross-referencing the three lexicons by book
Figures X and Y illustrate the distribution of negative sentiment by book. Interestingly, Julie Hemment’s book on youth politics in Putin’s Russia is ranked as the least negative book by both the Bing and AFINN lexicon calculations. She examines the controversial nationalist youth projects that have proliferated in Russia in the Putin era, such as the pro-Kremlin Nashi. These state-sponsored organisations, used to mobilise Russian youth, have been widely reviled in the West, seen as Soviet throwbacks and evidence of Russia’s authoritarian turn. Putin regularly attends their annual meetings.
Elena Shestopal’s book examines recent political and psychological changes in Russian society during Vladimir Putin’s third term. Instability in 2011-2012 and new domestic and international contexts are the main themes explored in the book, which focuses on popular perceptions of Russian politics during a new electoral cycle. As such, its less negative sentiment should not be viewed with too much surprise.
5.4 The NRC emotions lexicon: adapting the Loughran approach to track sentiment by emotion over time
This section adapts a technique mostly used in sentiment analysis of the financial sector. Analysts have been known to use the Loughran and McDonald dictionary of financial sentiment terms (Loughran and McDonald 2011), developed from analyses of financial reports, which divides words into six sentiments: “positive”, “negative”, “litigious”, “uncertain”, “constraining” and “superfluous”. Here, instead, selected emotional categories from the ‘NRC’ lexicon are plotted over time; the categories of particular interest are fear, disgust and sadness.
The main weakness in this analysis is that it is measured on raw word frequency. In a deeper analysis, each emotional category would be calculated as a percentage of the total word count, making it possible to compare levels over time or across documents, as sketched below.
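A sketch of that adjustment, counting NRC ‘fear’, ‘disgust’ and ‘sadness’ terms per year and expressing each as a share of that year’s total word count (object and column names are hypothetical):

```r
library(dplyr)
library(tidytext)

# Total words per year, used as the denominator
total_words <- tidy_books %>% count(year, name = "total")

# Fear, disgust and sadness terms as a share of each year's total word count
nrc_share <- tidy_books %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  filter(sentiment %in% c("fear", "disgust", "sadness")) %>%
  count(year, sentiment) %>%
  left_join(total_words, by = "year") %>%
  mutate(share = n / total)   # proportion of all words, not raw counts
```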
5.4.1 Counts by emotional category by Book
## # A tibble: 31 x 11
## Author anger anticipation disgust fear joy negative positive sadness
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Benne… 2843 2132 1302 3197 1629 5032 5198 1972
## 2 Browd… 1713 2125 1116 2056 1510 3727 4537 1655
## 3 Dawis… 1822 2193 902 2237 1502 3845 6196 1355
## 4 Dugin 1449 1585 680 1667 1173 3272 5004 1156
## 5 Felsh… 3003 3029 1524 3938 1903 5890 8157 2483
## 6 Gessen 1430 1644 737 1972 1112 3074 4327 1395
## 7 Hemme… 1107 1673 419 1267 1468 1831 4211 590
## 8 Jack 2220 2463 1027 3066 1745 5122 6832 1926
## 9 Judah 2864 2347 1611 3506 2000 6193 6164 2403
## 10 Kaspa… 2173 1713 1071 2876 1339 4376 4915 1650
## # ... with 21 more rows, and 2 more variables: surprise <dbl>, trust <dbl>
5.4.2 Counts by emotional category by Year
## # A tibble: 10 x 11
## Year anger anticipation disgust fear joy negative positive sadness
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2004 3961 4026 1895 5488 2859 8224 10735 3415
## 2 2008 7035 6859 3500 9676 5150 14827 20492 6284
## 3 2009 2948 2921 1338 3600 2001 6224 8505 2592
## 4 2010 2132 2140 921 2387 1404 4117 5176 1698
## 5 2012 3104 3284 1556 4202 2348 6728 9232 2694
## 6 2014 11252 10327 5510 14029 7740 23160 28231 8528
## 7 2015 7973 9239 3850 9987 7183 16766 27269 6482
## 8 2016 10388 10114 5060 13775 7319 21355 27133 8315
## 9 2017 1233 1136 687 1764 1043 2621 3472 1125
## 10 2018 8690 7499 4639 11217 5474 16288 19939 6957
## # ... with 2 more variables: surprise <dbl>, trust <dbl>
As seen in Figure X, there appears to be a spike in ‘fear’ terms in 2014. As mentioned above, this is potentially skewed because the counts are not represented proportionally. Moreover, the date of publishing should be considered: these titles were typically written approximately a year before their publication date. Even so, the spike in and around 2014 could be attributed to the Crimea crisis. Similar spikes and troughs appear in the other graphics for the volume of ‘disgust’ and ‘sadness’ terms.
6.0 Conclusion
This piece used machine learning tools, one of the least frequently employed methods of political analysis, to summarise major themes about the reign of Russian President Vladimir Putin. By doing this, it posits the predominant topics and sentiment that may encompass his legacy when he eventually departs from the Russian and global political scenes.
Section 3 used basic counting methods to present the most frequently used terms in analysis of President Putin. It sets a negative and fearful stage, where terms such as war, power, security and KGB are at the forefront. Section 4 estimated the dominant topics from the corpus; the war in Chechnya and the Khodorkovsky affair are clearly generated by the optimal model. Section 5 presented an overall negative sentiment, with distinctive spikes in negativity around the 2014 publication dates, consistent across the different methods of measurement (three lexicons).
Although the post presents some usable output and interesting observations, there are gaps and weaknesses that offer scope for deeper analysis. The corpus is small and narrow, focusing on only 31 books. Additional sources, such as newspaper articles, IO/think-tank reports, academic articles and more books, would add further texture and perhaps offer more granularity to sentiment over time. Another tool that would potentially offer further insight is cosine similarity, which could cluster groups of authors by the similarity of their publications. What further insight and hypotheses could be obtained if authors could be clustered?
7.0 Appendix 1
Bennetts, Marc. 2016. I’m Going to Ruin Their Lives: Inside Putin’s War on Russia’s Opposition. Oneworld Publications
Browder, Bill. 2016. Red Notice: How I Became Putin’s No. 1 Enemy. Corgi publications
Dawisha, Karen. 2015. Putin’s Kleptocracy: Who Owns Russia? Simon & Schuster; Reprint edition (22 Sept. 2015)
Dugin, Alexander. 2014. Putin vs Putin: Vladimir Putin Viewed from the Right. Arktos Media Ltd (30 Sept. 2014)
Felshtinsky, Yuri. 2018. The Putin Corporation: How to Poison Elections. Gibson Square Books Ltd; Revised ed. edition (29 Nov. 2012)
Gessen, Masha. 2013. The Man without a Face: The Unlikely Rise of Vladimir Putin. Granta (3 Jan. 2013)
Hemment, Julie. 2015. Youth Politics in Putin’s Russia: Producing Patriots and Entrepreneurs (New Anthropologies of Europe). Indiana University Press (14 Sept. 2015)
Jack, Andrew. 2005. Inside Putin’s Russia: Can There Be Reform Without Democracy? Oxford University Press, USA (15 Dec. 2005)
Judah, Ben. 2014. Fragile Empire: How Russia Fell In and Out of Love with Vladimir Putin. Yale University Press (4 Feb. 2014)
Kasparov, Garry. 2016. Winter Is Coming: Why Vladimir Putin and the Enemies of the Free World Must Be Stopped. Atlantic Books; Main edition (3 Mar. 2016)
King, M. S. 2014. The War Against Putin: What the Government-Media Complex Isn’t Telling You About Russia. CreateSpace Independent Publishing Platform (1 April 2014)
Knight, Amy. 2018. Orders To Kill: The Putin Regime and Political Murder. Biteback Publishing (22 Jan. 2018)
Levine, Steve. 2009. Putin’s Labyrinth: Spies, Murder, and the Dark Heart of the New Russia. Random House Inc; Reprint edition (15 July 2009)
Lourie, Richard. 2017. Putin: His Downfall and Russia’s Coming Crash. Macmillan USA (12 Sept. 2017)
McNabb, David E. 2015. Vladimir Putin and Russia’s Imperial Revival. Routledge; 1 edition (21 Sept. 2015)
Myers, Steven Lee. 2016. The New Tsar: The Rise and Reign of Vladimir Putin. Simon & Schuster UK; UK ed. edition (8 Sept. 2016)
Ostrovsky, Arkady. 2016. The Invention of Russia: The Journey from Gorbachev’s Freedom to Putin’s War. Atlantic Books; Main edition (2 Jun. 2016)
Politkovskaya, Anna. 2004. Putin’s Russia. Harvill Press; Reprint edition (14 Oct. 2004)
Politkovskaya, Anna. 2008. A Russian Diary: With a Foreword by Jon Snow. Vintage (3 April 2008)
Roxburgh, Angus. 2013. The Strongman: Vladimir Putin and the Struggle for Russia. I.B.Tauris; New, Updated edition (28 Feb. 2013)
Sakwa, Richard. 2008. Putin: Russia’s Choice. Routledge; 2 edition (10 Sept. 2007)
Sakwa, Richard. 2009. The Quality of Freedom: Khodorkovsky, Putin and the Yukos Affair. OUP Oxford (7 May 2009)
Satter, David. 2017. The Less You Know, the Better You Sleep: Russia’s Road to Terror and Dictatorship under Yeltsin and Putin. Yale University Press; Reprint edition (3 Oct. 2017)
Shestopal, Elena. 2017. New Trends in Russian Political Mentality: Putin 3.0. Lexington Books (12 April 2017)
Sixsmith, Martin. 2010. Putin’s Oil. Continuum (15 Feb. 2010)
Sperling, Valerie. 2014. Sex, Politics, and Putin: Political Legitimacy In Russia (Oxford Studies In Culture And Politics). Oxford University Press (5 Dec. 2014)
Stuermer, Michael. 2009. Putin And The Rise Of Russia: The Country That Came in from the Cold. W&N; UK ed. edition (25 Jun. 2009)
Unger, Craig. 2018. House of Trump, House of Putin: The Untold Story of Donald Trump and the Russian Mafia. Bantam Press (14 Aug. 2018)
Van Herpen, Marcel. 2014. Putin’s Wars: The Rise of Russia’s New Imperialism. Rowman & Littlefield Publishers (27 Feb. 2014)
Walker, Shaun. 2018. The Long Hangover: Putin’s New Russia and the Ghosts of the Past. OUP USA; 1st Edition edition (22 Feb. 2018)
Zygar, Mikhail. 2017. All the Kremlin’s Men: Inside the Court of Vladimir Putin. PublicAffairs; Reprint edition (30 Nov. 2017)
1. Dugin, Alexander. 2014. Putin vs Putin: Vladimir Putin Viewed from the Right. Arktos Media Ltd
2. Silge, Julia and Robinson, David. 201X. Text Mining with R: A Tidy Approach. O’Reilly
3. Bengfort, Benjamin, Bilbro, Rebecca and Ojeda, Tony. 201X. Applied Text Analysis with Python. O’Reilly
4. Kwartler, Ted. 2017. Text Mining in Practice with R. John Wiley & Sons Ltd
5. Liu, Bing. Opinion Mining, Sentiment Analysis, and Opinion Spam Detection. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
6. Nielsen, Finn Årup. AFINN-96. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
7. Mohammad, Saif. NRC Word-Emotion Association Lexicon (aka EmoLex). http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
8. Allaire, JJ et al. reticulate: Interface to Python. https://cran.r-project.org/web/packages/reticulate/index.html
9. Řehůřek, Radim and Sojka, Petr. GenSim: Software Framework for Topic Modelling with Large Corpora. https://radimrehurek.com/gensim/