

Robert F. Chestnutt, School of Law & Government, Dublin City University [email protected]
1.0 Introduction
This policy paper serves as a follow-up to a previous blog post1, which sought to estimate Russian President Vladimir Putin’s legacy by topic modelling a corpus of books in which Mr Putin is either the central subject or a major focus. This paper shifts the focus to interpreting and gaining a deeper understanding of Mr Putin’s own output: a corpus of his annual addresses to the Russian Federal Assembly between 2000 and 2017 (omitting the Medvedev years).
This paper has a specific focus and analyses only the eight most dominant and coherent themes generated by the optimised topic model (built with the Mallet library2 in Python) in the previous post. These themes included Crimea, Chechnya, the Russian oligarchs, Yukos and the oil sector, Putin’s style of leadership and rule, the transition from Yeltsin, elections and democracy. This paper uses these topics as keyword starting points for the language-modelling technique of word embedding in order to elicit attitudes, and compares the performance of three machine learning algorithms.
In sum, this post asks whether machine learning methods can offer additional information and texture on the Russian system under Vladimir Putin through deeper analysis of these themes. Are there additional nuances that can be garnered from his speeches on the Crimea/Ukraine conflict, Chechnya, the oligarchs, and how he approaches leading this major global power?
2.0 Data, tools and method
2.1 The data
The data analysed are President Putin’s annual addresses to the Federal Assembly for each year from 2000 through to 2017, omitting 2008 to 2011, when Dmitry Medvedev was President. The speeches can be accessed from the official website of the President of Russia3.
2.2 The tools
The speeches were scraped from the website using Python’s Beautiful Soup library4. The NLTK5 library, in conjunction with some RegEx gymnastics, was used to normalise the text: stripping any remaining HTML code, punctuation and other stray tokens and symbols, and converting all text to lower case. NLTK was also used to tokenise the text by sentence, the required input format for word embedding. Python’s Gensim6 library offers wrappers for the main algorithms. This is, again, a multi-programming-language project: the R language’s tidyverse7 suite for data manipulation and visualisation is seamless to use and offers better-quality options than Python’s alternatives. Although Python is the primary programming language used in the piece, the final document is compiled with RMarkdown8 within the RStudio integrated development environment, which can accommodate both languages in the same working environment.
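As an illustration, the listing below gives a minimal sketch of that preprocessing pipeline, assuming a locally saved copy of one scraped address; the file name and variable names are illustrative and not the paper’s exact code.

import re
from bs4 import BeautifulSoup
from nltk.tokenize import sent_tokenize  # requires nltk.download('punkt') on first use

# hypothetical local copy of one scraped address page
with open("address_2000.html", encoding="utf-8") as f:
    raw = BeautifulSoup(f.read(), "html.parser").get_text(separator=" ")

sentences = []
for sent in sent_tokenize(raw):
    # strip punctuation and stray symbols, then lower-case
    clean = re.sub(r"[^A-Za-z\s]", " ", sent).lower()
    tokens = clean.split()
    if tokens:
        sentences.append(tokens)

# 'sentences' is now a list of token lists, the input format the embedding models expect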
2.3 The method
2.3.1 Word embeddings
This paper uses word embedding to capture the context of a term, its semantic and syntactic similarity, and its relationship with other words. Word embeddings are an improvement over simpler bag-of-words schemes, such as word counts and frequencies, which produce large, sparse vectors (mostly zero values) that describe documents but not the meaning of the words.
As such, the technique is particularly useful for this paper, where the keywords from the topics generated in the previous post are used to build a dictionary of closely affiliated terms. Because the algorithms are trained on a pre-selected corpus, in this case Putin’s presidential speeches, a deeper understanding of his attitude to these topics can be estimated.
It is possible to use existing pre-trained word embedding models: Google provides access through its Word2Vec Google Code Project9, trained on Google News data (about 100 billion words); it contains 3 million words and phrases and was fit using 300-dimensional word vectors. Instead, this post trains models on the pre-sourced speech data, which is generally accepted as the better approach for a domain-specific NLP problem of this kind. However, training from scratch can be time consuming, require substantial RAM and disk space, and demand more intensive wrangling and cleaning of the data before it can be used.
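For comparison, loading a pre-trained model takes only a few lines with Gensim; the sketch below, which assumes a local copy of Google’s published GoogleNews vectors file, is illustrative only and is not used later in this paper.

from gensim.models import KeyedVectors

# load Google's pre-trained 300-dimensional GoogleNews vectors (a multi-gigabyte file)
pretrained = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True)

print(pretrained.most_similar("business", topn=5))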
2.3.2 The algorithms
2.3.2.1 Word2Vec by Google
Google’s Word2Vec10 is a two-layer neural net that processes text. Given enough data, usage and contexts, it can make highly accurate estimates about a word’s meaning based on past appearances. Its input is a text corpus and its output is a set of feature vectors for words from that corpus. The vectors used to represent words are called neural word embeddings, which represent words with numbers.
Word2vec trains words against other words that neighbour them in the input corpus. Its purpose and usefulness is that it groups vectors of similar words together in vector space; that is, it detects similarities between words mathematically. It creates vectors that are distributed numerical representations of word features, and in doing so elicits the context of individual words without human intervention. In sum, it can be queried to detect relationships between words.
It does so in one of two ways: either using context to predict a target word (a method known as continuous bag of words, or CBOW), or using a word to predict a target context (known as skip-gram). A well-trained set of word vectors will place similar words close to each other in that space. Word2vec can be applied to genes, code, likes, playlists, social media graphs and a range of other verbal or symbolic series in which patterns may be distinguished, since words are simply discrete states like that other data, and it is the transitional probabilities between those states that are sought. Not only can the estimates produced by word2vec models be used to establish a word’s association with other words, they can also be used to cluster documents and classify them by topic.
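A minimal sketch of how such a model can be trained on the sentence-tokenised speeches with Gensim is shown below; the hyperparameter values are illustrative assumptions, and Gensim 3.x parameter names are used (size rather than the later vector_size).

from gensim.models import Word2Vec

model_w2v = Word2Vec(
    sentences,       # the list of token lists built during preprocessing
    size=100,        # dimensionality of the word vectors
    window=5,        # context window either side of the target word
    min_count=2,     # ignore words appearing fewer than twice
    sg=1)            # 1 = skip-gram, 0 = CBOW

print(model_w2v.wv.most_similar('business'))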
2.3.2.2 GloVe by Stanford NLP
StanfordNLP’s GloVe11 stands for Global Vectors for word representation and has many parallels with Word2Vec. It is an unsupervised machine learning algorithm for generating word embeddings by aggregating a global word-word co-occurrence matrix from a corpus; essentially, it obtains vector representations for words. The resulting embeddings exhibit usable linear substructures of the word vector space.
Similar to Word2Vec, GloVe creates a continuous multi-dimensional representation of a word that is learned from its surrounding context words within a training corpus. Trained on a large corpus of text, these co-occurrence statistics cause semantically similar words to appear near each other in the resulting multi-dimensional embedding space. For example, dog and cat would be expected to appear near a region of other pet-related words, because the context words that surround them in the training corpus are similar.
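Training GloVe in Python is a two-step process: build the co-occurrence matrix, then fit the embeddings on it. The sketch below assumes the glove-python package, whose most_similar(word, number=...) signature matches the calls shown later; the hyperparameter values are illustrative.

from glove import Corpus, Glove

# build the global word-word co-occurrence matrix from the tokenised sentences
corpus = Corpus()
corpus.fit(sentences, window=10)

# fit 100-dimensional embeddings on that matrix
glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=4)
glove.add_dictionary(corpus.dictionary)

print(glove.most_similar('business', number=10))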
2.3.2.3 FastText by Facebook
Text classification has become an essential component of the commercial world. Just as Google analyses a user’s search patterns to channel its marketing, Facebook has its own systems for leveraging users’ text data to direct more targeted ads.
FastText12 is an open-source library created by Facebook AI Research (FAIR), dedicated to efficient learning of word representations and to simplifying text classification. It has recently gained considerable traction within the data science community, partly because of its speed when training on and predicting from large amounts of computationally expensive text data, but more importantly because of a unique approach that gives it a major advantage over other algorithms. In contrast to Word2Vec and GloVe, which treat single words as the smallest unit of analysis within vector representation, FastText splits individual words into character n-gram chunks, so the unit of analysis can range from a single character to the length of the word. Not only is this especially helpful for smaller datasets, as it expands the volume of vector representations, it can also help with rarer words, for example rare diseases in a medical context.
This approach also gives FastText an unusual advantage for text analysis in morphologically complex languages, and it already ships with multi-language support reported to cover 294 languages. The Eurasian languages are problematic for text analysis because their cases are marked by a range of suffixes added to the ends of words, making them a challenge to stem and lemmatise; FastText’s sub-word approach potentially nullifies this barrier. Given these advantages, it is of particular interest for my PhD project, which involves working with Russian and a number of other Eurasian languages (Azeri, Kyrgyz, Kazakh and Ukrainian).
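The sketch below shows how a FastText model can be trained through Gensim’s implementation; min_n and max_n control the length of the character n-gram chunks described above, and the hyperparameter values are again illustrative assumptions.

from gensim.models import FastText

model_ft = FastText(
    sentences,
    size=100,        # vector dimensionality (vector_size in Gensim 4+)
    window=5,
    min_count=2,
    min_n=3,         # shortest character n-gram
    max_n=6)         # longest character n-gram

# because vectors are assembled from sub-word n-grams, rare or heavily
# inflected forms can still be queried
print(model_ft.wv.most_similar('khodorkovsky'))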
3.0 Descriptives
This section presents some basic descriptives: a general breakdown of the prominent terms by frequency and by tf-idf (term-frequency inverse-document-frequency). Tf-idf identifies the terms that distinguish each document, since a term scores highly only when it is frequent within one address but rare across the rest of the corpus, and common stopwords, which add no analytical value, are omitted. As in the previous post, the descriptives for this paper are produced with the tidy suite of tools from the R statistical programming language. The powerful Pandas13 library provides a quick and efficient set of tools for data manipulation within Python, but the visualisation tools are not as strong. Matplotlib14 is frequently praised for its publication-quality graphics and extensive capacity for customisation; however, its visualisations are far less sophisticated than R’s ggplot215, and the code quickly becomes turgid and cumbersome as one attempts to bring visualisations up to the standard offered by basic, out-of-the-box ggplot2 graphics.
Putin’s addresses have remained of similar length over time. Fig 3.1 presents the overall word frequencies after the omission of stopwords. Even a quick survey of the top five words suggests that the focus has remained relatively constant over time.
A survey of the tf-idf calculations on Putin’s speeches (fig 3.2) offers a surprising level of depth. The term with the highest tf-idf value was communal, from the 2002 address, relating to the housing and communal services reforms of that time. The term administration, also from 2002, received a high overall ranking. The terms missile and Ukraine, from 2017 and 2014 respectively, were the second and third highest ranking. Turkey, from 2015, recalls the tensions of that time. Other interesting high-ranking terms include nanotechnology from 2007, ballistic from 2017 and Chechen from 2003. Quite a bit of texture emerges from a view of the tf-idf rankings.
4.0 Word embedding: the deeper meaning behind Putin’s speeches?
As described in the methodology section, this section attempts to gain added insight into important global themes and events based on Vladimir Putin’s published speeches. It uses word embedding and compares the performance of three major algorithms when trained on Putin’s yearly presidential addresses. A caveat for this approach is that word embedding models would normally require substantial amounts of training data: many of the pre-trained models available online (for example the Google models) are trained on gigabytes of text. As such, the output of the three models trained here may not be especially coherent. As with my previous post, a larger study would use more voluminous and diverse sources of data. Nevertheless, this is an interesting example of the potential of such algorithms as tools for political research, and I am particularly interested to see how FastText performs compared to Word2Vec and GloVe, given its unique approach and its implications for my PhD project.
The goal is to assess the effectiveness of the algorithms. To do so, the three algorithms are tested with keywords drawn from three of the eight themes generated by the optimised model in the previous post: attitudes to business and the oligarchs, attitudes to the West, and attitudes to the security of the nation. Each set of keywords is then passed through each of the three algorithms, as sketched below.
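The comparison loop below is a sketch of how this was approached, reusing the model objects (model_w2v, glove, model_ft) queried in the sections that follow; the keyword list shown is illustrative.

keywords = ['khodorkovsky', 'business', 'west', 'america', 'security']

for kw in keywords:
    print('--- ' + kw + ' ---')
    for name, query in [('Word2Vec', lambda w: model_w2v.wv.most_similar(w)),
                        ('GloVe', lambda w: glove.most_similar(w, number=10)),
                        ('FastText', lambda w: model_ft.wv.most_similar(w))]:
        try:
            print(name, query(kw))
        except KeyError:
            # Word2Vec and GloVe drop words below min_count, so some
            # keywords have no vector and return nothing
            print(name, 'returned no output for this keyword')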
4.1 Attitudes to business and the Oligarchs
4.1.1 Keyword = Khodorkovsky
# FastText
print(model_ft.wv.most_similar('khodorkovsky'))
## [('platforms', 0.9896156787872314), ('professionalism', 0.9867763519287109), ('mechanisms', 0.9867491126060486), ('instead', 0.9857164621353149), ('mechanism', 0.9854127764701843), ('professionals', 0.9852334260940552), ('efficiency', 0.9849071502685547), ('conflicts', 0.9846477508544922), ('jobs', 0.9844048023223877), ('professions', 0.9840933084487915)]
Interestingly, only the FastText algorithm returned any output for Mikhail Khodorkovsky, the oligarch, head of the combined Siberian oilfields (Yukos) and once reported to be the richest man in Russia. Khodorkovsky was famously imprisoned on politically motivated charges of fraud and had his prison sentence spuriously extended. The output does appear to suggest an expected code of behaviour, hinted at by terms like servants, consequences, privileges, code and conduct.
4.1.2 Keyword = Business
# Word2Vec
print(model_w2v.wv.most_similar('business'))
## [('protection', 0.9997005462646484), ('sector', 0.9996916651725769), ('companies', 0.9996762275695801), ('education', 0.9996694922447205), ('market', 0.9996658563613892), ('strategic', 0.9996495842933655), ('industry', 0.9996424317359924), ('civil', 0.9996154308319092), ('community', 0.9996147751808167), ('infrastructure', 0.9996131062507629)]
# GloVe
print(glove.most_similar('business', number=10))
## [('legislation', 0.9906777766099785), ('legal', 0.984852409088495), ('both', 0.984349364662433), ('sector', 0.9838278585127997), ('using', 0.9827867492123683), ('russian', 0.9812287419903728), ('including', 0.9788674909227848), ('industry', 0.9787609484056462), ('budget', 0.9779893895792523)]
# FastText
print(model_ft.wv.most_similar('business'))
## [('businesses', 0.9808726906776428), ('initiatives', 0.9718659520149231), ('protect', 0.9705683588981628), ('associations', 0.9680231809616089), ('organisations', 0.9678270816802979), ('actual', 0.9677289724349976), ('activity', 0.9660955667495728), ('intellectual', 0.9658936858177185), ('entrepreneurial', 0.9638667106628418), ('guarantees', 0.9637526869773865)]
For business, the output is much better represented. Although Word2Vec and GloVe fail to offer much coherence, FastText continues what one could perhaps interpret as a veiled or subtle expectation, with terms such as partners, partnership, protect, mutual and communal. It is notable that themes of property and engineers also materialise: in the early days of Putin’s reign there were considerable issues around housing and communal services.
4.2 Attitudes to the West
4.2.1 Keyword = West
# GloVe
print(glove.most_similar('west', number=10))
## [('accomplishing', 0.9923090445807695), ('experienced', 0.9873431291288517), ('quick', 0.9864517378728053), ('nature.to', 0.9855582438368751), ('petitions', 0.9852363836532348), ('born', 0.9851693786406268), ('exceedingly', 0.9832271566830941), ('subsidised', 0.9830094495798605), ('obligated', 0.9825413415771306)]
# FastText
print(model_ft.wv.most_similar('west'))
## [('gross', 0.9885681867599487), ('biggest', 0.9885213375091553), ('soviet', 0.9821631908416748), ('foremost', 0.9817773103713989), ('popular', 0.9791515469551086), ('largest', 0.97861647605896), ('latest', 0.9775289297103882), ('gdp', 0.9773699641227722), ('demographic', 0.9763599634170532), ('eastern', 0.9762721061706543)]
Despite not producing any output for west, Word2Vec has probably performed with the most coherence overall with its output for america. Terms such as resources, military, foreign, market and influence allude to a suspicious attitude towards the West and the US. Interestingly, Kaliningrad was part of FastText’s output; the fact that it is affiliated with the West is not surprising, as the exclave is detached from mainland Russia.
4.2.2 Keyword = America
# Word2Vec
print(model_w2v.wv.most_similar('america'))
## [('process', 0.9992364048957825), ('third', 0.9992163777351379), ('directly', 0.9992102384567261), ('legislative', 0.9992088675498962), ('road', 0.9992068409919739), ('en.kremlin.ru/d/', 0.9991971850395203), ('among', 0.9991769194602966), ('reform', 0.9991734027862549), ('using', 0.9991711378097534), ('information', 0.9991642832756042)]
# GloVe
print(glove.most_similar('america', number=10))
## [('borders', 0.8884175400923867), ('owners', 0.870303796931066), ('talented', 0.8697185963637865), ('provision', 0.8693753145725024), ('belarus', 0.8688407936992295), ('commitments', 0.8678023755708467), ('hundreds', 0.8640938375637772), ('exports', 0.8639719411292334), ('industrial', 0.8636662199077186)]
# FastText
print(model_ft.wv.most_similar('america'))
## [('alcohol', 0.9965674877166748), ('ngos', 0.994460940361023), ('kazakhstan', 0.9942743182182312), ('hundreds', 0.9941031336784363), ('faiths', 0.9940779209136963), ('maximum', 0.9930719137191772), ('campaigns', 0.9928117990493774), ('january', 0.9927479028701782), ('campaign', 0.9926661849021912), ('gap', 0.9925445914268494)]
4.3 Attitudes to security
4.3.1 Keyword = Security
# Word2Vec
print(model_w2v.wv.most_similar('security'))
## [('political', 0.9996907711029053), ('national', 0.9994895458221436), ('its', 0.9993956089019775), ('rights', 0.9993816018104553), ('society', 0.9993573427200317), ('part', 0.9992894530296326), ('website', 0.9991859197616577), ('east', 0.9991275072097778), ('own', 0.9991114139556885), ('freedoms', 0.9990881085395813)]
# GloVe
print(glove.most_similar('security', number=10))
## [('industry', 0.9968301872937739), ('international', 0.9900705298955806), ('public', 0.989728754835081), ('space', 0.9895477565012066), ('social', 0.9892554013242136), ('healthcare', 0.9887413328851776), ('quality', 0.988124394778794), ('construction', 0.9876858273772054), ('services', 0.9869981019782441)]
# FastText
print(model_ft.wv.most_similar('security'))
## [('unity', 0.983215868473053), ('strength', 0.9823357462882996), ('community', 0.9792585372924805), ('city', 0.9779577851295471), ('capability', 0.975475549697876), ('integrity', 0.9746112823486328), ('entry', 0.974355161190033), ('prosperity', 0.9733220338821411), ('strengthen', 0.9725943207740784), ('commodity', 0.9691153168678284)]
Once more, FastText appears to have produced the most coherent output. One could envisage Putin framing security in terms of culture, community and prosperity. Similarly, it would not be unexpected for Putin to use strong and decisive terms such as strength, guarantee and capability when discussing security.
4.4 Performance and conclusions
As seen above, not all the algorithms produced output for every keyword, and the output suffers from a lack of coherence in many areas, especially for Word2Vec and GloVe. Much of this lack of coherence is likely attributable to the comparatively small volume of training data used. There also appears to be a slight software discrepancy between Jupyter Notebooks and RStudio: the initial coding was performed in Jupyter Notebooks for testing, but the subsequent output from RStudio differed slightly. The final draft is compiled in RStudio, as it allows one to work with R and Python code simultaneously, and RMarkdown within RStudio generates a report of publishable quality. Further testing will be done during the NLP chapters of my PhD project.
Interestingly, the performance of FastText stood out, both in generating output for every query and in providing marginally better coherence. It is likely to have benefited from artificially expanding its training data through the character n-gram chunks of individual words, as opposed to the whole-word units used by Word2Vec and GloVe, which would explain its marginally better performance; this is encouraging given the size of the training data. For my own PhD project this is particularly valuable, given its potential for working with the grammatically challenging Eurasian languages.
5.0 Key findings / Policy implications
Once more, acknowledging the unspectacular performance of the algorithms, some interesting findings still materialised. Firstly, there are unmistakable signals to business and the oligarchs in the guise of expectations. It is well known that, from early in his tenure, Putin has been very clear with business elites about his expectations for them, explaining the opportunities but also the potential perils of intruding into the political arena. Similarly, the output suggests that he views business as a mutually beneficial activity, one of partnership, even a social one.
Secondly, Putin’s attitude to the West as estimated by the above output is unsurprising, given the strained relationship over the past decade or so; there is a definite sense of suspicion, alluded to by terms of economic and military competition. Lastly, it was interesting to see that the terms affiliated with security appeared to be directed towards the public: key themes of community and togetherness coupled with decisive, strong adjectives.
In sum, there does not appear to be anything too surprising or novel, but the output bodes well for further analysis as a larger dataset could offer more nuanced output.
For access to the source code as an Rmarkdown file, please contact me by email at [email protected]
1. Text Mining Platov: Can Data Science tools hint at Vladimir Putin’s legacy? Available at: http://caspianet.eu/blog/
2. McCallum, Andrew Kachites. 2002. “MALLET: A Machine Learning for Language Toolkit.” http://mallet.cs.umass.edu
3. Transcripts of Vladimir Putin’s annual addresses to the Federal Assembly are available at: http://en.kremlin.ru/
4. Richardson, Leonard. Beautiful Soup (MIT licence). https://pypi.org/project/beautifulsoup4/
5. Bird, Steven. NLTK (Apache licence). https://www.nltk.org/ & https://pypi.org/project/nltk/
6. Rehurek, Radim & Sojka, Petr. 2010. “Software Framework for Topic Modelling with Large Corpora.” Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.
7. Grolemund, Garrett & Wickham, Hadley. 2016. R for Data Science. O’Reilly Media, 1st edition (25 July 2016).
8. RMarkdown, available at: https://rmarkdown.rstudio.com/articles.html
9. Word2Vec Google Code Project, available at: https://code.google.com/archive/p/word2vec/
10. Mikolov, Tomas et al. 2013. “Distributed Representations of Words and Phrases and their Compositionality.” https://arxiv.org/pdf/1310.4546.pdf
11. Pennington, Jeffrey, Socher, Richard & Manning, Christopher D. 2014. “GloVe: Global Vectors for Word Representation.” https://nlp.stanford.edu/projects/glove/
12. Facebook AI Research (FAIR). 2018. fastText. https://fasttext.cc/
13. McKinney, Wes. 2017. Python for Data Analysis. O’Reilly, 2nd edition (3 Nov. 2017).
14. Matplotlib: Python plotting documentation. https://matplotlib.org/
15. Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org