The Analysis of Unconventional Economic Datasets
This doctoral thesis comprises five research papers on three subject areas. Two manuscripts discuss the suitability of latency as a measure of Internet quality across countries. The next two are concerned with the application of topic models and the automatic classification of texts in an economic context, while the last paper proposes to ... combine social network analysis with survival analysis in order to estimate the impact of centrality on professional success.

At first glance, the three research areas are very different and have little in common in terms of content. While the study of Internet quality fits into the literature on macroeconomic development and growth, the two papers on topic models are similar only in the method they apply and address questions on monetary policy and economic history, respectively. Finally, the study on social networks and success belongs to the fields of labor economics, sociology, or business economics.

On closer inspection, however, the manuscripts share several features. The datasets used in these analyses all consist of secondary data, meaning that they were not originally intended for economic analysis. Consequently, extensive data preparation and cleaning were necessary before any econometric methods could be applied. Because the data had not been used for this kind of research before, their economic analysis yields new insights that may not have been attainable with conventional data. Further, the datasets stand out for their complexity and size, which, along with a high speed of change, are the classical criteria for Big Data. This made it necessary to select methods and tools carefully in order to work around the associated difficulties. The title "The Analysis of Unconventional Economic Datasets" is meant to emphasize both the complexity and the secondary nature of the data, the latter being a feature rather than a criterion of Big Data.
Laney (2001) proposed three dimensions along which data might be big, which can also serve as criteria for a definition of Big Data. According to Laney (2001), data may change quickly, be large in size, and/or be highly complex, any of which makes the use of conventional tools and methods challenging. By these criteria, the manuscripts in this thesis deal with Big Data problems. However, I refrain from including the term Big Data in the dissertation's title, as it has become a widely used buzzword whose meaning has been diluted in public perception. In addition, a comprehensive overview of Big Data applications in economics would be beyond the scope of this thesis.