Data and Dimension Reduction

Mapping Financial Stability

Part of the book series: Computational Risk Management

Abstract

Data and dimension reduction techniques hold promise for representing data in easily understandable formats, as their wide range of applications has shown. Data reductions summarize data by compressing information into fewer partitions, whereas dimension reductions provide low-dimensional overviews of similarity relations in the data. These techniques thus provide means for exploratory data analysis (EDA). From a broader perspective, EDA is only one of many approaches to data mining, and knowledge discovery includes data mining as only one of its steps. To provide a holistic view in a top-down manner, we start with the broader concepts and end with discussions of data reductions, dimension reductions, and their combination. As Chap. 5 compares early dimension reduction methods, this chapter also presents the so-called first-generation methods in more detail, including Multidimensional Scaling (MDS), Sammon’s mapping, and the Self-Organizing Map (SOM).
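
As a rough illustration of the two families of techniques (not part of the chapter itself), the following R sketch applies a data reduction (k-means partitioning) and two first-generation dimension reductions (classical MDS via cmdscale and Sammon’s mapping via MASS::sammon) to a generic numeric data matrix; the data set and all parameter choices are arbitrary. A corresponding SOM example appears after the notes below.

  library(MASS)                               # provides sammon()

  X <- scale(as.matrix(unique(iris[, 1:4])))  # any numeric data matrix; duplicate rows
                                              # dropped because sammon() requires
                                              # strictly positive pairwise distances
  D <- dist(X)                                # pairwise Euclidean distances

  km  <- kmeans(X, centers = 3)               # data reduction: three partitions
  mds <- cmdscale(D, k = 2)                   # classical (metric) MDS
  sam <- sammon(D, k = 2)                     # Sammon's nonlinear mapping

  plot(sam$points, col = km$cluster,
       xlab = "Dimension 1", ylab = "Dimension 2",
       main = "Sammon's mapping, coloured by k-means partition")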

The eye, which is called the window of the soul, is the principal means by which the central sense can most completely and abundantly appreciate the infinite works of nature

– Leonardo da Vinci

Notes

  1. There are several software implementations of the SOM. The seminal packages (SOM_PAK, the SOM Toolbox for Matlab, Nenet, etc.) are no longer regularly updated or adapted to their environment. Among the newer implementations, Viscovery SOMine provides the means needed for interactive exploratory analysis. The most recent addition to the list is the interactive, web-based implementation provided by infolytika (http://risklab.fi/demo/macropru/); for a description, see Sarlin (2014a). For a practical discussion of SOM software and an early version of the implementation in Viscovery SOMine, see Deboeck (1998b, a). See also Moehrmann et al. (2011) for a comparison of SOM implementations. The first analyses of this book were performed in the Viscovery SOMine 5.1 package owing to its easily interpretable visual representation and its interaction features, not least when introducing it to practitioners in general and policymakers in particular. Recently, the packages available in the statistical computing environment R have improved significantly, in particular regarding the visualization of SOM outputs. Thus, the final parts of the research in this book, including the figures, have been produced in R; a minimal example with one such package follows these notes. Moreover, the above-mentioned interface by infolytika provides an interactive implementation of the R-based models.

  2. In the literature, learning of the SOM has been defined across the entire spectrum of supervision. For instance, van Heerden and Engelbrecht (2008) define semi-supervised SOMs as similar to supervised ones, except that the class variables are not included in the matching phase (Eq. 4.9), whereas the semi-supervised version herein corresponds to their supervised SOM. However, as the SOM is never fully supervised, we retain only the distinction between an unsupervised and a semi-supervised version; a sketch of the matching step after these notes illustrates the difference.
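
As mentioned in the first note, R now offers well-developed SOM packages. The following minimal sketch uses the kohonen package, which is one commonly used option rather than the specific implementation behind the book's figures; the data set and map size are arbitrary.

  library(kohonen)

  X   <- scale(as.matrix(iris[, 1:4]))                    # any numeric data matrix
  map <- som(X, grid = somgrid(6, 4, "hexagonal"), rlen = 500)

  plot(map, type = "codes")                               # codebook vectors per map unit
  plot(map, type = "dist.neighbours")                     # U-matrix-style distance plot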
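
To make the spectrum of supervision discussed in the second note concrete, here is a self-contained R sketch (illustrative only; the function and argument names are hypothetical, not the book's implementation) of one online SOM training step in which the class block can either enter or be excluded from the matching phase, while it always enters the update. Iterating such steps with decreasing alpha and sigma trains the map; dropping W_y and y altogether gives the plain unsupervised SOM.

  # W_x: units-by-inputs codebook, W_y: units-by-classes codebook,
  # x, y: one observation's input and class vectors,
  # grid_xy: units-by-2 matrix of map coordinates.
  som_step <- function(W_x, W_y, x, y, grid_xy,
                       match_on_class = FALSE, alpha = 0.05, sigma = 1.5) {
    # Matching phase (cf. Eq. 4.9): squared distance over the input block,
    # optionally augmented with the class block.
    d2 <- rowSums(sweep(W_x, 2, x)^2)
    if (match_on_class) d2 <- d2 + rowSums(sweep(W_y, 2, y)^2)
    bmu <- which.min(d2)

    # Gaussian neighbourhood around the best-matching unit on the map grid.
    h <- alpha * exp(-rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2) / (2 * sigma^2))

    # Update phase: both codebooks move towards the observation, so class
    # information shapes the map even when it is excluded from matching.
    list(W_x = W_x - h * sweep(W_x, 2, x),
         W_y = W_y - h * sweep(W_y, 2, y))
  }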

References

  • Anand, S., & Buchner, A. (1998). Decision support using data mining. London: Financial Times Management.
  • Baddeley, A., & Logie, R. (1999). Working memory: The multiple-component model. In A. Miyake & P. Shah (Eds.), Models of working memory (pp. 28–61). New York: Cambridge University Press.
  • Barreto, G. (2007). Time series prediction with the self-organizing map: A review. In P. Hitzler & B. Hammer (Eds.), Perspectives on neural-symbolic integration. Heidelberg: Springer-Verlag.
  • Bederson, B., & Shneiderman, B. (2003). The craft of information visualization: Readings and reflections. San Francisco, CA: Morgan Kaufmann.
  • Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In T. Dietterich, S. Becker & Z. Ghahramani (Eds.), Advances in neural information processing systems (Vol. 14, pp. 586–691). Cambridge, MA: MIT Press.
  • Bertin, J. (1983). Semiology of graphics. Madison, WI: The University of Wisconsin Press.
  • Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
  • Bishop, C., Svensson, M., & Williams, C. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215–234.
  • Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. (1998). Discovering data mining: From concepts to implementation. New Jersey: Prentice Hall.
  • Card, S., Mackinlay, J., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Diego, CA: Academic Press.
  • Card, S., Robertson, G., & Mackinlay, J. (1991). The information visualizer, an information workspace. In Proceedings of CHI ’91, ACM Conference on Human Factors in Computing Systems, New Orleans (pp. 181–188).
  • Chapelle, O., Schölkopf, B., & Zien, A. (Eds.). (2006). Semi-supervised learning. Cambridge, MA: MIT Press.
  • Chen, L., & Buja, A. (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing and proximity analysis. Journal of the American Statistical Association, 104, 209–219.
  • Cottrell, M., & Letrémy, P. (2005). Missing values: Processing with the Kohonen algorithm. In Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 05), Brest, France (pp. 489–496).
  • Cox, T., & Cox, M. (2001). Multidimensional scaling. Boca Raton, Florida: Chapman & Hall/CRC.
  • Deboeck, G. (1998a). Best practices in data mining using self-organizing maps. In G. Deboeck & T. Kohonen (Eds.), Visual explorations in finance with self-organizing maps (pp. 201–229). Berlin: Springer-Verlag.
  • Deboeck, G. (1998b). Software tools for self-organizing maps. In G. Deboeck & T. Kohonen (Eds.), Visual explorations in finance with self-organizing maps (pp. 179–194). Berlin: Springer-Verlag.
  • Demartines, P., & Hérault, J. (1997). Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8, 148–154.
  • Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39(1), 1–38.
  • Dunn, J. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. Cybernetics and Systems, 3, 32–57.
  • Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 96) (pp. 226–231). AAAI Press.
  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996a). From data mining to knowledge discovery: An overview. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 1–34). Menlo Park, CA: AAAI Press / The MIT Press.
  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996b). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27–34.
  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996c). Knowledge discovery and data mining: Towards a unifying framework. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR (pp. 82–88).
  • Fekete, J.-D., van Wijk, J., Stasko, J., & North, C. (2008). The value of information visualization. In Information visualization: Human-centered issues and perspectives (pp. 1–18). Springer.
  • Forte, J., Letrémy, P., & Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN 02), Bruges, Belgium (pp. 223–230).
  • Frawley, W., Piatetsky-Shapiro, G., & Matheus, C. (1992). Knowledge discovery in databases: An overview. AI Magazine, 13(3), 57–70.
  • Gisbrecht, A., Hofmann, D., & Hammer, B. (2012). Discriminative dimensionality reduction mappings. In Proceedings of the International Symposium on Intelligent Data Analysis (pp. 126–138). Helsinki, Finland: Springer-Verlag.
  • Haroz, S., & Whitney, D. (2012). How capacity limits of attention influence information visualization effectiveness. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2402–2410.
  • Havre, S., Hetzler, B., & Nowell, L. (2000). ThemeRiver: Visualizing theme changes over time. In Proceedings of the IEEE Symposium on Information Visualization (pp. 115–123).
  • Hoaglin, D., Mosteller, F., & Tukey, J. (1983). Understanding robust and exploratory data analysis. New York: Wiley.
  • Jain, A. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.
  • Jain, A., Murty, M., & Flynn, P. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
  • Kaser, O., & Lemire, D. (2007). Tag-cloud drawing: Algorithms for cloud visualization. In Proceedings of the Tagging and Metadata for Social Information Organization Workshop, Banff, Alberta, Canada.
  • Keim, D. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38–44.
  • Keim, D., Kohlhammer, J., Ellis, G., & Mansmann, F. (2010). Mastering the information age: Solving problems with visual analytics. Goslar: Eurographics Association.
  • Keim, D., & Kriegel, H.-P. (1996). Visualization techniques for mining large databases: A comparison. IEEE Transactions on Knowledge and Data Engineering, 8(6), 923–938.
  • Keim, D., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in visual data analysis. In Proceedings of the IEEE International Conference on Information Visualization (iV 06) (pp. 9–16). London, UK: IEEE Computer Society.
  • Keim, D., Mansmann, F., & Thomas, J. (2009). Visual analytics: How much visualization and how much analytics? SIGKDD Explorations, 11(2), 5–8.
  • Koffka, K. (1935). Principles of gestalt psychology. London: Routledge & Kegan Paul.
  • Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
  • Kohonen, T. (1991). The hypermap architecture. In T. Kohonen, K. Mäkisara, O. Simula & J. Kangas (Eds.), Artificial neural networks (Vol. II, pp. 1357–1360). Amsterdam, Netherlands: Elsevier.
  • Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer-Verlag.
  • Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
  • Kurgan, L., & Musilek, P. (2006). A survey of knowledge discovery and data mining process models. The Knowledge Engineering Review, 21(1), 1–24.
  • Lampinen, J., & Oja, E. (1992). Clustering properties of hierarchical self-organizing maps. Journal of Mathematical Imaging and Vision, 2(2–3), 261–272.
  • Larkin, J., & Simon, H. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11, 65–99.
  • Lee, J., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Information science and statistics series. Heidelberg, Germany: Springer-Verlag.
  • Lin, X. (1997). Map displays for information retrieval. Journal of the American Society for Information Science, 48(1), 40–54.
  • Linde, Y., Buzo, A., & Gray, R. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 702–710.
  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). Berkeley, CA: University of California Press.
  • Moehrmann, J., Burkovski, A., Baranovskiy, E., Heinze, G., Rapoport, A., & Heideman, G. (2011). A discussion on visual interactive data exploration using self-organizing maps. In J. Laaksonen & T. Honkela (Eds.), Proceedings of the 8th International Workshop on Self-Organizing Maps (pp. 178–187). Helsinki, Finland: Springer-Verlag.
  • Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6), 559–572.
  • Pölzlbauer, G. (2004). Survey and comparison of quality measures for self-organizing maps. In Proceedings of the 5th Workshop on Data Analysis (WDA 2004), Sliezsky dom, Vysoké Tatry, Slovakia (pp. 67–82).
  • Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
  • Rubin, D. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley & Sons.
  • Sammon, J. (1969). A non-linear mapping for data structure analysis. IEEE Transactions on Computers, 18(5), 401–409.
  • Sarlin, P. (2014a). Macroprudential oversight, risk communication and visualization. arXiv:1404.4550.
  • Shannon, C., & Weaver, W. (1963). A mathematical theory of communication. Champaign: University of Illinois Press.
  • Shearer, C. (2000). The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing, 15(4), 13–19.
  • Shepard, R. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. Psychometrika, 27, 125–140, 219–246.
  • Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, Boulder, CO (pp. 336–343).
  • Tenenbaum, J., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
  • Thomas, J., & Cook, K. (2005). Illuminating the path: Research and development agenda for visual analytics. Los Alamitos: IEEE Press.
  • Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.
  • Treisman, A. (1985). Preattentive processing in vision. Computer Vision, Graphics and Image Processing, 31(2), 156–177.
  • Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
  • Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
  • van der Maaten, L., & Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
  • van Heerden, W., & Engelbrecht, A. (2008). A comparison of map neuron labeling approaches for unsupervised self-organizing feature maps. In Proceedings of the IEEE International Joint Conference on Neural Networks (pp. 2139–2146). Hong Kong: IEEE Computer Society.
  • Venna, J., & Kaski, S. (2006). Local multidimensional scaling. Neural Networks, 19, 889–899.
  • Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (2000). SOM Toolbox for Matlab 5. Technical Report A57, Helsinki University of Technology.
  • Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
  • Ware, C. (2004). Information visualization: Perception for design. San Francisco, CA: Morgan Kaufmann.
  • Ware, C. (2005). Visual queries: The foundation of visual thinking. In S. Tergan & T. Keller (Eds.), Knowledge and information visualization (pp. 27–35). Berlin, Germany: Springer.
  • Weinberger, K., & Saul, L. (2005). Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1), 77–90.
  • Wismüller, A. (2009). A computational framework for non-linear dimensionality reduction and clustering. In J. Principe & R. Miikkulainen (Eds.), Proceedings of the Workshop on Self-Organizing Maps (WSOM 09) (pp. 334–343). St. Augustine, Florida, USA: Springer.
  • Yin, H. (2008). The self-organizing maps: Background, theories, extensions and applications. In J. Fulcher & L. Jain (Eds.), Computational intelligence: A compendium (pp. 715–762). Heidelberg, Germany: Springer-Verlag.
  • Young, G., & Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.
  • Zhang, J., & Liu, Y. (2005). SVM decision boundary based discriminative subspace induction. Pattern Recognition, 38(10), 1746–1758.
  • Zhang, L., Stoffel, A., Behrisch, M., Mittelstädt, S., Schreck, T., Pompl, R., et al. (2012). Visual analytics for the big data era—a comparative review of state-of-the-art commercial systems. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology (VAST), Seattle, WA (pp. 173–182).

Author information

Correspondence to Peter Sarlin.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sarlin, P. (2014). Data and Dimension Reduction. In: Mapping Financial Stability. Computational Risk Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54956-4_4

  • DOI: https://doi.org/10.1007/978-3-642-54956-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54955-7

  • Online ISBN: 978-3-642-54956-4

  • eBook Packages: Business and Economics, Economics and Finance (R0)
