Research Paper Data Presentation Tools

Data analysis and presentation

Scope and purpose
Principles
Guidelines
Quality indicators
References

Scope and purpose

Data analysis is the process of developing answers to questions through the examination and interpretation of data.  The basic steps in the analytic process consist of identifying issues, determining the availability of suitable data, deciding on which methods are appropriate for answering the questions of interest, applying the methods and evaluating, summarizing and communicating the results. 

Analytical results underscore the usefulness of data sources by shedding light on relevant issues. Some Statistics Canada programs depend on analytical output as a major data product because, for confidentiality reasons, it is not possible to release the microdata to the public. Data analysis also plays a key role in data quality assessment by pointing to data quality problems in a given survey. Analysis can thus influence future improvements to the survey process.

Data analysis is essential for understanding results from surveys, administrative sources and pilot studies; for providing information on data gaps; for designing and redesigning surveys; for planning new statistical activities; and for formulating quality objectives.

Results of data analysis are often published or summarized in official Statistics Canada releases. 

Principles

A statistical agency is concerned with the relevance and usefulness to users of the information contained in its data. Analysis is the principal tool for obtaining information from the data.

Data from a survey can be used for descriptive or analytic studies. Descriptive studies are directed at the estimation of summary measures of a target population, for example, the average profits of owner-operated businesses in 2005 or the proportion of 2007 high school graduates who went on to higher education in the next twelve months.  Analytical studies may be used to explain the behaviour of and relationships among characteristics; for example, a study of risk factors for obesity in children would be analytic. 

To be effective, the analyst needs to understand the relevant issues, both current and those likely to emerge in the future, and how to present the results to the audience. The study of background information allows the analyst to choose suitable data sources and appropriate statistical methods. Any conclusions presented in an analysis, including those that can impact public policy, must be supported by the data being analyzed.

Guidelines

Initial preparation

  • Prior to conducting an analytical study the following questions should be addressed:

    • Objectives. What are the objectives of this analysis? What issue am I addressing? What question(s) will I answer?

    • Justification. Why is this issue interesting?  How will these answers contribute to existing knowledge? How is this study relevant?

    • Data. What data am I using? Why is it the best source for this analysis? Are there any limitations?

    • Analytical methods. What statistical techniques are appropriate? Will they satisfy the objectives?

    • Audience. Who is interested in this issue and why?

 Suitable data

  • Ensure that the data are appropriate for the analysis to be carried out. This requires investigating a wide range of details, such as whether the target population of the data source is sufficiently related to the target population of the analysis; whether the source variables and their concepts and definitions are relevant to the study; whether the longitudinal or cross-sectional nature of the data source is appropriate for the analysis; whether the sample size in the study domain is sufficient to obtain meaningful results; and whether the quality of the data, as outlined in the survey documentation or assessed through analysis, is sufficient.

  •  If more than one data source is being used for the analysis, investigate whether the sources are consistent and how they may be appropriately integrated into the analysis.
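One simple way to check whether the sample size in each study domain is sufficient is to tabulate the number of records per domain before any estimation. The sketch below assumes each record carries a hypothetical `domain` field and uses an illustrative threshold of 30 units. Neither comes from these guidelines, and the appropriate minimum depends on the survey and on the analysis.

```python
from collections import Counter

def small_domains(records, domain_key="domain", min_n=30):
    """Return {domain: count} for domains with fewer than min_n records.

    min_n is an illustrative threshold, not a recommended value.
    """
    counts = Counter(record[domain_key] for record in records)
    return {domain: n for domain, n in counts.items() if n < min_n}

# Hypothetical records: 40 units in domain "A" and 12 in domain "B".
records = [{"domain": "A"}] * 40 + [{"domain": "B"}] * 12

flagged = small_domains(records)
print(flagged)  # domains that may be too small for meaningful results
```

Domains that are flagged may need to be combined with others, or the results for them reported with appropriate caveats.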

Appropriate methods and tools

  • Choose an analytical approach that is appropriate for the question being investigated and the data to be analyzed. 

  • When analyzing data from a probability sample, analytical methods that ignore the survey design can be appropriate, provided that sufficient model conditions for analysis are met. (See Binder and Roberts, 2003.) However, methods that incorporate the sample design information will generally be effective even when some aspects of the model are incorrectly specified.

  • Assess whether the survey design information can be incorporated into the analysis and, if so, how this should be done, for example by using design-based methods.  See Binder and Roberts (2009) and Thompson (1997) for discussion of approaches to inference on data from a probability sample.

    • See Chambers and Skinner (2003), Korn and Graubard (1999), Lehtonen and Pahkinen (2004), Lohr (1999), and Skinner, Holt and Smith (1989) for a number of examples illustrating design-based analytical methods.

    • For a design-based analysis consult the survey documentation about the recommended approach for variance estimation for the survey. If the data from more than one survey are included in the same analysis, determine whether or not the different samples were independently selected and how this would impact the appropriate approach to variance estimation.

    • The data files for probability surveys frequently contain more than one weight variable, particularly if the survey is longitudinal or if it has both cross-sectional and longitudinal purposes. Consult the survey documentation and survey experts if it is not obvious which weight is best for a particular design-based analysis.

    • When analyzing data from a probability survey, there may be insufficient design information available to carry out analyses using a full design-based approach.  Assess the alternatives.

  • Consult with experts on the subject matter, on the data source and on the statistical methods if any of these is unfamiliar to you.

  • Having determined the appropriate analytical method for the data, investigate the software choices that are available to apply the method. If analyzing data from a probability sample by design-based methods, use software designed specifically for survey data, since standard analytical software packages that can produce weighted point estimates do not correctly calculate variances for survey-weighted estimates.

  • It is advisable to use commercial software, if suitable, for implementing the chosen analyses, since these software packages have usually undergone more testing than non-commercial software.

  • Determine whether it is necessary to reformat your data in order to use the selected software.

  • Include a variety of diagnostics among your analytical methods if you are fitting any models to your data.

  • Data sources vary widely with respect to missing data.  At one extreme, there are data sources which seem complete - where any missing units have been accounted for through a weight variable with a nonresponse component and all missing items on responding units have been filled in by imputed values.  At the other extreme, there are data sources where no processing has been done with respect to missing data.  The work required by the analyst to handle missing data can thus vary widely. It should be noted that the handling of missing data in analysis is an ongoing topic of research.

    • Refer to the documentation about the data source to determine the degree and types of missing data and the processing of missing data that has been performed.  This information will be a starting point for what further work may be required.

    • Consider how unit and/or item nonresponse could be handled in the analysis, taking into consideration the degree and types of missing data in the data sources being used.

    • Consider whether imputed values should be included in the analysis and if so, how they should be handled.  If imputed values are not used, consideration must be given to what other methods may be used to properly account for the effect of nonresponse in the analysis.

    • If the analysis includes modelling, it could be appropriate to include some aspects of nonresponse in the analytical model.

    • Report any caveats about how the approaches used to handle missing data could affect the results.
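To illustrate the earlier point in this list that weighted point estimates and design-based variances are separate calculations, the following minimal sketch computes a survey-weighted mean and a Taylor-linearization variance. It rests on simplifying assumptions that are not from these guidelines: a single stratum, primary sampling units (PSUs) treated as sampled with replacement, and no finite population correction. The function names and data are hypothetical. For a real survey, the variance estimation approach recommended in the survey documentation takes precedence.

```python
from collections import defaultdict
from math import sqrt

def weighted_mean(y, w):
    """Survey-weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def linearized_variance(y, w, psu):
    """Taylor-linearization variance of the weighted mean.

    Assumes one stratum with PSUs treated as sampled with replacement.
    """
    total_w = sum(w)
    mean = weighted_mean(y, w)
    # Sum the linearized values w * (y - mean) / sum(w) within each PSU.
    psu_totals = defaultdict(float)
    for yi, wi, p in zip(y, w, psu):
        psu_totals[p] += wi * (yi - mean) / total_w
    n = len(psu_totals)
    if n < 2:
        raise ValueError("at least two PSUs are needed to estimate variance")
    z_bar = sum(psu_totals.values()) / n
    return n / (n - 1) * sum((z - z_bar) ** 2 for z in psu_totals.values())

# Hypothetical data: four respondents in two PSUs.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 2.0, 2.0]
psu = ["a", "a", "b", "b"]

estimate = weighted_mean(y, w)
std_error = sqrt(linearized_variance(y, w, psu))
print(f"weighted mean = {estimate:.3f}, design-based SE = {std_error:.3f}")
```

The variance is built from PSU-level totals rather than from individual records, which is why a package that only applies weights to a standard variance formula can give a different answer.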

Interpretation of results

  • Since most analyses are based on observational studies rather than on the results of a controlled experiment, avoid drawing conclusions concerning causality.

  • When studying changes over time, beware of focusing on short-term trends without inspecting them in light of medium- and long-term trends. Frequently, short-term trends are merely minor fluctuations around a more important medium- and/or long-term trend.

  • Where possible, avoid arbitrary time reference points. Instead, use meaningful points of reference, such as the last major turning point for economic data, generation-to-generation differences for demographic statistics, and legislative changes for social statistics.
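One common way to set a short-term change against a longer-term trend is to smooth the series with a moving average. The sketch below uses a trailing moving average on a hypothetical monthly series. The six-period window is an assumption chosen for illustration, and other smoothing or seasonal adjustment methods may be more appropriate for a given series.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per complete window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical monthly series: noisy, with a gentle upward trend.
series = [100, 103, 99, 104, 102, 107, 103, 108, 106, 111, 107, 112]

short_term_change = series[-1] - series[-2]   # latest month-over-month change
trend = moving_average(series, window=6)      # smoothed medium-term level

print("latest change:", short_term_change)
print("smoothed trend:", [round(t, 1) for t in trend])
```

Here the latest month-over-month change is several times larger than the typical step in the smoothed series, which is the kind of minor fluctuation the guideline warns against over-interpreting.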

Presentation of results

  • Focus the article on the important variables and topics. Trying to be too comprehensive will often interfere with a strong story line.

  • Arrange ideas in a logical order and in order of relevance or importance. Use headings, subheadings and sidebars to strengthen the organization of the article.

  • Keep the language as simple as the subject permits. Depending on the targeted audience for the article, some loss of precision may sometimes be an acceptable trade-off for more readable text.

  • Use graphs in addition to text and tables to communicate the message. Use headings that capture the meaning (e.g. "Women's earnings still trail men's") in preference to traditional chart titles (e.g. "Income by age and sex"). Always help readers understand the information in the tables and charts by discussing it in the text.

  • When tables are used, take care that the overall format contributes to the clarity of the data in the tables and prevents misinterpretation.  This includes spacing; the wording, placement and appearance of titles; row and column headings and other labeling. 

  • Explain rounding practices or procedures. In the presentation of rounded data, do not use more significant digits than are consistent with the accuracy of the data.

  • Satisfy any confidentiality requirements (e.g. minimum cell sizes) imposed by the surveys or administrative sources whose data are being analysed.

  • Include information about the data sources used and any shortcomings in the data that may have affected the analysis.  Either have a section in the paper about the data or a reference to where the reader can get the details.

  • Include information about the analytical methods and tools used.  Either have a section on methods or a reference to where the reader can get the details.

  • Include information regarding the quality of the results. Standard errors, confidence intervals and/or coefficients of variation provide the reader with important information about data quality. The choice of indicator may vary depending on where the article is published.

  • Ensure that all references are accurate, consistent and cited in the text.

  • Check for errors in the article. Check details such as the consistency of figures used in the text, tables and charts, the accuracy of external data, and simple arithmetic.

  • Ensure that the intentions stated in the introduction are fulfilled by the rest of the article. Make sure that the conclusions are consistent with the evidence.

  • Have the article reviewed by others for relevance, accuracy and comprehensibility, regardless of where it is to be disseminated.  As a good practice, ask someone from the data providing division to review how the data were used.  If the article is to be disseminated outside of Statistics Canada, it must undergo institutional and peer review as specified in the Policy on the Review of Information Products (Statistics Canada, 2003). 

  • If the article is to be disseminated in a Statistics Canada publication, make sure that it complies with the current Statistics Canada Publishing Standards. These standards affect graphs, tables and style, among other things.

  • As a good practice, consider presenting the results to peers prior to finalizing the text. This is another kind of peer review that can help improve the article. Always do a dry run of presentations involving external audiences.

  • Refer to available documents that could provide further guidance for improvement of your article, such as Guidelines on Writing Analytical Articles (Statistics Canada 2008) and the Style Guide (Statistics Canada 2004).
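The quality indicators and rounding practices mentioned in this list can be computed directly from an estimate and its standard error. The sketch below is illustrative only: it assumes a normal approximation for a 95% confidence interval (z = 1.96), and the function names and figures are hypothetical. The indicator to publish, and the number of significant digits that the accuracy of the data supports, remain judgment calls for the analyst.

```python
from math import floor, log10

def quality_indicators(estimate, std_error, z=1.96):
    """Return the coefficient of variation (%) and an approximate 95% CI."""
    if estimate == 0:
        raise ValueError("the coefficient of variation is undefined at zero")
    return {
        "cv_percent": 100 * std_error / abs(estimate),
        "ci_low": estimate - z * std_error,
        "ci_high": estimate + z * std_error,
    }

def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))

# Hypothetical estimate of average profit and its standard error.
estimate, std_error = 52340, 1570
q = quality_indicators(estimate, std_error)

print("estimate (2 significant digits):", round_sig(estimate, 2))
print("CV (%):", round(q["cv_percent"], 1))
print("95% CI:", round_sig(q["ci_low"], 2), "to", round_sig(q["ci_high"], 2))
```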

Quality indicators

Main quality elements:  relevance, interpretability, accuracy, accessibility

An analytical product is relevant if there is an audience who is (or will be) interested in the results of the study.

For the interpretability of an analytical article to be high, the style of writing must suit the intended audience. As well, sufficient detail must be provided so that another person, if allowed access to the data, could replicate the results.

For an analytical product to be accurate, appropriate methods and tools need to be used to produce the results.

For an analytical product to be accessible, it must be available to people for whom the research results would be useful.

References

Binder, D.A. and G.R. Roberts. 2003. "Design-based methods for estimating model parameters."  In Analysis of Survey Data. R.L. Chambers and C.J. Skinner (eds.) Chichester. Wiley. p. 29-48.

Binder, D.A. and G. Roberts. 2009. "Design and Model Based Inference for Model Parameters." In Handbook of Statistics 29B: Sample Surveys: Inference and Analysis. Pfeffermann, D. and Rao, C.R. (eds.) Vol. 29B. Chapter 24. Amsterdam. Elsevier. 666 p.

Chambers, R.L. and C.J. Skinner (eds.) 2003. Analysis of Survey Data. Chichester. Wiley. 398 p.

Korn, E.L. and B.I. Graubard. 1999. Analysis of Health Surveys. New York. Wiley. 408 p.

Lehtonen, R. and E.J. Pahkinen. 2004. Practical Methods for Design and Analysis of Complex Surveys. Second edition. Chichester. Wiley.

Lohr, S.L. 1999. Sampling: Design and Analysis. Duxbury Press. 512 p.

Skinner, C.J., D. Holt and T.M.F. Smith. 1989. Analysis of Complex Surveys. Chichester. Wiley. 328 p.

Thompson, M.E. 1997. Theory of Sample Surveys. London. Chapman and Hall. 312 p.

Statistics Canada. 2003. "Policy on the Review of Information Products." Statistics Canada Policy Manual. Section 2.5. Last updated March 4, 2009.

Statistics Canada. 2004. Style Guide.  Last updated October 6, 2004.

Statistics Canada. 2008. Guidelines on Writing Analytical Articles. Last updated September 16, 2008.

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It involves the creation and study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information".[1]

A primary goal of data visualization is to communicate information clearly and efficiently via statistical graphics, plots and information graphics. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message.[2] Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.

Data visualization is both an art and a science.[3] It is viewed as a branch of descriptive statistics by some, but also as a grounded theory development tool by others. Increased amounts of data created by Internet activity and an expanding number of sensors in the environment are referred to as "big data" or Internet of things. Processing, analyzing and communicating this data present ethical and analytical challenges for data visualization.[4] The field of data science and practitioners called data scientists help address this challenge.[5]

Overview

Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information".[6]

Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention.[7]

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization.[8]

Characteristics of effective graphical displays

"The greatest value of a picture is when it forces us to notice what we never expected to see." John Tukey[9]

Professor Edward Tufte explained that users of information displays are executing particular analytical tasks such as making comparisons or determining causality. The design principle of the information graphic should support the analytical task, showing the comparison or causality.[10]

In his 1983 book The Visual Display of Quantitative Information, Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage: "Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency. Graphical displays should:

  • show the data
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
  • avoid distorting what the data has to say
  • present many numbers in a small space
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • reveal the data at several levels of detail, from a broad overview to the fine structure
  • serve a reasonably clear purpose: description, exploration, tabulation or decoration
  • be closely integrated with the statistical and verbal descriptions of a data set.

Graphics reveal data. Indeed graphics can be more precise and revealing than conventional statistical computations."[11]

For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, direction of movement, and temperature. The line width illustrates a comparison (size of the army at points in time) while the temperature axis suggests a cause of the change in army size. This multivariate display on a two dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn."[11]

Not applying these principles may result in misleading graphs, which distort the message or support an erroneous conclusion. According to Tufte, chartjunk refers to extraneous interior decoration of the graphic that does not enhance the message, or gratuitous three dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible.[11]

The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the context of the report; and c) Designing graphics that communicate the key messages in the report.[12]

Quantitative messages

Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message:

  1. Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.
  2. Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by sales persons (the category, with each sales person a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the sales persons.
  3. Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
  4. Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.
  5. Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as median, quartiles, outliers, etc.
  6. Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
  7. Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
  8. Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used.[2][13]

Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.
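The pairing of message types and graphics in the list above can be captured as a simple lookup, which may be handy when an analyst reviews which displays apply to a task. The sketch below restates only the pairings given in the list. The dictionary and function names are hypothetical, and the mapping is a starting point rather than a rule.

```python
# Message types and the graphics the list above associates with them.
CHARTS_FOR_MESSAGE = {
    "time-series": ["line chart"],
    "ranking": ["bar chart"],
    "part-to-whole": ["pie chart", "bar chart"],
    "deviation": ["bar chart"],
    "frequency distribution": ["histogram", "boxplot"],
    "correlation": ["scatter plot"],
    "nominal comparison": ["bar chart"],
    "geographic or geospatial": ["cartogram"],
}

def suggest_charts(message_type):
    """Return the candidate graphics for a quantitative message type."""
    key = message_type.strip().lower()
    if key not in CHARTS_FOR_MESSAGE:
        raise KeyError(f"unknown message type: {message_type!r}")
    return CHARTS_FOR_MESSAGE[key]

print(suggest_charts("Correlation"))
print(suggest_charts("part-to-whole"))
```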

Visual perception and data visualization

A human can distinguish differences in line length, shape, orientation, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing.[14]

Effective graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than surface area, it may be more effective to use a bar chart (which takes advantage of line length to show comparison) rather than pie charts (which use surface area to show comparison).[14]
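A small calculation suggests one reason why area encodings are harder to compare than length encodings. The sketch below uses circles rather than pie slices as a simplification, and the function names and scaling constants are assumptions for illustration. When values are mapped to bar length, a fourfold difference in value is a fourfold difference in length. When the same values are mapped to circle area, the radius only doubles.

```python
from math import sqrt

def bar_lengths(values, max_length=100.0):
    """Scale values linearly so that the largest bar has max_length."""
    top = max(values)
    return [max_length * v / top for v in values]

def circle_radii(values, max_radius=50.0):
    """Scale values to circle areas, so radius grows with sqrt(value)."""
    top = max(values)
    return [max_radius * sqrt(v / top) for v in values]

values = [25, 50, 100]
bars = bar_lengths(values)
radii = circle_radii(values)

print("bar lengths: ", bars)
print("circle radii:", [round(r, 1) for r in radii])
print("largest/smallest bar length:", bars[-1] / bars[0])
print("largest/smallest radius:    ", radii[-1] / radii[0])
```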

Human perception/cognition and data visualization

Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations.[15] Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving.[16] Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing.[17] Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration.

History of data visualization

There is no comprehensive 'history' of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines.[18] Michael Friendly and Daniel J. Denis of York University are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Stellar data, such as the location of stars, have been visualized on the walls of caves (such as those found in Lascaux Cave in southern France) since the Pleistocene era.[19] Physical artefacts such as Mesopotamian clay tokens (5500 BC), Inca quipus (2600 BC) and Marshall Islands stick charts (n.d.) can also be considered as visualizing quantitative information.[20][21]

The first documented data visualization can be traced back to 1160 BC with the Turin Papyrus Map, which accurately illustrates the distribution of geological resources and provides information about the quarrying of those resources.[22] Such maps can be categorized as thematic cartography, a type of data visualization that presents and communicates specific data and information through a geographical illustration designed to show a particular theme connected with a specific geographic area. The earliest documented forms of data visualization were various thematic maps from different cultures, along with ideograms and hieroglyphs that provided and allowed interpretation of the information illustrated. For example, the Linear B tablets of Mycenae provided a visualization of information regarding Late Bronze Age trade in the Mediterranean. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns; earthly and heavenly positions were located by something akin to latitude and longitude by at least 200 BC; and the map projection of a spherical earth into latitude and longitude by Claudius Ptolemy (c. 85–c. 165) in Alexandria would serve as a reference standard until the 14th century.[22]

The invention of paper and parchment allowed further development of visualizations throughout history. One graph from the 10th, possibly 11th, century was intended to illustrate planetary movement and was used in an appendix of a textbook in monastery schools.[23] The graph apparently was meant to represent a plot of the inclinations of the planetary orbits as a function of time. For this purpose the zone of the zodiac was represented on a plane, with a horizontal line divided into thirty parts as the time or longitudinal axis. The vertical axis designates the width of the zodiac. The horizontal scale appears to have been chosen for each planet individually, for the periods cannot be reconciled. The accompanying text refers only to the amplitudes. The curves are apparently not related in time.

By the 16th century, techniques and instruments for precise observation and measurement of physical quantities, and of geographic and celestial position, were well developed (for example, a “wall quadrant” constructed by Tycho Brahe [1546–1601], covering an entire wall in his observatory). Particularly important were the development of triangulation and other methods to determine mapping locations accurately.[18]

The French philosopher and mathematician René Descartes and Pierre de Fermat developed analytic geometry and the two-dimensional coordinate system, which heavily influenced the practical methods of displaying and calculating values. Fermat and Blaise Pascal's work on statistics and probability theory laid the groundwork for what we now conceptualize as data.[18] According to the Interaction Design Foundation, these developments helped William Playfair, who saw potential for graphical communication of quantitative data, to generate and develop graphical methods of statistics.[15]

In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information "intuitively, clearly, accurately, and efficiently".[15]

John Tukey and Edward Tufte pushed the bounds of data visualization: Tukey with his new statistical approach of exploratory data analysis, and Tufte with his book The Visual Display of Quantitative Information, which paved the way for refining data visualization techniques for an audience beyond statisticians. With the progression of technology came the progression of data visualization, starting with hand-drawn visualizations and evolving into more technical applications, including interactive designs leading to software visualization.[24]

Programs such as SAS, SOFA, R, Minitab and Cornerstone support data visualization in the field of statistics. For more focused, custom visualizations, programming languages such as Python and JavaScript, along with libraries such as D3, make the visualization of quantitative data possible. Private schools have also developed programs to meet the demand for learning data visualization and associated programming libraries, including free programs like The Data Incubator and paid programs like General Assembly.[25]

Terminology

Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization:

  • Categorical: Text labels describing the nature of the data, such as "Name" or "Age". This term also covers qualitative (non-numerical) data.
  • Quantitative: Numerical measures, such as "25" to represent the age in years.

Two primary types of information displays are tables and graphs.

  • A table contains quantitative data organized into rows and columns with categorical labels. It is primarily used to look up specific values. In the example above, the table might have categorical column labels representing the name (a qualitative variable) and age (a quantitative variable), with each row of data representing one person (the sampled experimental unit or category subdivision).
  • A graph is primarily used to show relationships among data and portrays values encoded as visual objects (e.g., lines, bars, or points). Numerical values are displayed within an area delineated by one or more axes. These axes provide scales (quantitative and categorical) used to label and assign values to the visual objects. Many graphs are also referred to as charts.[26]
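The name-and-age table described above can be sketched as a list of rows, which shows both uses mentioned: distinguishing categorical from quantitative columns, and looking up a specific value. The names, ages and helper functions below are hypothetical, and classifying a column by whether its values are numeric is a simplifying assumption.

```python
# Hypothetical table: one row per person.
table = [
    {"Name": "Ana", "Age": 25},
    {"Name": "Ben", "Age": 41},
    {"Name": "Chloe", "Age": 33},
]

def column_type(rows, column):
    """Classify a column as 'quantitative' if all its values are numbers."""
    values = [row[column] for row in rows]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"

def look_up(rows, key_column, key, value_column):
    """Return one specific value, the primary use of a table."""
    for row in rows:
        if row[key_column] == key:
            return row[value_column]
    raise KeyError(key)

print(column_type(table, "Name"))           # text labels
print(column_type(table, "Age"))            # numerical measures
print(look_up(table, "Name", "Ben", "Age"))
```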

Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound.[27]

Examples of diagrams used for data visualization

Name, visual dimensions and example usages:

Bar chart
  • Example figure: Bar chart of tips by day of week
  • Visual dimensions: length/count; category; (color)
  • Example usage: Comparison of values, such as sales performance for several persons or businesses in a single time period. For a single variable measured over time (trend) a line chart is preferable.

Histogram
  • Example figure: Histogram of housing prices
  • Visual dimensions: bin limits; count/length; (color)
  • Example usage: Determining frequency of annual stock market percentage returns within particular ranges (bins) such as 0-10%, 11-20%, etc. The height of the bar represents the number of observations (years) with a return % in the range represented by the bin.

Scatter plot
  • Example figure: Basic scatterplot of two variables
  • Visual dimensions: x position; y position; (symbol/glyph); (color); (size)
  • Example usage: Determining the relationship (e.g., correlation) between unemployment (x) and inflation (y) for multiple time periods.

Scatter plot (3D)
  • Visual dimensions: position x; position y; position z; color

Network
  • Example usage: Finding clusters in the network (e.g. grouping Facebook friends into different clusters).
  • Example usage: Discovering bridges (information brokers or boundary spanners) between clusters in the network.
  • Example usage: Determining the most influential nodes in the network (e.g. a company wants to target a small group of people on Twitter for a marketing campaign).
  • Example usage: Finding outlier actors who do not fit in any cluster or who are in the periphery of a network.

Streamgraph

Treemap
  • Example usage: Disk space by location / file type.

Gantt chart

Heat map
  • Example usage: Analyzing risk, with green, yellow and red representing low, medium, and high risk, respectively.
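
Two of the encodings listed above reduce to simple computations before anything is drawn: a histogram needs a count per bin, and a scatter plot is often read alongside a correlation coefficient. The Python sketch below illustrates both using only the standard library. The sample figures, the bin edges and the function names are hypothetical, and the half-open binning rule (lower edge included, upper edge excluded) is one common convention rather than the only option.

```python
import math


def histogram_counts(values, edges):
    """Count values per bin; bin i covers edges[i] <= v < edges[i + 1].

    Values that fall outside every bin are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up annual returns in percent, for illustration only.
returns = [4.0, 12.5, 7.1, 25.0, 18.3, 9.9, 15.0]
edges = [0, 10, 20, 30]
counts = histogram_counts(returns, edges)
for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low:>2}-{high:<2}% | {'#' * count} ({count})")

# Made-up unemployment (x) and inflation (y) rates in percent.
unemployment = [4.0, 5.0, 6.0, 7.0, 8.0]
inflation = [5.0, 4.0, 3.5, 2.0, 1.5]
r = pearson_r(unemployment, inflation)
print(f"r = {r:.2f}")
```

With these made-up figures the coefficient is strongly negative, which is the kind of pattern a scatter plot of the two variables would show as points falling from upper left to lower right.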

Other perspectives

There are different approaches to the scope of data visualization. One common focus is on information presentation, as in Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics and thematic cartography.[1] Along these lines, the article "Data Visualization: Modern Approaches" (2007) gives an overview of seven subjects of data visualization.[28]

All these subjects are closely related to graphic design and information representation.

On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields.[8][29]

Data presentation architecture

Data presentation architecture (DPA) is a skill set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally communicate meaning and proper knowledge.

Historically, the term data presentation architecture is attributed to Kelly Lautt:[30] "Data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics in discovering valuable information from data and making it usable, relevant and actionable with the arts of data visualization, communications, organizational psychology and change management in order to provide business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic behaviour toward understood business (or organizational) goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with data visualization, data presentation architecture is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen. Data visualization skills are one element of DPA."

Objectives

DPA has two main objectives:

  • To use data to provide knowledge in the most efficient manner possible (minimize noise, complexity, and unnecessary data or detail given each audience's needs and roles)
  • To use data to provide knowledge in the most effective manner possible (provide relevant, timely and complete data to each audience member in a clear and understandable manner that conveys important meaning, is actionable and can affect understanding, behavior and decisions)

Scope

With the above objectives in mind, the actual work of data presentation architecture consists of:

  • Creating effective delivery mechanisms for each audience member depending on their role, tasks, locations and access to technology
  • Defining important meaning (relevant knowledge) that is needed by each audience member in each context
  • Determining the required periodicity of data updates (the currency of the data)
  • Determining the right timing for data presentation (when and how often the user needs to see the data)
  • Finding the right data (subject area, historical reach, breadth, level of detail, etc.)
  • Utilizing appropriate analysis, grouping, visualization, and other presentation formats

Related fields

DPA work shares commonalities with several other fields, including:

  • Business analysis, in determining business goals, collecting requirements and mapping processes.
  • Business process improvement, in that its goal is to improve and streamline actions and decisions in furtherance of business goals.
  • Data visualization in that it uses well-established theories of visualization to add or highlight meaning or importance in data presentation.
  • Information architecture, but information architecture's focus is on unstructured data and therefore excludes both analysis (in the statistical/data sense) and direct transformation of the actual content (data, for DPA) into new entities and combinations.
  • HCI and interaction design, since many of the principles of designing interactive data visualisation have been developed across disciplines together with HCI.
  • Visual journalism and data-driven journalism or data journalism: Visual journalism is concerned with all types of graphic facilitation of the telling of news stories, while data-driven and data journalism stories are not necessarily told with data visualisation. Nevertheless, the field of journalism is at the forefront in developing new data visualisations to communicate data.
  • Graphic design, conveying information through styling, typography, position, and other aesthetic concerns.

References

  1. Michael Friendly (2008). "Milestones in the history of thematic cartography, statistical graphics, and data visualization".
  2. Stephen Few. Perceptual Edge. "Selecting the Right Graph for Your Message". 2004.
  3. Manuela Aparicio and Carlos J. Costa (November 2014). "Data visualization". Communication Design Quarterly Review. 3 (1): 7–11. doi:10.1145/2721882.2721883.
  4. Nikos Bikakis (2018). "Big Data Visualization Tools". Encyclopedia of Big Data Technologies, Springer 2018.
  5. Forbes. Gil Press. "A Very Short History of Data Science". May 2013.
  6. Vitaly Friedman (2008). "Data Visualization and Infographics" in: Graphics, Monday Inspiration, January 14th, 2008.
  7. Fernanda Viegas and Martin Wattenberg (April 19, 2011). "How To Make Data Look Sexy". CNN.com. Archived from the original on May 6, 2011. Retrieved May 7, 2017.
  8. Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). Data Visualization: The State of the Art. Research paper, TU Delft, 2002. Archived 2009-10-07 at the Wayback Machine.
  9. Tukey, John (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 0-201-07616-0.
  10. Edward Tufte. Presentation. August 2013.
  11. Tufte, Edward (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press. ISBN 0-9613921-4-2.
  12. CBO. "Telling Visual Stories About Data". June 2014.
  13. Stephen Few. Perceptual Edge. "Graph Selection Matrix".
  14. Stephen Few. "Tapping the Power of Visual Perception". September 2004.
  15. "Data Visualization for Human Perception". The Interaction Design Foundation. Retrieved 2015-11-23.
  16. "Visualization" (PDF). SFU. SFU lecture. Retrieved 2015-11-22.
  17. "How much of the brain is involved with vision? - Quora". www.quora.com. Retrieved 2015-11-23.
  18. Friendly, Michael. "A Brief History of Data Visualization". Springer-Verlag. Retrieved 19 November 2017.
  19. Whitehouse, D. (9 August 2000). "Ice Age star map discovered". BBC News. Retrieved 20 January 2018.
  20. Dragicevic, Pierre; Jansen, Yvonne (2012). "List of Physical Visualizations and Related Artefacts". Retrieved 2018-01-12.
  21. Jansen, Yvonne; Dragicevic, Pierre; Isenberg, Petra; Alexander, Jason; Karnik, Abhijit; Kildal, Johan; Subramanian, Sriram; Hornbaek, Kasper (2015). "Opportunities and challenges for data physicalization". Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems: 3227–3236.
  22. Friendly, Michael (2001). "Milestones in the history of thematic cartography, statistical graphics, and data visualization". Retrieved 2017-11-19.
  23. Funkhouser, Howard Gray (Jan 1936). "A Note on a Tenth Century Graph" (PDF). Osiris. 1: 260–262. doi:10.1086/368425. Retrieved 19 November 2017.
  24. Friendly, Michael (2006). "A Brief History of Data Visualization" (PDF). York University. Springer-Verlag. Retrieved 2015-11-22.
  25. "NY gets new boot camp for data scientists: It's free but harder to get into than Harvard". Venture Beat. Retrieved 2016-02-21.
  26. Stephen Few. "Selecting the Right Graph for Your Message". September 2004.
  27. Lengler, Ralph; Eppler, Martin J. "Periodic Table of Visualization Methods". www.visual-literacy.org. Retrieved 15 March 2013.
  28. "Data Visualization: Modern Approaches". in: Graphics, August 2nd, 2007.
  29. Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). Data Visualization: The State of the Art. Archived 2009-10-07 at the Wayback Machine.
  30. The first formal, recorded, public usages of the term data presentation architecture were at the three formal Microsoft Office 2007 Launch events in Dec, Jan and Feb of 2007–08 in Edmonton, Calgary and Vancouver (Canada) in a presentation by Kelly Lautt describing a business intelligence system designed to improve service quality in a pulp and paper company. The term was further used and recorded in public usage on December 16, 2009 in a Microsoft Canada presentation on the value of merging Business Intelligence with corporate collaboration processes.

Further reading

  • Roels, Reinout, Baeten, Yves & Signer, Beat (2016). "Interactive and Narrative Data Visualisation for Presentation-based Knowledge Transfer". Communications in Computer and Information Science (CCIS), 739, 2017.
  • Chandrajit Bajaj, Bala Krishnamurthy (1999). Data Visualization Techniques.
  • William S. Cleveland (1993). Visualizing Data. Hobart Press.
  • William S. Cleveland (1994). The Elements of Graphing Data. Hobart Press.
  • Alexander N. Gorban, Balázs Kégl, Donald Wunsch, and Andrei Zinovyev (2008). Principal Manifolds for Data Visualization and Dimension Reduction. LNCSE 58. Springer.
  • John P. Lee and Georges G. Grinstein (eds.) (1994). Database Issues for Data Visualization: IEEE Visualization '93 Workshop, San Diego.
  • Peter R. Keller and Mary Keller (1993). Visual Cues: Practical Data Visualization.
  • Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). Data Visualization: The State of the Art.
  • Stewart Liff and Pamela A. Posey, Seeing is Believing: How the New Art of Visual Management Can Boost Performance Throughout Your Organization, AMACOM, New York (2007), ISBN 978-0-8144-0035-7
  • Stephen Few (2009) Fundamental Differences in Analytical Tools - Exploratory, Custom, or Customizable.

Data visualization is one of the steps in analyzing data and presenting it to users.
A time series illustrated with a line chart demonstrating trends in U.S. federal spending and revenue over time.
A scatterplot illustrating negative correlation between two variables (inflation and unemployment) measured at points in time.
