Tabular Presentation of Data: Meaning, Objectives, Features and Merits

What is Tabulation?

The systematic presentation of numerical data in rows and columns is known as tabulation. It is designed to make presentation simpler and analysis easier. This type of presentation facilitates comparison by putting related information close together, and it supports further statistical analysis and interpretation. Tabulation is one of the most important devices for presenting data in a condensed and readily comprehensible form. It aims to provide as much information as possible in the minimum possible space while maintaining the quality and usefulness of the data.

Tabular Presentation of Data

“Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration.” – L.R. Connor

Objectives of Tabulation

The aim of tabulation is to summarise a large amount of numerical information into the simplest form. The following are the main objectives of tabulation:

  • To make complex data simpler: The main aim of tabulation is to present the classified data in a systematic way. The purpose is to condense the bulk of information (data) under investigation into a simple and meaningful form.
  • To save space: Tabulation tries to save space by condensing data in a meaningful form while maintaining the quality and quantity of the data.
  • To facilitate comparison: It also aims to facilitate quick comparison of various observations by providing the data in a tabular form.
  • To facilitate statistical analysis: Tabulation aims to facilitate statistical analysis because it is the stage between data classification and data presentation. Various statistical measures, including averages, dispersion, correlation, and others, are easily calculated from data that has been systematically tabulated.
  • To provide a reference: Since data may be easily identifiable and used when organised in tables with titles and table numbers, tabulation aims to provide a reference for future studies.

Features of a Good Table

Tabulation is a very specialised job. It requires a thorough knowledge of statistical methods, as well as abilities, experience, and common sense. A good table must have the following characteristics:

  • Title: A table must have a title at the top, and the title should be brief, clear, and attractive.
  • Manageable Size: The table shouldn’t be too big or too small. The size of the table should be in accordance with its objectives and the characteristics of the data. It should completely cover all significant characteristics of data.
  • Attractive: A table should have an appearance that appeals to both the eye and the mind, so that the reader can grasp it easily without any strain.
  • Special Emphasis: The data to be compared should be placed in the left-hand corner of columns, with their titles in bold letters.
  • Fit with the Objective: The table should reflect the objective of the statistical investigation.
  • Simplicity: To make the table easily understandable, it should be simple and compact.
  • Data Comparison: The data to be compared must be placed closely in the columns.
  • Numbered Columns and Rows: When there are several rows and columns in a table, they must be numbered for reference.
  • Clarity: A table should be prepared so that even a layman may make conclusions from it. The table should contain all necessary information and it must be self-explanatory.
  • Units: The unit designations should be written on the top of the table, below the title. For example, Height in cm, Weight in kg, Price in ₹, etc. However, if different items have different units, then they should be mentioned in the respective rows and columns.
  • Suitably Approximated: If the figures are large, then they should be rounded or approximated.
  • Scientifically Prepared: The preparation of the table should be done in a systematic and logical manner and should be free from any kind of ambiguity and overlapping. 

Components of a Table

A table’s preparation is an art that requires skilled data handling. It’s crucial to understand the components of a good statistical table before constructing one. A table is created when all of these components are put together in a systematic order. In simple terms, a good table should include the following components:

1. Table Number:

Each table needs to have a number so it may be quickly identified and used as a reference.

  • If there are many tables, they should be numbered in a logical order.
  • The table number can be given at the top of the table or the beginning of the table title.
  • Tables may also be numbered by chapter and position, using numbers like 1.2 or 2.1. For instance, Table 3.1 is read as the first table of the third chapter.

2. Title:

Each table should have a suitable title. A table’s contents are briefly described in the title.

  • The title should be simple, self-explanatory, and free from ambiguity.
  • A title should be brief and presented clearly, usually below the table number.
  • In certain cases, a long title is needed for clarity. In these cases, a ‘Catch Title’ may be placed above the ‘Main Title’. For instance, the firm’s name might appear as the catch title, followed by a description of the table’s contents.
  • Contents of Title: The title should include the following information: (i) nature of data, or classification criteria; (ii) subject matter; (iii) place to which the data relate; (iv) time to which the data relate; (v) source of the data; (vi) reference to the data, if available.

3. Captions or Column Headings:

At the top of each column, a designation is given to explain the figures in that column. This is referred to as a “Column heading” or “Caption”.

  • Captions are used to describe the names or heads of vertical columns.
  • To save space, captions are generally placed in small letters in the middle of the columns.

4. Stubs or Row Headings:

Each row of the table needs to have a heading, similar to a caption or column heading. The headers of horizontal rows are referred to as stubs. A brief description of the row headers may also be provided at the table’s left-hand top.

5. Body of Table:

The table’s most crucial component is its body, which contains data (numerical information).

  • The location of any one figure or data in the table is fixed and determined by the row and column of the table.
  • The numerical data in the main body is arranged in columns from top to bottom and in rows from left to right.
  • The size and shape of the main body should be planned in accordance with the nature of the figures and the purpose of the study.
  • As the body of the table summarises the facts and conclusions of the statistical investigation, it must be ensured that the table does not have irrelevant information.

6. Unit of Measurement:

If the unit of measurement of the figures in the table (real data) does not change throughout the table, it should always be provided along with the title.

  • However, these units must be mentioned together with stubs or captions if rows or columns have different units.
  • If there are large figures, they should be rounded off, and the rounding method should be stated.

7. Head Notes:

If the main title does not convey enough information, a head note is included in small brackets in prominent words right below the main title.

  • A head-note is included to convey any relevant information.
  • For instance, the table frequently uses the units of measurement “in million rupees,” “in tonnes,” “in kilometres,” etc. Head notes are also known as Prefatory Notes .

8. Source Note:

A source note refers to the place where information was obtained.

  • In the case of secondary data, a source note is provided.
  • Name of the book, page number, table number, etc., from which the data were collected should all be included in the source. If there are multiple sources, each one must be listed in the source note.
  • If a reader wants to refer to the original data, the source note enables him to locate the data. Usually, the source note appears at the bottom of the table. For example, the source note may be: ‘Census of India, 2011’.
  • Importance: A source note is useful for three reasons: (i) it credits the person or group who collected the data; (ii) it provides a reference to source material that may be more complete; (iii) it offers some insight into the reliability of the information and its source.

9. Footnotes:

The footnote is the last part of the table. It mentions any unique characteristic of the table’s contents that is not self-explanatory and has not previously been explained.

  • Footnotes are used to provide additional information that is not provided by the heading, title, stubs, caption, etc.
  • When there are many footnotes, they are numbered in order.
  • Footnotes are identified by the symbols *, @, £, etc.
  • In general, footnotes are used for the following reasons: (i) to highlight any exceptions to the data; (ii) to note any special circumstances affecting the data; and (iii) to clarify any information in the data.


Merits of Tabular Presentation of Data

The following are the merits of tabular presentation of data:

  • Brief and Simple Presentation: Tabular presentation is possibly the simplest method of data presentation. As a result, information is simple to understand. A significant amount of statistical data is also presented in a very brief manner.
  • Facilitates Comparison: By grouping the data into different classes, tabulation facilitates data comparison.
  • Simple Analysis: Analysing data from tables is quite simple. One can determine the data’s central tendency, dispersion, and correlation by organising the data as a table.
  • Highlights Characteristics of the Data:  Tabulation highlights characteristics of the data. As a result of this, it is simple to remember the statistical facts.
  • Cost-effective: Tabular presentation is a very cost-effective way to convey data. It saves time and space.
  • Provides Reference: As the data provided in a tabular presentation can be used for other studies and research, it acts as a source of reference.


Data presentation: A comprehensive guide

Learn how to create data presentation effectively and communicate your insights in a way that is clear, concise, and engaging.

Raja Bothra


Hey there, fellow data enthusiast!

Welcome to our comprehensive guide on data presentation.

Whether you're an experienced presenter or just starting, this guide will help you present your data like a pro. We'll dive deep into what data presentation is, why it's crucial, and how to master it. So, let's embark on this data-driven journey together.

What is data presentation?

Data presentation is the art of transforming raw data into a visual format that's easy to understand and interpret. It's like turning numbers and statistics into a captivating story that your audience can quickly grasp. When done right, data presentation can be a game-changer, enabling you to convey complex information effectively.

Why are data presentations important?

Imagine drowning in a sea of numbers and figures. That's how your audience might feel without proper data presentation. Here's why it's essential:

  • Clarity : Data presentations make complex information clear and concise.
  • Engagement : Visuals, such as charts and graphs, grab your audience's attention.
  • Comprehension : Visual data is easier to understand than long, numerical reports.
  • Decision-making : Well-presented data aids informed decision-making.
  • Impact : It leaves a lasting impression on your audience.

Types of data presentation:

Now, let's delve into the diverse array of data presentation methods, each with its own unique strengths and applications. We have three primary types of data presentation, and within these categories, numerous specific visualization techniques can be employed to effectively convey your data.

1. Textual presentation

Textual presentation harnesses the power of words and sentences to elucidate and contextualize your data. This method is commonly used to provide a narrative framework for the data, offering explanations, insights, and the broader implications of your findings. It serves as a foundation for a deeper understanding of the data's significance.

2. Tabular presentation

Tabular presentation employs tables to arrange and structure your data systematically. These tables are invaluable for comparing various data groups or illustrating how data evolves over time. They present information in a neat and organized format, facilitating straightforward comparisons and reference points.

3. Graphical presentation

Graphical presentation harnesses the visual impact of charts and graphs to breathe life into your data. Charts and graphs are powerful tools for spotlighting trends, patterns, and relationships hidden within the data. Let's explore some common graphical presentation methods (a short code sketch follows the list below):

  • Bar charts: They are ideal for comparing different categories of data. In this method, each category is represented by a distinct bar, and the height of the bar corresponds to the value it represents. Bar charts provide a clear and intuitive way to discern differences between categories.
  • Pie charts: They excel at illustrating the relative proportions of different data categories. Each category is depicted as a slice of the pie, with the size of each slice corresponding to the percentage of the total value it represents. Pie charts are particularly effective for showcasing the distribution of data.
  • Line graphs: They are the go-to choice when showcasing how data evolves over time. Each point on the line represents a specific value at a particular time period. This method enables viewers to track trends and fluctuations effortlessly, making it perfect for visualizing data with temporal dimensions.
  • Scatter plots: They are the tool of choice when exploring the relationship between two variables. In this method, each point on the plot represents a pair of values for the two variables in question. Scatter plots help identify correlations, outliers, and patterns within data pairs.
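
To make the distinctions above concrete, here is a minimal R sketch that draws each of the four chart types. The figures are made up purely for illustration; swap in your own data.

# Made-up figures for illustration only.
sales    <- c(Widgets = 120, Gadgets = 90, Gizmos = 150)
quarters <- 1:8
revenue  <- c(10, 12, 15, 14, 18, 21, 20, 24)
profit   <- c(2, 3, 4, 3, 5, 6, 6, 7)

barplot(sales, main = "Sales by product")        # bar chart: compare categories
pie(sales, main = "Share of total sales")        # pie chart: relative proportions
plot(quarters, revenue, type = "l",
     main = "Revenue over time")                 # line graph: values over time
plot(revenue, profit,
     main = "Revenue vs. profit")                # scatter plot: two variables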

The selection of the most suitable data presentation method hinges on the specific dataset and the presentation's objectives. For instance, when comparing sales figures of different products, a bar chart shines in its simplicity and clarity. On the other hand, if your aim is to display how a product's sales have changed over time, a line graph provides the ideal visual narrative.

Additionally, it's crucial to factor in your audience's level of familiarity with data presentations. For a technical audience, more intricate visualization methods may be appropriate. However, when presenting to a general audience, opting for straightforward and easily understandable visuals is often the wisest choice.

In the world of data presentation, choosing the right method is akin to selecting the perfect brush for a masterpiece. Each tool has its place, and understanding when and how to use them is key to crafting compelling and insightful presentations. So, consider your data carefully, align your purpose, and paint a vivid picture that resonates with your audience.

What to include in a data presentation?

When creating your data presentation, remember these key components:

  • Data points : Clearly state the data points you're presenting.
  • Comparison : Highlight comparisons and trends in your data.
  • Graphical methods : Choose the right chart or graph for your data.
  • Infographics : Use visuals like infographics to make information more digestible.
  • Numerical values : Include numerical values to support your visuals.
  • Qualitative information : Explain the significance of the data.
  • Source citation : Always cite your data sources.

How to structure an effective data presentation?

Creating a well-structured data presentation is not just important; it's the backbone of a successful presentation. Here's a step-by-step guide to help you craft a compelling and organized presentation that captivates your audience:

1. Know your audience

Understanding your audience is paramount. Consider their needs, interests, and existing knowledge about your topic. Tailor your presentation to their level of understanding, ensuring that it resonates with them on a personal level. Relevance is the key.

2. Have a clear message

Every effective data presentation should convey a clear and concise message. Determine what you want your audience to learn or take away from your presentation, and make sure your message is the guiding light throughout your presentation. Ensure that all your data points align with and support this central message.

3. Tell a compelling story

Human beings are naturally wired to remember stories. Incorporate storytelling techniques into your presentation to make your data more relatable and memorable. Your data can be the backbone of a captivating narrative, whether it's about a trend, a problem, or a solution. Take your audience on a journey through your data.

4. Leverage visuals

Visuals are a powerful tool in data presentation. They make complex information accessible and engaging. Utilize charts, graphs, and images to illustrate your points and enhance the visual appeal of your presentation. Visuals should not just be an accessory; they should be an integral part of your storytelling.

5. Be clear and concise

Avoid jargon or technical language that your audience may not comprehend. Use plain language and explain your data points clearly. Remember, clarity is king. Each piece of information should be easy for your audience to digest.

6. Practice your delivery

Practice makes perfect. Rehearse your presentation multiple times before the actual delivery. This will help you deliver it smoothly and confidently, reducing the chances of stumbling over your words or losing track of your message.

A basic structure for an effective data presentation

Armed with a solid understanding of how to construct a compelling data presentation, you can now use this basic template for guidance:

In the introduction, initiate your presentation by introducing both yourself and the topic at hand. Clearly articulate your main message or the fundamental concept you intend to communicate.

Moving on to the body of your presentation, organize your data in a coherent and easily understandable sequence. Employ visuals generously to elucidate your points and weave a narrative that enhances the overall story. Ensure that the arrangement of your data aligns with and reinforces your central message.

As you approach the conclusion, succinctly recapitulate your key points and emphasize your core message once more. Conclude by leaving your audience with a distinct and memorable takeaway, ensuring that your presentation has a lasting impact.

Additional tips for enhancing your data presentation

To take your data presentation to the next level, consider these additional tips:

  • Consistent design : Maintain a uniform design throughout your presentation. This not only enhances visual appeal but also aids in seamless comprehension.
  • High-quality visuals : Ensure that your visuals are of high quality, easy to read, and directly relevant to your topic.
  • Concise text : Avoid overwhelming your slides with excessive text. Focus on the most critical points, using visuals to support and elaborate.
  • Anticipate questions : Think ahead about the questions your audience might pose. Be prepared with well-thought-out answers to foster productive discussions.

By following these guidelines, you can structure an effective data presentation that not only informs but also engages and inspires your audience. Remember, a well-structured presentation is the bridge that connects your data to your audience's understanding and appreciation.

Do’s and don'ts of a data presentation

Do's:

  • Use visuals : Incorporate charts and graphs to enhance understanding.
  • Keep it simple : Avoid clutter and complexity.
  • Highlight key points : Emphasize crucial data.
  • Engage the audience : Encourage questions and discussions.
  • Practice : Rehearse your presentation.

Don'ts:

  • Overload with data : Less is often more; don't overwhelm your audience.
  • Include unrelated data : Stay on topic; don't include irrelevant information.
  • Neglect the audience : Ensure your presentation suits your audience's level of expertise.
  • Read word-for-word : Avoid reading directly from slides.
  • Lose focus : Stick to your presentation's purpose.

Summarizing key takeaways

  • Definition : Data presentation is the art of visualizing complex data for better understanding.
  • Importance : Data presentations enhance clarity, engage the audience, aid decision-making, and leave a lasting impact.
  • Types : Textual, Tabular, and Graphical presentations offer various ways to present data.
  • Choosing methods : Select the right method based on data, audience, and purpose.
  • Components : Include data points, comparisons, visuals, infographics, numerical values, and source citations.
  • Structure : Know your audience, have a clear message, tell a compelling story, use visuals, be concise, and practice.
  • Do's and don'ts : Do use visuals, keep it simple, highlight key points, engage the audience, and practice. Don't overload with data, include unrelated information, neglect the audience's expertise, read word-for-word, or lose focus.

FAQ's on a data presentation

1. What is data presentation, and why is it important in 2024?

Data presentation is the process of visually representing data sets to convey information effectively to an audience. In an era where the amount of data generated is vast, visually presenting data using methods such as diagrams, graphs, and charts has become crucial. By simplifying complex data sets, a good presentation helps your audience quickly grasp a great deal of information without drowning in a sea of charts, analytics, facts, and figures.

2. What are some common methods of data presentation?

There are various methods of data presentation, including graphs and charts, histograms, and cumulative frequency polygons. Each method has its strengths and is chosen depending on the type of data you're using and the message you want to convey. For instance, if you want to show data over time, try using a line graph. If you're presenting geographical data, consider using a heat map.

3. How can I ensure that my data presentation is clear and readable?

To ensure that your data presentation is clear and readable, pay attention to the design and labeling of your charts. Don't forget to label the axes appropriately, as they are critical for understanding the values they represent. Don't try to fit all the information into one slide or a single paragraph. Presentation software like Prezent and PowerPoint can help you simplify your axes, charts, and tables, making them much easier to understand.

4. What are some common mistakes presenters make when presenting data?

One common mistake is trying to fit too much data into a single chart, which can distort the information and confuse the audience. Another mistake is not considering the needs of the audience. Remember that your audience won't have the same level of familiarity with the data as you do, so it's essential to present the data effectively and respond to questions during a Q&A session.

5. How can I use data visualization to present important data effectively on platforms like LinkedIn?

When presenting data on platforms like LinkedIn, consider using eye-catching visuals like bar graphs or charts. Use concise captions and concrete examples to highlight the single most important piece of information in your data report. Visuals, such as graphs and tables, can help you stand out in the sea of textual content, making your data presentation more engaging and shareable among your LinkedIn connections.

Create your data presentation with Prezent

Prezent can be a valuable tool for creating data presentations. Here's how Prezent can help you in this regard:

  • Time savings : Prezent saves up to 70% of presentation creation time, allowing you to focus on data analysis and insights.
  • On-brand consistency : Ensure 100% brand alignment with Prezent's brand-approved designs for professional-looking data presentations.
  • Effortless collaboration : Real-time sharing and collaboration features make it easy for teams to work together on data presentations.
  • Data storytelling : Choose from 50+ storylines to effectively communicate data insights and engage your audience.
  • Personalization : Create tailored data presentations that resonate with your audience's preferences, enhancing the impact of your data.

In summary, Prezent streamlines the process of creating data presentations by offering time-saving features, ensuring brand consistency, promoting collaboration, and providing tools for effective data storytelling. Whether you need to present data to clients, stakeholders, or within your organization, Prezent can significantly enhance your presentation-making process.

So, go ahead, present your data with confidence, and watch your audience be wowed by your expertise.

Thank you for joining us on this data-driven journey. Stay tuned for more insights, and remember, data presentation is your ticket to making numbers come alive! Sign up for our free trial or book a demo!


What is Tabular Data? (Definition & Example)

In statistics, tabular data refers to data that is organized in a table with rows and columns.


Within the table, the rows represent observations and the columns represent attributes for those observations.

For example, the following table represents tabular data:

(Figure: example of tabular data showing basketball players.)

This dataset has 9 rows and 5 columns.

Each row represents one basketball player and the five columns describe different attributes about the player including:

  • Player name
  • Minutes played

The opposite of tabular data would be visual data , which would be some type of plot or chart that helps us visualize the values in a dataset.

For example, we might have the following bar chart that helps us visualize the total minutes played by each player in the dataset:

(Figure: tabular data vs. visual data, shown as a bar chart of total minutes played by each player.)

This would be an example of visual data .

It contains the exact same information about player names and minutes played for the players in the dataset, but it’s simply displayed in a visual form instead of a tabular form.

Or we might have the following scatterplot that helps us visualize the relationship between minutes played and points scored for each player:

(Figure: scatterplot of minutes played vs. points scored for each player.)

This is another example of visual data .

When is Tabular Data Used in Practice?

In practice, tabular data is the most common type of data that you’ll run across in the real world.

In the real world, most data that is saved in an Excel spreadsheet is considered tabular data because the rows represent observations and the columns represent attributes for those observations.

For example, here’s what our basketball dataset from earlier might look like in an Excel spreadsheet:

(Figure: the basketball dataset entered in an Excel spreadsheet.)

This format is one of the most natural ways to collect and store values in a dataset, which is why it’s used so often.
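
For a concrete (and purely hypothetical) sketch, the basketball example could be represented in R as a data frame, where each row is one player and each column is one attribute; the same values can then be shown as visual data with a plot:

# Hypothetical players and values standing in for the dataset above.
players <- data.frame(
  player  = c("A. Smith", "B. Jones", "C. Lee"),
  minutes = c(34, 28, 19),
  points  = c(22, 15, 9)
)
players                                        # tabular data: rows = observations
barplot(players$minutes, names.arg = players$player,
        main = "Minutes played")               # the same information as visual data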



7   Introduction to Tabular Data


An email inbox is a list of messages. For each message, your inbox stores a bunch of information: its sender, the subject line, the conversation it’s part of, the body, and quite a bit more.


A music playlist. For each song, your music player maintains a bunch of information: its name, the singer, its length, its genre, and so on.


A filesystem folder or directory. For each file, your filesystem records a name, a modification date, size, and other information.


Do Now! Can you come up with more examples?

Responses to a party invitation.

A gradebook.

A calendar agenda.

They consist of rows and columns. For instance, each song or email message or file is a row. Each of their characteristics— the song title, the message subject, the filename— is a column.

Each row has the same columns as the other rows, in the same order.

A given column has the same type, but different columns can have different types. For instance, an email message has a sender’s name, which is a string; a subject line, which is a string; a sent date, which is a date; whether it’s been read, which is a Boolean; and so on.

The rows are usually in some particular order. For instance, the emails are ordered by which was most recently sent.

Exercise Find the characteristics of tabular data in the other examples described above, as well as in the ones you described.

We will now learn how to program with tables and to think about decomposing tasks involving them. You can also look up the full Pyret documentation for table operations .

7.1   Creating Tabular Data

table: name, age
  row: "Alice", 30
  row: "Bob", 40
  row: "Carol", 25
end

Exercise Change different parts of the above example— e.g., remove a necessary value from a row, add an extraneous one, remove a comma, add an extra comma, leave an extra comma at the end of a row— and see what errors you get.

check:
  table: name, age
    row: "Alice", 30
    row: "Bob", 40
    row: "Carol", 25
  end
  is-not
  table: age, name
    row: 30, "Alice"
    row: 40, "Bob"
    row: 25, "Carol"
  end
end

people = table: name, age
  row: "Alice", 30
  row: "Bob", 40
  row: "Carol", 25
end

Tables can also be imported from a spreadsheet rather than typed in by hand. For example, you can:

  • create the sheet on your own,
  • create a sheet collaboratively with friends,
  • find data on the Web that you can import into a sheet,
  • create a Google Form that you get others to fill out, and obtain a sheet out of their responses.

7.2   Processing Rows

Let’s now learn how we can actually process a table. Pyret offers a variety of built-in operations that make it quite easy to perform interesting computations over tables. In addition, as we will see later [ From Tables to Lists ], if we don’t find these sufficient, we can write our own. For now, we’ll focus on the operations Pyret provides.

Which emails were sent by a particular user?

Which songs were sung by a particular artist?

Which are the most frequently played songs in a playlist?

Which are the least frequently played songs in a playlist?

7.2.1   Keeping

email = table: sender, recipient, subject
  row: 'Matthias Felleisen', 'Pedro Diaz', 'Introduction'
  row: 'Joe Politz', 'Pedro Diaz', 'Class on Friday'
  row: 'Matthias Felleisen', 'Pedro Diaz', 'Book comments'
  row: 'Mia Minnes', 'Pedro Diaz', 'CSE8A Midterm'
end

sieve email using sender:
  sender == 'Matthias Felleisen'
end

sieve playlist using artist:
  (artist == 'Deep Purple') or (artist == 'Van Halen')
end

Exercise Write a table to use as playlist that works with the sieve expression above.
Exercise Write a sieve expression on the email table above that would result in a table with zero rows.

7.2.2   Ordering

order playlist:
  play-count ascending
end

Note that what goes between the : and end is not an expression. Therefore, we cannot write arbitrary code here. We can only name columns and indicate which way they should be ordered.

7.2.3   Combining Keeping and Ordering

Of the emails from a particular person, which is the oldest?

Of the songs by a particular artist, which have we played the least often?

Do Now! Take a moment to think about how you would write these with what you have seen so far.

mf-emails = sieve email using sender:
  sender == 'Matthias Felleisen'
end

order mf-emails:
  sent-date ascending
end

Exercise Write the second example as a composition of keep and order operations on a playlist table.

7.2.4   Extending

extend employees using hourly-wage, hours-worked:
  total-wage: hourly-wage * hours-worked
end

ext-email = extend email using subject:
  subject-length: string-length(subject)
end

order ext-email:
  subject-length descending
end

7.2.5   Transforming, Cleansing, and Normalizing

There are times when a table is “almost right”, but requires a little adjusting. For instance, we might have a table of customer requests for a free sample, and want to limit each customer to at most a certain number. We might get temperature readings from different countries in different formats, and want to convert them all to one single format (because unit errors can be dangerous!). We might have a gradebook where different graders have used different levels of precision, and want to standardize all of them to have the same level of precision.

transform orders using count:
  count: num-min(count, 3)
end

transform gradebook using total-grade:
  total-grade: num-round(total-grade)
end

transform weather using temp, unit:
  temp: if unit == "F": fahrenheit-to-celsius(temp) else: temp end,
  unit: if unit == "F": "C" else: unit end
end

Do Now! In this example, why do we also transform unit?

7.2.6   Selecting

select name, total-grade from gradebook end

ss = select artist, song from playlist end

order ss:
  artist ascending
end

7.2.7   Summary of Row-Wise Table Operations

We’ve seen a lot in a short span. Specifically, we have seen several operations that consume a table and produce a new one according to some criterion. It’s worth summarizing the impact each of them has in terms of key table properties (where “-” means the entry is left unchanged):

Operation      Cell contents                      Row order   Number of rows   Column order   Number of columns
Keeping        -                                  -           reduced          -              -
Ordering       -                                  changed     -                -              -
Extending      existing unchanged, new computed   -           -                -              augmented
Transforming   altered                            -           -                -              -
Selecting      -                                  -           -                changed        reduced

The entries other than “-” reflect how the new table may differ from the old. Note that an entry like “reduced” or “altered” should be read as potentially reduced or altered; depending on the specific operation and the content of the table, there may be no change at all. (For instance, if a table is already sorted according to the criterion given in an order expression, the row order will not change.) However, in general one should expect the kind of change described in the above grid.

Observe that both dimensions of this grid provide interesting information. Unsurprisingly, each row has at least some kind of impact on a table (otherwise the operation would be useless and would not exist). Likewise, each column also has at least one way of impacting it. Furthermore, observe that most entries leave the table unchanged: that means each operation has limited impact on the table, careful to not overstep the bounds of its mandate.

On the one hand, the decision to limit the impact of each operation means that to achieve complex tasks, we may have to compose several operations together. We have already seen examples of this earlier this chapter. However, there is also a much more subtle consequence: it also means that to achieve complex tasks, we can compose several operations and get exactly what we want. If we had fewer operations that each did more, then composing them might have various undesired or (worse) unintended consequences, making it very difficult for us to obtain exactly the answer we want. Instead, the operations above follow the principle of orthogonality : no operation shadows what any other operation does, so they can be composed freely.

As a result of having these operations, we can think of tables also algebraically. Concretely, when given a problem, we should again begin with concrete examples of what we’re starting with and where we want to end. Then we can ask ourselves questions like, “Does the number of columns stay the same, grow, or shrink?”, “Does the number of rows stay the same or shrink?”, and so on. The grid above now provides us a toolkit by which we can start to decompose the task into individual operations. Of course, we still have to think: the order of operations matters, and sometimes we have to perform an operation multiple times. Still, this grid is a useful guide to hint us towards the operations that might help solve our problem.


Data Presentation - Tables


Tables are a useful way to organize information using rows and columns. Tables are a versatile organization tool and can be used to communicate information on their own, or they can be used to accompany another data representation type (like a graph). Tables support a variety of parameters and can be used to keep track of frequencies, variable associations, and more.

For example, given below are the weights of 20 students in grade 10: \[50, 45, 48, 39, 40, 48, 54, 50, 48, 48, \\ 50, 39, 41, 46, 44, 43, 54, 57, 60, 45.\]

To find the frequency of \(48\) in this data, count the number of times that \(48\) appears in the list. There are \(4\) students that have this weight.

The list above has information about the weight of \(20\) students, and since the data has been arranged haphazardly, it is difficult to classify the students properly.

To make the information more clear, tabulate the given data.

\[\begin{array}{c c} \text{Weights in kg} & \text{Frequency} \\ \hline 39 & 2 \\ 40 & 1 \\ 41 & 1 \\ 43 & 1 \\ 44 & 1 \\ 45 & 2 \\ 46 & 1 \\ 48 & 4 \\ 50 & 3 \\ 54 & 2 \\ 57 & 1 \\ 60 & 1 \end{array}\]

This table makes the data much easier to understand.
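
As a side note, the same frequency table can be generated programmatically; here is a small R sketch using the weights listed above:

weights <- c(50, 45, 48, 39, 40, 48, 54, 50, 48, 48,
             50, 39, 41, 46, 44, 43, 54, 57, 60, 45)
table(weights)   # frequency of each weight; for example, 48 appears 4 times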

Making a Table


To make a table, first decide how many rows and columns are needed to clearly display the data. To do this, consider how many variables are included in the data set.

The following is an example of a table where there are two variables.

Name       Age
Jennifer   15
Alex       13
Paul       38
Laura      9

The following is an example of a table with three variables.

Name       Age   Favorite Food
Jennifer   15    Pizza
Alex       13    Bananas
Paul       38    Steak
Laura      9     Watermelon

A table is good for organizing quantitative data in a way that makes it easy to look things up. For example, a table would be a good way to associate a person’s name, age, and favorite food. However, when trying to communicate relationships, such as how a person’s favorite food changes over time, a graph would be a better choice.

Using the table below, determine the average age of the group.

Name     Age (in years)
Robert   15
Jane     25
Steven   23
Scott    36
Lucy     6
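
Summing the five ages and dividing by the number of people gives

\[\text{Average age} = \frac{15 + 25 + 23 + 36 + 6}{5} = \frac{105}{5} = 21 \text{ years}.\]
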
Good practices for making tables:
  • Label what each row or column represents
  • Include units in labels when data is numerical
  • Format data consistently (use consistent units and formatting)

What is wrong with this table?

Flavor of Ice Cream   Number Sold (cones)
Chocolate             104
Vanilla               two-hundred
Strawberry            143
Coconut               thirty
Mango                 126

Answer: The data isn’t consistently formatted. The number of cones sold is written sometimes in numerals and sometimes in words. It would be easier to understand if all entries used numerals.

What is wrong with this table?

Jack        blue
Sarah       yellow
Billy       green
Ron         red
Christina   blue
Margret     purple

Answer: There are no labels on the columns. It is not clear what the table is displaying: does the table show what color shirt each person is wearing? Does it show each person's favorite color? It isn't clear because labels are missing.

Many word processing programs include tools for making tables. You can easily make tables in Microsoft Word and Excel, and in Google Docs and Sheets.

Here is an example table (left blank) with which you could record information about a person's age, weight, and height.

Tables are used to present information in all types of fields. Geologists might make a table to record data about types of rocks they find while doing field work, political researchers might create a table to record information about potential voters, and physicists might make a table to record observations about the speed of a ball rolled on various surfaces.


A Guide to Effective Data Presentation

Contents: key objectives of data presentation • charts and graphs for great visuals • storytelling with data, visuals, and text • audiences and data presentation • the main idea in data presentation • storyboarding and data presentation • additional resources

Tools for effective data presentation

Financial analysts are required to present their findings in a neat, clear, and straightforward manner. They spend most of their time working with spreadsheets in MS Excel, building financial models , and crunching numbers. These models and calculations can be pretty extensive and complex and may only be understood by the analyst who created them. Effective data presentation skills are critical for being a world-class financial analyst .

Data Presentation

It is the analyst’s job to effectively communicate the output to the target audience, such as the management team or a company’s external investors. This requires focusing on the main points, facts, insights, and recommendations that will prompt the necessary action from the audience.

One challenge is making intricate and elaborate work easy to comprehend through great visuals and dashboards. For example, tables, graphs, and charts are tools that an analyst can use to their advantage to give deeper meaning to a company’s financial information. These tools organize relevant numbers that are rather dull and give life and story to them.

Here are some key objectives to think about when presenting financial analysis:

  • Visual communication
  • Audience and context
  • Charts, graphs, and images
  • Focus on important points
  • Design principles
  • Storytelling
  • Persuasiveness

For a breakdown of these objectives, check out Excel Dashboards & Data Visualization course to help you become a world-class financial analyst.

Charts and graphs make any financial analysis readable, easy to follow, and provide great data presentation. They are often included in the financial model’s output, which is essential for the key decision-makers in a company.

The decision-makers comprise executives and managers who usually won’t have enough time to synthesize and interpret data on their own to make sound business decisions. Therefore, it is the job of the analyst to enhance the decision-making process and help guide the executives and managers to create value for the company.

When an analyst uses charts, it is necessary to be aware of what good charts and bad charts look like and how to avoid the latter when telling a story with data.

Examples of Good Charts

Great visuals let you see at a glance what is going on with the data, saving you the time of deciphering the numbers yourself. More importantly, great visuals facilitate business decision-making because their goal is to provide persuasive, clear, and unambiguous numeric communication.

For reference, take a look at the example below that shows a dashboard, which includes a gauge chart for growth rates, a bar chart for the number of orders, an area chart for company revenues, and a line chart for EBITDA margins.

To learn the step-by-step process of creating these essential tools in MS Excel, watch our video course titled “ Excel Dashboard & Data Visualization .”  Aside from what is given in the example below, our course will also teach how you can use other tables and charts to make your financial analysis stand out professionally.

Financial Dashboard Screenshot

Learn how to build the graph above in our Dashboards Course !

Example of Poorly Crafted Charts

A bad chart, as seen below, makes it difficult for the reader to find the main takeaway of a report or presentation because it contains too many colors, labels, and legends, and thus often looks too busy. It also doesn’t help if a chart, such as a pie chart, is displayed in 3D, as this skews the size and perceived value of the underlying data. A bad chart will be hard to follow and understand.

bad data presentation

Aside from understanding the meaning of the numbers, a financial analyst must learn to combine numbers and language to craft an effective story. Relying only on data for a presentation may leave your audience finding it difficult to read, interpret, and analyze your data. You must do the work for them, and a good story will be easier to follow. It will help you arrive at the main points faster, rather than just solely presenting your report or live presentation with numbers.

The data can be in the form of revenues, expenses, profits, and cash flow. Simply adding notes, comments, and opinions to each line item will add an extra layer of insight, angle, and a new perspective to the report.

Furthermore, by combining data, visuals, and text, your audience will get a clear understanding of the current situation,  past events, and possible conclusions and recommendations that can be made for the future.

The simple diagram below shows the different categories of your audience.

audience presentation

  This chart is taken from our course on how to present data .

Internal Audience

An internal audience can either be the executives of the company or any employee who works in that company. For executives, the purpose of communicating a data-filled presentation is to give an update about a certain business activity such as a project or an initiative.

Another important purpose is to facilitate decision-making on managing the company’s operations, growing its core business, acquiring new markets and customers, investing in R&D, and other considerations. Knowing the relevant data and information beforehand will guide the decision-makers in making the right choices that will best position the company toward more success.

External Audience

An external audience can either be the company’s existing clients, where there are projects in progress, or new clients that the company wants to build a relationship with and win new business from. The other external audience is the general public, such as the company’s external shareholders and prospective investors of the company.

When it comes to winning new business, the analyst’s presentation will be more promotional and sales-oriented, whereas a project update will contain more specific information for the client, usually with lots of industry jargon.

Audiences for Live and Emailed Presentation

A live presentation contains more visuals and storytelling to connect more with the audience. It must be more precise and should get to the point faster and avoid long-winded speech or text because of limited time.

In contrast, an emailed presentation is expected to be read, so it will include more text. Just like a document or a book, it will include more detailed information, because its context will not be explained with a voice-over as in a live presentation.

When it comes to details, acronyms, and jargon in the presentation, these things depend on whether your audience are experts or not.

Every great presentation requires a clear “main idea”. It is the core purpose of the presentation and should be addressed clearly. Its significance should be highlighted and should cause the targeted audience to take some action on the matter.

An example of a serious and profound idea is given below.

the main idea

To communicate this big idea, we have to come up with appropriate and effective visual displays to show both the good and bad things surrounding the idea. It should put emphasis and attention on the most important part, which is the critical cash balance and capital investment situation for next year. This is an important component of data presentation.

The storyboarding below is how an analyst would build the presentation based on the big idea. Once the issue or the main idea has been introduced, it will be followed by a demonstration of the positive aspects of the company’s performance, as well as the negative aspects, which are more important and will likely require more attention.

Various ideas will then be suggested to solve the negative issues. However, before choosing the best option, a comparison of the different outcomes of the suggested ideas will be performed. Finally, a recommendation will be made that centers around the optimal choice to address the imminent problem highlighted in the big idea.

storyboarding

This storyboard is taken from our course on how to present data .

To get to the final point (recommendation), a great deal of analysis has been performed, which includes the charts and graphs discussed earlier, to make the whole presentation easy to follow, convincing, and compelling for your audience.

CFI offers the Business Intelligence & Data Analyst (BIDA)® certification program for those looking to take their careers to the next level. To keep learning and developing your knowledge base, please explore the additional relevant resources below:

  • Investment Banking Pitch Books
  • Excel Dashboards
  • Financial Modeling Guide
  • Startup Pitch Book
  • See all business intelligence resources


Probability, Statistics, and Data

Chapter 10 Tabular Data

Tabular data is data on entities that has been aggregated in some way. A typical example would be to count the number of successes and failures in an experiment, and to report those aggregate numbers rather than the outcomes of the individual trials. Another way that tabular data arises is via binning, where we count the number of outcomes of an experiment that fall in certain groups, and report those numbers.

Inference on categorical variables has traditionally been performed by approximating counts with continuous variables and performing parametric methods such as the \(z\) -tests of proportions and the \(\chi^2\) -tests. With modern computing power, it is possible to calculate the probability of each experimental outcome exactly, leading to exact methods that do not rely on continuous approximation. These include the binomial test and the multinomial test. A third approach is to use Monte Carlo methods , where the computer performs simulations to estimate the probability of events under the null hypothesis.

10.1 Tables and plots

For categorical (factor) variables, the most basic information of interest is the count of observations in each value of the variable. Often, the data is better presented as proportions, which are the count divided by the total number of observations. For visual display, categorical variables are naturally shown as barplots or pie charts.

In this section, we demonstrate with two data sets. The first is fosdata::wrist , from a study of wrist fractures that recorded the fracture side and handedness of 104 elderly patients. The wrist data was used by Raittio et al. 76 to evaluate the effectiveness of two types of casts for treating a common type of wrist fracture. The second is fosdata::snails which records features of snail shells collected in England. The snails data was collected in 1950 by Cain and Sheppard 77 as an investigation into natural selection. They explored the relationship between the appearance of snails and the environment in which snails live.

Let’s begin with the wrist data set. Each row in the wrist data is an individual patient. Here we only pay attention to two variables, both coded as 1 for “right” and 2 for “left”:

For ease of interpretation, let’s change the variables into factors, which is really what they are.

The built-in command table can count the number of rows that take each value.
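
A minimal sketch of these steps is below. The variable name handed_side is an assumption about the column names in fosdata::wrist; check names(fosdata::wrist) for the actual names.

wrist <- fosdata::wrist
# Recode 1/2 as "right"/"left" (the factor conversion described above).
wrist$handed_side <- factor(wrist$handed_side, levels = c(1, 2),
                            labels = c("right", "left"))
table(wrist$handed_side)    # counts of right- and left-handed patients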

This table shows that there were 97 right-handed patients and 7 left-handed patients. The proportions 78 function converts the table of counts to a table of proportions:
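
Continuing the sketch above, with the same assumed column name:

proportions(table(wrist$handed_side))   # roughly 0.933 right, 0.067 left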

So only 6.7% of patients in this study were left-handed. Does that sound reasonable for a random sample? We will investigate this question in the next section.

Passing two variables to table will produce a matrix of counts for each pair of values, but a better tool for the job is the xtabs function. The xtabs function builds a table, called a contingency table or cross table . The first argument to xtabs is a formula, with the factor variables to be tabulated on the right of the ~ (tilde).
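
A sketch of the call, again with assumed column names for the handedness and fracture-side variables:

xtabs(~ handed_side + fracture_side, data = wrist)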

One could ask if people are more likely to fracture their wrist on their non-dominant side, since more right-handed patients fractured their left hand (56) than their right hand (41).

Categorical data is often given as counts, rather than individual observations in rows. The snails data gives a count for each combination of Location, Color, and Banding. It does not have a row for each individual snail.

To make a table of Color vs. Banding for snails, use xtabs and give the Count for each group on the left side of the formula:

Frequently when creating tables of this type, we will want to know the row and column sums as well. These are generated by the function addmargins .

Other times, we are interested in the proportions that are in each cell. The proportions function could convert these counts to overall proportions, but more interesting here is to ask what the color distribution was for each type of banding. This is called a marginal distribution , and proportions will compute it with the margin option:
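
A sketch of this workflow, using the snails columns named in the text (Count, Color, Banding). The book's table may be oriented the other way around; here Banding is placed on the rows so that each row sums to 1 after taking proportions.

snail_tab <- xtabs(Count ~ Banding + Color, data = fosdata::snails)
addmargins(snail_tab)                # append row and column totals
proportions(snail_tab, margin = 1)   # color distribution within each banding type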

The sum of proportions is 1 across each row. We see that 38% of unbanded (X0000) snails were brown, but only 2% of five-banded (X12345) snails were brown. The comparison of different banding types is easier to see with a plot. Tables produced by xtabs are not tidy, and therefore not suitable for sending to ggplot. Converting the table to a data frame with as.data.frame works, but instead we compute the counts with dplyr:
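
A sketch of the dplyr and ggplot2 approach just described; the exact appearance of Figure 10.1 is not reproduced here, but this plots the counts by banding and color.

library(dplyr)
library(ggplot2)

fosdata::snails %>%
  group_by(Color, Banding) %>%
  summarize(Count = sum(Count), .groups = "drop") %>%
  ggplot(aes(x = Banding, y = Count, fill = Color)) +
  geom_col(position = "dodge")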


Figure 10.1: Snail color and banding.

A common approach to visualizing categorical variables is with a pie chart. Pie charts are out of favor among data scientists because colors are hard to distinguish and the sizes of wedges are difficult to compare visually. In fact, ggplot2 does not include a built-in pie chart geometry. Instead, one applies polar coordinates to a barplot. Here is an example showing the proportions of snails found in each habitat:
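Here is a sketch of that construction. The habitat variable is written as Habitat below, which is an assumption; check names(fosdata::snails) for the name actually used.

```r
snails %>%
  group_by(Habitat) %>%                      # Habitat is an assumed column name
  summarize(Count = sum(Count), .groups = "drop") %>%
  ggplot(aes(x = "", y = Count, fill = Habitat)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()
```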


Can you tell whether there were more snails in the Hedgerows or in the Mixed Deciduous Wood? If you have colorblindness or happen to be reading a black and white copy of this text, you probably cannot even tell which wedge is which.

10.2 Inference on a proportion

The simplest setting for categorical data is that of a single binomial random variable. Here \(X \sim \text{Binom}(n,p)\) is a count of successes on \(n\) independent identical trials with probability of success \(p\) . A typical experiment would fix a value of \(n\) , perform \(n\) Bernoulli trials, and produce a value of \(X\) . From this single value of \(X\) , we are interested in learning about the unknown population parameter \(p\) . For example, we may want to test whether the true proportion of times a die shows a 6 when rolled is actually 1/6. We might choose to toss the die \(n = 1000\) times and count the number of times \(X\) that a 6 occurs.

Polling is an important application. Before an election, a polling organization will sample likely voters and ask them whether they prefer a particular candidate. The results of the poll should give an estimate for the true proportion of voters \(p\) who prefer that candidate. The case of a voter poll is not formally a Bernoulli trial unless you allow the possibility of asking the same person twice; however, if the population is large then polling approximates a Bernoulli trial well enough to use these methods.

If \(X\) is the number of successes on \(n\) trials, the point estimate for the true proportion \(p\) is given by \[ \hat p = \frac{X}{n} \]

Recall that \(E[\hat{p}] = \frac 1nE[X] = \frac{1}{n}np = p\) , so \(\hat{p}\) is an unbiased estimate of \(p\) . The standard deviation \(\sigma(\hat{p})\) is \(\sqrt{p(1-p)/n}\) , so that a larger sample size \(n\) will lead to less variation in \(\hat{p}\) and therefore a better estimate of \(p\) .

Our goal is to use the sample statistic \(\hat{p}\) to calculate confidence intervals and perform hypothesis testing with regards to \(p\) .

This section introduces one sample tests of proportions . Here, we present the theory associated with performing exact binomial hypothesis tests using binom.test , as well as prop.test , which uses the normal approximation.

A one sample test of proportions requires a hypothesized value of \(p_0\) . Often \(p_0 = 0.5\) , meaning we expect success and failure to be equally likely outcomes of the Bernoulli trial. Or, \(p_0\) may come from historic values or a known larger population. The hypotheses are:

\[ H_0: p = p_0; \qquad H_a: p \not= p_0\]

You run \(n\) trials and obtain \(x\) successes, so your estimate for \(p\) is given by \(\hat p = x/n\) . (We are thinking of \(x\) as data rather than as a random variable.) Presumably, \(\hat p\) is not exactly equal to \(p_0\) , and you wish to determine the probability of obtaining an estimate that unlikely or more unlikely, assuming \(H_0\) is true.

10.2.1 Exact binomial test

Our first approach is the binomial test , which is an exact test in that it calculates a \(p\) -value using probabilities coming from the binomial distribution.

For the \(p\) -value, we are going to add the probabilities of all outcomes that are no more likely than the outcome that was obtained, since if we are going to reject when we obtain \(x\) successes, we would also reject if we obtain a number of successes that was even less likely to occur. Formally, the \(p\) -value for the exact binomial test is given by:

\[ \sum_{y:\ P(X = y) \leq P(X = x)} P(X=y)\]

Consider the wrist data. Approximately 10.6% of the world’s population is left-handed 79 . Is this sample of elderly Finns consistent with the proportion of left-handers in the world? In this binomial random variable, we choose left-handedness as success. Then \(p\) is the true proportion of elderly Finns who are left-handed and \(p_0 = 0.106\) . Our hypotheses are:

\[ H_0: p = 0.106; \qquad H_a: p \not= 0.106 \]

The sample contains 104 observations and has 7 left-handed patients, giving \(\hat{p} = 7/104 \approx 0.067\) , which is lower than \(p_0\) . The probability of getting exactly 7 successes under \(H_0\) is dbinom(7, 104, 0.106) , or 0.061. Anything less than 7 successes is less likely under the null hypothesis, so we would add all of those to get part of the \(p\) -value. To determine which values we add for successes greater than 7, we look for all outcomes that have probability of occurring (under the null hypothesis) less than 0.061. That is all outcomes 15 through 104, since \(X = 14\) is more likely than \(X = 7\) ( dbinom(14, 104, 0.106) = 0.075 > 0.061) while \(X = 15\) is less likely than \(X = 7\) ( dbinom(15, 104, 0.106) = 0.053 < 0.061).

The calculation is illustrated in Figure 10.2 , where the dashed red line indicates the probability of observing exactly 7 successes. We sum all of the probabilities that are at or below the dashed red line.


Figure 10.2: The pmf for \(X \sim \text{Binom}(104, 0.106)\) , with a line at \(P(X = 7)\) . \(X\) values past 25 are negligible and not shown.

The \(p\) -value is \[ P(X \le 7) + P(X \ge 15) \] where \(X\sim \text{Binom}(n = 104, p = 0.106)\) .

R will make these computations for us, naturally, in the following way.
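Both the direct computation and binom.test give the same answer:

```r
# P(X <= 7) + P(X >= 15) computed directly
pbinom(7, 104, 0.106) + pbinom(14, 104, 0.106, lower.tail = FALSE)
# the exact binomial test
binom.test(x = 7, n = 104, p = 0.106)
```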

With a \(p\) -value of 0.26, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that elderly Finns have a different proportion of left-handers than the world’s proportion of lefties.

The binom.test function also produces the 95% confidence interval for \(p\) . In this example, we are 95% confident that the true proportion of left-handed elderly Finns is in the interval \([0.027, 0.134]\) . Since \(0.106\) lies in the 95% confidence interval, we failed to reject the null hypothesis at the \(\alpha = 0.05\) level.

10.2.2 One sample test of proportions

When \(n\) is large and \(p\) isn’t too close to 0 or 1, binomial random variables with \(n\) trials and probability of success \(p\) are well approximated by normal random variables with mean \(np\) and standard deviation \(\sqrt{np(1 - p)}\) . This can be used to get approximate \(p\) -values associated with the hypothesis test \(H_0: p = p_0\) versus \(H_a: p\not= p_0\) .

As before, we need to compute the probability under \(H_0\) of obtaining an outcome that is as likely or less likely than obtaining \(x\) successes. However, in this case we are using the normal approximation, which is symmetric about its mean. The \(p\) -value is twice the area of the tail outside of \(x\) .

The prop.test function performs this calculation, and has identical syntax to binom.test .

We return to the wrist example, testing \(H_0: p = 0.106\) versus \(H_a: p\not= 0.106\) . Let \(X\) be a binomial random variable with \(n = 104\) and \(p = 0.106\) . \(X\) is approximated by a normal variable \(Y\) with \[\begin{align} \mu(Y) &= np = 104 \cdot 0.106 = 11.024\\ \sigma(Y) &= \sqrt{np(1 - p)} = \sqrt{104 \cdot 0.106 \cdot 0.894} = 3.13934. \end{align}\]


Figure 10.3: Binomial rv with normal approximation overlaid.

Figure 10.3 is a plot of the pmf of \(X\) with its normal approximation \(Y\) overlaid. The shaded area corresponds to \(Y \leq 7\) . The \(p\) -value is twice that area, which we compute with pnorm :
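The computation:

```r
2 * pnorm(7, mean = 11.024, sd = 3.13934)
```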

For a better approximation, we perform a continuity correction (see also Example 4.26 ). The basic idea is that \(Y\) is a continuous rv, so when \(Y = 7.3\), for example, we need to decide which integer value it should be associated with. Rounding suggests that \(Y=7.3\) should correspond to \(X=7\) and be included in the shaded area. The continuity correction includes values from 7 to 7.5 in the \(p\)-value, resulting in a corrected \(p\)-value of:
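With the correction, the shaded area extends to 7.5:

```r
2 * pnorm(7.5, mean = 11.024, sd = 3.13934)
```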

The continuity correction gives a more accurate approximation to the underlying binomial rv, but not necessarily a closer approximation to the exact binomial test.

The built-in R function for the one sample test of proportions is prop.test :
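For the wrist example:

```r
prop.test(x = 7, n = 104, p = 0.106)
```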

The prop.test function performs continuity correction by default. The \(p\) -value here is almost identical to the result of binom.test , and as before we fail to reject \(H_0\) . The confidence interval produced is also quite similar to the exact binomial test.

Look at the full output of prop.test(x = 7, n = 104, p = 0.106) , and observe that the test statistic is given as a \(\chi^2\) random variable with 1 degree of freedom. Confirm that the test statistic is \(c = \left((\tilde x - np_0)/\sqrt{np_0(1 - p_0)}\right)^2\) , where \(\tilde x\) is the number of successes after a continuity correction (in this case, \(\tilde x = 7.5\) ).

Use pchisq(c, 1, lower.tail = FALSE) to recompute the \(p\) -value using this test statistic. You should get the same answer \(p=\) 0.2616377.
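For reference, here is one way to carry out that check:

```r
x_tilde <- 7.5
n <- 104
p0 <- 0.106
c_stat <- ((x_tilde - n * p0) / sqrt(n * p0 * (1 - p0)))^2
c_stat
pchisq(c_stat, 1, lower.tail = FALSE)
```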

The Economist/YouGov Poll leading up to the 2016 presidential election sampled 3669 likely voters and found that 1798 intended to vote for Clinton. Assuming that this is a random sample from all likely voters, find a 99% confidence interval for \(p\) , the true proportion of likely voters who intended to vote for Clinton at the time of the poll.
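A sketch of the call, using the counts as reported above; the confidence interval appears in the conf.int component of the output:

```r
prop.test(x = 1798, n = 3669, conf.level = 0.99)
```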

We are 99% confident that the true proportion of likely Clinton voters was between .465 and .507. In fact, 48.2% of voters did vote for Clinton, and the true value does fall in the 99% confidence interval range.

Most polls do not report a confidence interval. Typically, they report the point estimator \(\hat{p}\) and the margin of error , which is half the width of the 95% confidence interval. For this poll, \(\hat{p} \approx 0.486\) and the 95% confidence interval is \([0.470, 0.502]\) so the pollsters would report that they found 48.6% in favor of Clinton with a margin of error of 1.6%.

10.3 \(\chi^2\) tests

The \(\chi^2\) test is a general approach to testing the hypothesis that tabular data follows a given distribution. It relies on the Central Limit Theorem, in that the various counts in the tabular data are assumed to be approximately normally distributed.

The setting for \(\chi^2\) testing requires tabular data. For each cell in the table, the count of observations that fall in that cell is a random variable. We denote the observed counts in the \(k\) cells by \(X_1, \dotsc, X_k\) . The null hypothesis requires an expected count for each cell, \(E[X_i]\) . The test statistic is the \(\chi^2\) statistic.

If \(X_1, \dotsc, X_k\) are the observed counts of cells in tabular data, then the \(\chi^2\) statistic is:

\[ \chi^2 = \sum_{i=1}^k \frac{(X_i - E[X_i])^2}{E[X_i]} \]

The \(\chi^2\) statistic is always positive, and will be larger when the observed values \(X_i\) are far from the expected values \(E[X_i]\) . In all cases we consider, the \(\chi^2\) statistic will have approximately the \(\chi^2\) distribution with \(d\) degrees of freedom, for some \(d < k\) . The \(p\) -value for the test is the probability of a \(\chi^2\) value as large or larger than the observed \(\chi^2\) . The R function chisq.test computes \(\chi^2\) and the corresponding \(p\) -value.

The \(\chi^2\) test is always a one-tailed test. For example, if we observe \(\chi^2 = 10\) and have four degrees of freedom, the \(p\) -value corresponds to the shaded area in Figure 10.4 .


Figure 10.4: \(\chi^2\) distribution with \(p\) -value shaded.

The full theory behind the \(\chi^2\) test is beyond the scope of this book, but in the remainder of this section we give some motivation for the formula for \(\chi^2\) and the meaning of degrees of freedom. A reader less interested in theory could proceed to Section 10.3.1 .

Consider the value in one particular cell of tabular data. For each of the \(n\) observations in the sample, the observation either lies in the cell or it does not, hence the count in that one cell can be considered as a binomial rv \(X_i\) . Let \(p_i\) be the probability a random observation is in that cell. Then \(E[X_i] = np_i\) and \(\sigma(X_i) = \sqrt{np_i(1-p_i)}\) . If \(np_i\) is sufficiently large (at least 5, say) then \(X_i\) is approximately normal and \[ \frac{X_i - np_i}{\sqrt{np_i(1-p_i)}} \sim Z_i, \] where \(Z_i\) is a standard normal variable. Squaring both sides and multiplying by \((1-p_i)\) we have \[ (1-p_i)Z_i^2 \sim \frac{(X_i - np_i)^2}{np_i} = \frac{(X_i - E[X_i])^2}{E[X_i]} \]

As long as all cell counts are large enough, the \(\chi^2\) statistic is approximately \[ \chi^2 = \sum_{i=1}^k (1-p_i)Z_i^2 \] In this expression, the \(Z_i\) are standard normal but not independent random variables. In many circumstances, one can rewrite these \(k\) variables in terms of a smaller number \(d\) of independent standard normal rvs and find that the \(\chi^2\) statistic does have a \(\chi^2\) distribution with \(d\) degrees of freedom. The details of this process require some advanced linear algebra and making precise what we mean when we say \(X_i\) are approximately normal. The details of the dependence are not hard to work out in the simplest case when the table has two cells.

Consider a table with two cells, with \(n\) observations, cell probabilities \(p_1\) , \(p_2\) , and cell counts given by the random variables \(X_1\) and \(X_2\) :

\[ \begin{array}{|c|c|} \hline X_1 & X_2 \\ \hline \end{array} \]

This is simply a single binomial rv in disguise, with \(X_1\) the count of successes and \(X_2\) the count of failures. In particular, \(p_1 + p_2 = 1\) and \(X_1 + X_2 = n\) . Notice that

\[ \frac{X_1 - np_1}{\sqrt{np_1(1-p_1)}} + \frac{X_2 - np_2}{\sqrt{np_2(1-p_2)}} = \frac{X_1 + X_2 - n(p_1 + p_2)}{\sqrt{np_1p_2}} = \frac{n - n(1)}{\sqrt{np_1p_2}} = 0. \]

So the two variables \(Z_1\) and \(Z_2\) are not independent, and satisfy the equation \(Z_1 + Z_2 = 0\) . Then both can be written in terms of a single rv \(Z\) with \(Z_1 = Z\) and \(Z_2 = -Z\) . As long as \(X_1\) and \(X_2\) are both large, \(Z_i\) will be approximately standard normal, and \[ \chi^2 = (1-p_1)Z_1^2 + (1-p_2)Z_2^2 = (1- p_1 + 1 - p_2)Z^2 = Z^2. \] We see that \(\chi^2\) has the \(\chi^2\) distribution with one df.

The table in this example has two entries, giving two possible counts \(X_1\) and \(X_2\) . The constraint that these counts must sum to \(n\) leaves only the single degree of freedom to choose \(X_1\) .

10.3.1 \(\chi^2\) test for given probabilities

In this section, we consider data coming from a single categorical variable, typically displayed in a \(1 \times k\) table:

\[ \begin{array}{|c|c|c|c|} \hline X_1 & X_2 & \quad\dotsb\quad & X_k \\ \hline \end{array} \]

For our null hypothesis, we take some vector of probabilities \(p_1, \dotsc, p_k\) as given. Because there are \(k\) cells to fill and a single constraint (the cells sum to \(n\) ), the \(\chi^2\) statistic will have \(k-1\) df.

The most common case is when we assume all cells are equally likely, in which case this approach is called the \(\chi^2\) test for uniformity .

Doyle, Bottomley, and Angell 80 investigated the “Relative Age Effect,” the disadvantage of being the youngest in a cohort relative to being the oldest. The difference in outcomes can persist for years beyond any difference in actual ability relative to age difference.

In this study, the authors counted the boys under 18 years of age enrolled in British elite soccer academies. They binned the boys into three age groups: oldest, middle, and youngest, with approximately 1/3 of all British children in each group. The number of boys in the elite soccer academies was:

\[ \begin{array}{|c|c|c|} \hline \text{Oldest} & \text{Middle} & \text{Youngest} \\ \hline 631 & 321 & 155 \\ \hline \end{array} \]

The null hypothesis is that the boys should be equally distributed among the three groups, or equivalently \(H_0: p_i = \frac{1}{3}\) . There are a total of 1107 boys in this study. Under the null hypothesis, we expect \(369 = 1107/3\) in each group. Then \[ \chi^2 = \frac{(631-369)^2}{369} + \frac{(321-369)^2}{369} + \frac{(155-369)^2}{369} \approx 316.38. \] The test statistic \(\chi^2\) has the \(\chi^2\) distribution with \(2 = 3-1\) degrees of freedom, and a quick glance at that distribution shows that our observed 316.38 is impossibly unlikely to occur by chance. The \(p\) -value is essentially 0, and we reject the null hypothesis. Boys’ ages in elite British soccer academies are not uniformly distributed across the three age bands for a given year.

In R, the computation is done with chisq.test :
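By default, chisq.test tests against equally likely categories:

```r
chisq.test(c(631, 321, 155))
```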

Benford’s Law is used in forensic accounting to detect falsified or manufactured data. When data, such as financial or economic data, occurs over several orders of magnitude, the first digits of the values follow the distribution

\[ P(\text{first digit is}~d) = \log_{10}(1 + 1/d) \]

The data fosdata::rio_instagram has the number of Instagram followers for gold medal winners at the 2016 Olympics. First, we extract the first digits of each athlete’s number of followers:

Let’s visually compare the counts of observed first digits (as bars) to the expected counts from Benford’s Law (red dots):
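One way to draw the comparison, building on the digits vector above:

```r
library(ggplot2)
benford_df <- data.frame(
  digit = factor(1:9),
  observed = as.vector(table(factor(digits, levels = 1:9))),
  expected = length(digits) * log10(1 + 1 / (1:9))
)
ggplot(benford_df, aes(x = digit)) +
  geom_col(aes(y = observed)) +
  geom_point(aes(y = expected), color = "red", size = 2) +
  labs(y = "count of athletes")
```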


Is the observed data consistent with Benford’s Law?
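The \(\chi^2\) test for given probabilities answers this:

```r
chisq.test(table(factor(digits, levels = 1:9)), p = log10(1 + 1 / (1:9)))
```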

The observed value of \(\chi^2\) is 9.876, from a \(\chi^2\) distribution with \(8 = 9 - 1\) degrees of freedom. This is not extraordinary. The \(p\) -value is 0.2738 and we fail to reject \(H_0\) . The data is consistent with Benford’s Law.

10.4 \(\chi^2\) goodness of fit

In this section, we consider tabular data that is hypothesized to follow a parametric model. When the parameters of the model are estimated from the observed data, the model fits the data better than it should. Each estimated parameter reduces the degrees of freedom in the \(\chi^2\) distribution by one.

When testing goodness of fit, the \(\chi^2\) statistic is approximately \(\chi^2\) with degrees of freedom given by the following:

\[ \text{degrees of freedom} = \text{bins} - 1 - \text{parameters estimated from the data}. \]

We will explore this claim through simulation in Section 10.4.1 .

Goals in a soccer game arrive at random moments and could be reasonably modeled by a Poisson process. If so, the total number of goals scored in a soccer game should be a Poisson rv.

The data set world_cup from fosdata contains the results of the 2014 and 2015 FIFA World Cup soccer finals. Let’s get the number of goals scored by each team in each game of the 2015 finals:

We want to perform a hypothesis test to determine whether a Poisson model is a good fit for the distribution of goals scored. The Poisson distribution has one parameter, the rate \(\lambda\) . The expected value of a Poisson rv is \(\lambda\) , so we estimate \(\lambda\) from the data:
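Assuming goals holds the vector of goals scored by each team in each game (extracted from fosdata::world_cup as described above):

```r
lambda <- mean(goals)   # goals is assumed to be the vector described above
lambda
```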

Here \(\lambda \approx 1.4\) , meaning 1.4 goals were scored per game, on average. Figure 10.5 displays the observed counts of goals with the expected counts from the Poisson model \(\text{Pois}(\lambda)\) in red.


Figure 10.5: Goals scored by each team in each game of the 2015 World Cup. Poisson model shown with red dots.

Since the \(\chi^2\) test relies on the Central Limit Theorem, each cell in the table should have a large expected value to be approximately normal. Traditionally, the threshold is that a cell’s expected count should be at least five. Here, the expected counts for cells with 4 or more goals fall below that threshold. The solution is to bin these small counts into one category, giving five total categories: zero goals, one goal, two goals, three goals, or 4+ goals. The observed and expected counts for the five categories are:

\[ \begin{array}{|c|c|c|c|c|c|} \hline \text{Goals} & 0 & 1 & 2 & 3 & 4+ \\ \hline \text{Observed} & 30 & 40 & 20 & 6 & 8 \\ \hline \text{Expected} & 25.5 & 35.9 & 25.2 & 11.8 & 5.6 \\ \hline \end{array} \]

The \(\chi^2\) test statistic will have \(3 = 5 - 1 - 1\) df, since:

  • There are 5 bins.
  • The bins sum to 104, losing one df.
  • The model’s one parameter \(\lambda\) was estimated from the data, losing one df.

We compute the \(\chi^2\) test statistic and \(p\) -value manually, because the chisq.test function is unaware that our expected values were modeled from the data, and would use the incorrect df.
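Here is a sketch of the manual computation, using the binned counts from the table above. The value of \(\lambda\) is rounded to 1.4, so the results differ slightly from the text, which uses the exact estimate.

```r
observed <- c(30, 40, 20, 6, 8)
n_obs <- sum(observed)                        # 104 team-games
lambda <- 1.4                                 # approximate estimate from the data
probs <- c(dpois(0:3, lambda), ppois(3, lambda, lower.tail = FALSE))
expected <- n_obs * probs
chi2 <- sum((observed - expected)^2 / expected)
chi2
pchisq(chi2, df = 5 - 1 - 1, lower.tail = FALSE)
```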

The observed value of \(\chi^2\) is 6.15. The \(p\) -value of this test is 0.105, and we would not reject \(H_0\) at the \(\alpha = .05\) level. This test does not give evidence against goal scoring being Poisson.

Note that there is one aspect of this data that is highly unlikely under the assumption that the data comes from a Poisson random variable: ten goals were scored on two different occasions. The \(\chi^2\) test did not consider that, because we binned those large values into a single category. If you believe that data might not be Poisson because you suspect it will have unusually large values (rather than unusually many large values), then the \(\chi^2\) test will not be very powerful.

10.4.1 Simulations

This section investigates the test statistic in the \(\chi^2\) goodness of fit test via simulation. We observe that it does follow the \(\chi^2\) distribution with df equal to bins minus one minus number of parameters estimated from the data.

Suppose that data comes from a Poisson variable \(X\) with mean 2 and there are \(N = 200\) data points.

The expected count in bin 5 is 200 * dpois(5,2) which is 7.2, large enough to use. The expected count in bin 6 is only 2.4, so we combine all bins 5 and higher. In a real experiment, the sample data could affect the number of bins chosen, but we ignore that technicality.

Next, compute the expected counts for each bin using the rate \(\lambda\) estimated from the data. Bins 0-4 can use dpois but bin 5 needs the entire tail of the Poisson distribution.

Finally, we produce one value of the test statistic:
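Putting the steps together for a single simulated data set (the value 0.9264 quoted in the next paragraph came from the original simulation; a different random draw gives a different value):

```r
dat <- rpois(200, 2)
lambda_hat <- mean(dat)
observed <- table(factor(pmin(dat, 5), levels = 0:5))     # bin 5 means "5 or more"
probs <- c(dpois(0:4, lambda_hat), ppois(4, lambda_hat, lower.tail = FALSE))
expected <- 200 * probs
chi2 <- sum((observed - expected)^2 / expected)
chi2
```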

Naively using chisq.test with the data and the fit probabilities gives the same value of \(\chi^2 = 0.9264\) , but produces a \(p\) -value using 5 df, which is wrong. The function does not know that we used one df to estimate a parameter.

We now replicate to produce a sample of values of the test statistic to verify that 4 is the correct df for this test:
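A sketch of the replication and the comparison described below, using base graphics for brevity:

```r
sim_chi2 <- replicate(10000, {
  dat <- rpois(200, 2)
  lambda_hat <- mean(dat)
  observed <- table(factor(pmin(dat, 5), levels = 0:5))
  probs <- c(dpois(0:4, lambda_hat), ppois(4, lambda_hat, lower.tail = FALSE))
  expected <- 200 * probs
  sum((observed - expected)^2 / expected)
})
plot(density(sim_chi2), lwd = 2, main = "")                  # black: simulated density
curve(dchisq(x, df = 4), add = TRUE, col = "blue", lwd = 2)  # blue: chi-squared with 4 df
curve(dchisq(x, df = 5), add = TRUE, col = "red", lwd = 2)   # red: chi-squared with 5 df
```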


The black curve is the probability density from our simulated data. The blue curve is \(\chi^2\) with 4 degrees of freedom, equal to (bins - parameters - 1). The red curve is \(\chi^2\) with 5 degrees of freedom and does not match the observations. This seems to be pretty compelling.

10.5 \(\chi^2\) tests on cross tables

Given two categorical variables \(A\) and \(B\) , we can form a cross table with one cell for each pair of values \((A_i,B_j)\) . That cell’s count is a random variable \(X_{ij}\) :

\[ \begin{array}{c|c|c|c|c|} & B_1 & B_2 & \quad\dotsb\quad & B_n \\ \hline A_1 & X_{11} & X_{12} & \quad\dotsb\quad & X_{1n} \\ \hline A_2 & X_{21} & X_{22} & \quad\dotsb\quad & X_{2n} \\ \hline \vdots & \vdots &\vdots & \quad\ddots\quad & \vdots \\ \hline A_m & X_{m1} & X_{m2} & \quad\dotsb\quad & X_{mn} \\ \hline \end{array} \]

As in all \(\chi^2\) tests, the null hypothesis leads to an expected value for each cell. In this setting, we require a probability \(p_{ij}\) that an observation lies in cell \((i,j)\) , \(p_{ij} = P(A = A_i\ \cap\ B = B_j)\) . These probabilities are called the joint probability distribution of \(A\) and \(B\) .

The hypothesized joint probability distribution needs to come from somewhere. It could come from historical or population data, or by fitting a parametric model, in which case the methods of the previous two sections apply.

We assume that \(B\) is random (and perhaps \(A\) as well, but not necessarily) and we consider the null hypothesis that the probability distribution of \(B\) is independent of the levels of \(A\) . Let \(N\) be the total number of observations. If we let \(a_i = \frac{1}{N} \sum_j X_{ij}\) denote the proportion of observations for which \(A = A_i\) and \(b_j = \frac{1}{N} \sum_i X_{ij}\) denote the proportion of responses for which \(B = B_j\) , then under the assumption of \(H_0\) we would hypothesize that \[ p_{ij} = a_i b_j. \] It follows that \(E[X_{ij}] = N a_i b_j\) .

The test statistic is \[ \chi^2 = \sum_{i,j} \frac{(X_{ij} - E[X_{ij}])^2}{E[X_{ij}]}. \]

When the expected cell counts \(E[X_{ij}]\) are all large enough, the test statistic has approximately a \(\chi^2\) distribution with \((\text{columns} - 1)(\text{rows} - 1)\) degrees of freedom. There are two explanations for why this is the correct degrees of freedom, depending on the details of the experimental design. The mechanics of the test itself, however, do not depend on the experimental design. Sections 10.5.1 and 10.5.2 discuss the details.

10.5.1 \(\chi^2\) test of independence

In the \(\chi^2\) test of independence, the levels of \(A\) and \(B\) are both random. In this case, we are testing

\[ H_0: A {\text{ and }} B {\text{ are independent random variables}} \] versus the alternative that they are not independent. The values of \(p_{ij} = a_i b_j\) have a natural interpretation as \(p_{ij} = P(A = A_i \cap B = B_j) = P(A = A_i) P(B = B_j)\) .

To understand the degrees of freedom in the test for independence, the experimental design matters. We fix \(N\) the total number of observations, and for each subject the two categorical variables \(A\) and \(B\) are measured (see Example 10.9 ). The row and column marginal sums of the cross table are random. Then:

  • There are \(mn\) cells.
  • There are \(m + n\) marginal probabilities \(a_i\) and \(b_j\) estimated from the data, and \(\sum a_i = \sum b_j = 1\), so we lose \(m + n - 2\) df.
  • All cell counts must add to \(N\) , losing one df.
  • \(mn - (m + n - 2) - 1 = (m - 1)(n - 1)\)

Are grove snail color and banding patterns related? Figure 10.1 suggests that brown snails are more likely to be unbanded than the other colors.

In R, the \(\chi^2\) test for independence is simple: we pass the cross table to chisq.test .
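For the snail data:

```r
snail_table <- xtabs(Count ~ Color + Banding, data = fosdata::snails)
chisq.test(snail_table)
```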

The cross table is \(3 \times 4\) , so the \(\chi^2\) statistic has \((3-1)(4-1) = 6\) df. The \(p\) -value is very small, so we reject \(H_0\) . Snail color and banding are not independent.

Let’s reproduce the results of chisq.test . First, compute marginal probabilities.
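Using the row and column sums of the table (with colors in the rows, as above):

```r
N <- sum(snail_table)              # 2904 snails
a <- rowSums(snail_table) / N      # marginal distribution of Color
b <- colSums(snail_table) / N      # marginal distribution of Banding
```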

Next, compute the joint distribution \(p_{ij} = a_ib_j\) . This uses the matrix multiplication operator %*% and the matrix transpose t to compute all 12 entries at once. The result is multiplied by \(N = 2904\) to get expected cell counts:
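Continuing the sketch:

```r
expected <- N * a %*% t(b)         # all expected cell counts at once
expected
```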

Finally, compute the \(\chi^2\) test statistic and the \(p\) -value, which match the results of chisq.test .
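Finally:

```r
chi2 <- sum((snail_table - expected)^2 / expected)
chi2
deg_free <- (nrow(snail_table) - 1) * (ncol(snail_table) - 1)
pchisq(chi2, deg_free, lower.tail = FALSE)
```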

It is instructive to view each cell’s contribution to \(\chi^2\) graphically as a “heatmap” to provide a sense of which cells were most responsible for the dependence between color and banding.
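A sketch of such a heatmap with ggplot2:

```r
library(ggplot2)
contrib <- as.data.frame(as.table((snail_table - expected)^2 / expected))
names(contrib) <- c("Color", "Banding", "contribution")   # Color indexes the rows here
ggplot(contrib, aes(x = Banding, y = Color, fill = contribution)) +
  geom_tile()
```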


Clearly, most of the interaction between Banding and Color comes from the overabundance of unbanded (X00000) Brown snails. The authors of the original study were interested in environmental effects on color and bandedness of snails. It is possible, though a more thorough analysis would be required, that an environment that favors the survival of brown snails also favors unbanded snails.

To what extent do animals display conformity? That is, will they forgo personal information in order to follow the majority? Researchers 81 studied conformity among dogs. They trained a subject dog to walk around a wall in one direction in order to receive a treat. After training, the subject dog then watched other dogs walk around the wall in the opposite direction. If the subject dog changes its behavior to match the dogs it observed, it is evidence of conforming behavior.

The data from this experiment is available as the dogs data frame in the fosdata package.

This data set has quite a bit going on. In particular, each dog repeated the experiment three times, which means that it would be unwise to assume independence across trials. So, we will restrict to the first trial only. We also restrict to dogs that did not drop out of the experiment.

Subject dogs participated under three conditions. The control group (condition = 0) observed no other dogs, and was simply asked to repeat what they were trained to do. Another group (condition = 1) saw one dog that went the “wrong” way around the wall three times. Another group (condition = 3) saw three different dogs that each went the wrong way around the wall one time.

We summarize the results of the experiment with a table showing the three experimental conditions in the three rows and whether the subject dog conformed or not in the two columns.

The null hypothesis is that conform and condition are independent variables, so that the three groups of dogs would have the same conforming behavior. We store the cross table and apply the \(\chi^2\) test for independence:
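A sketch of those two steps. The condition and conform variables are named in the text; the trial column used for filtering, and the handling of drop-outs via missing values, are assumptions, so see ?fosdata::dogs for the actual names and coding.

```r
library(dplyr)
dogs <- fosdata::dogs
dogs_first <- dogs %>%
  filter(trial == 1, !is.na(conform))       # trial and the drop-out coding are assumptions
dog_table <- xtabs(~ condition + conform, data = dogs_first)
dog_table
chisq.test(dog_table)
```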

The \(p\) -value is 0.61, so there is not significant evidence that the conform and condition variables are dependent. Dogs do not disobey training to conform, at least according to this simple analysis.

The \(\chi^2\) test reports 2 df because we have 3 rows and 2 columns, and \((3-1)(2-1) = 2\) . The test also produces a warning, because the expected cell count for conforming dogs under condition 0 is low. With a high \(p\) -value and good cell counts elsewhere, the lack of accuracy is not a concern.

A link to the paper associated with the dogs data is given in ?fosdata::dogs . Find the place in the paper where they perform the above \(\chi^2\) test, and read the authors’ explanation related to it.

10.5.2 \(\chi^2\) test of homogeneity

In a \(\chi^2\) test of homogeneity, one of the variables \(A\) and \(B\) is not random. For example, if an experimenter decides to collect data on cats by finding 100 American shorthair cats, 100 highlander cats, and 100 munchkin cats and measuring eye color for each of the 300 cats, then the number of cats of each breed is not a random variable. A test of this type is called a \(\chi^2\) test of homogeneity, or a \(\chi^2\) test with one fixed margin. However, we are still interested in whether the distribution of eye color depends on the breed of the cat, and we proceed exactly in the same manner as before, with a slightly reworded null hypothesis and a different justification of the degrees of freedom. We denote \(B\) as the variable that is random. Our null hypothesis is:

\[ H_0: {\text{ the distribution of $B$ does not depend on the level of $A$}} \] and the alternative hypothesis is that the distribution of \(B\) does depend on the level of \(A\) . We compute degrees of freedom as follows:

  • There are \(n\) marginal probabilities \(b_1, \ldots, b_n\) . Since these must sum to 1, we lose \(n - 1\) degrees of freedom.
  • Each row sums to a fixed number, so we lose \(m\) degrees of freedom.
  • We do not lose any degrees of freedom for all bins summing to \(N\), since that is already implied by the fixed row sums.
  • Total degrees of freedom are \(mn - (n - 1) - m = (m - 1)(n - 1)\) , as in the case of the \(\chi^2\) test of independence.

The mechanics of a \(\chi^2\) test of homogeneity are the same as a \(\chi^2\) test of independence.

Consider the sharks data set 82 in the fosdata package. Participants were paid 25 cents to listen to either silence, ominous music, or uplifting music while possibly watching a video on sharks. An equal number were recruited for each type of music. They were then asked to give their rating from 1-7 on their willingness to help conserve endangered sharks. We are interested in whether the distribution of the participants’ willingness to conserve sharks depends on the type of music they listened to.

We start by computing the cross table of the data.
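A sketch; the names music and conserve below are placeholders for the music-type and conservation-response columns (see ?fosdata::sharks for the actual variable names):

```r
sharks <- fosdata::sharks
shark_table <- xtabs(~ music + conserve, data = sharks)   # music, conserve: assumed names
shark_table
chisq.test(shark_table)
```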

The rows do not add up to exactly the same number because some participants dropped out of the study. We ignore this problem and continue.

We see that there is not sufficient evidence to conclude that the distribution of willingness to help conserve endangered sharks depends on the type of music heard ( \(p = .6982\) ).

10.5.3 Two sample test for equality of proportions

An important special case of the \(\chi^2\) test for independence is the two sample test for equality of proportions.

Suppose that \(n_1\) trials are made from population 1 with \(x_1\) successes, and that \(n_2\) trials are made from population 2 with \(x_2\) successes. We wish to test \(H_0: p_1 = p_2\) versus \(H_a: p_1 \not= p_2\) , where \(p_i\) is the true probability of success from population \(i\) . We create a \(2\times 2\) table of values as follows:

\[ \begin{array}{c|c|c|} & \text{Pop. 1} & \text{Pop. 2} \\ \hline \text{Successes} & x_{1} & x_{2} \\ \hline \text{Failures} & n_1 - x_1 & n_2 - x_2 \\ \hline \end{array} \]

The null hypothesis says that \(p_1 = p_2\) . We estimate this common probability using all the data:

\[ \hat{p} = \frac{\text{Successes}}{\text{Trials}} = \frac{x_1 + x_2}{n_1 + n_2} \]

The expected number of successes under \(H_0\) is calculated from \(n_1\) , \(n_2\) , and \(\hat{p}\) :

\[ \begin{array}{c|c|c|} & \text{Pop. 1} & \text{Pop. 2} \\ \hline \text{Exp. Successes} & n_1\hat{p} & n_2\hat{p} \\ \hline \text{Exp. Failures} & n_1(1-\hat{p}) & n_2(1-\hat{p})\\ \hline \end{array} \]

We then compute the \(\chi^2\) test statistic. This has 1 df, since there were 4 cells, two constraints that the columns sum to \(n_1\) , \(n_2\) , and one parameter estimated from the data.

The test statistic and \(p\) -value can be computed with chisq.test . The prop.test function performs the same computation, and allows for finer control over the test in this specific setting.
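A generic sketch with hypothetical counts:

```r
x1 <- 20; n1 <- 50      # hypothetical counts, for illustration only
x2 <- 35; n2 <- 60
prop.test(x = c(x1, x2), n = c(n1, n2))
# the same test, starting from the 2x2 table
chisq.test(rbind(c(x1, x2), c(n1 - x1, n2 - x2)))
```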

Researchers randomly assigned patients with wrist fractures to receive a cast in one of two positions, the VFUDC position and the functional position. The assignment of cast position should be independent of which wrist (left or right) was fractured. We produce a cross table from the data in fosdata::wrist and run the \(\chi^2\) test for independence:
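A sketch of those two steps; the column names cast_position and fracture_side are assumptions (see ?fosdata::wrist):

```r
wrist <- fosdata::wrist
cast_table <- xtabs(~ cast_position + fracture_side, data = wrist)   # assumed names
cast_table
chisq.test(cast_table)
```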

For prop.test we need to know the group sizes, \(n_1 = 45\) with right-side fractures and \(n_2 = 60\) with left-side fractures. We also need the number of successes, which we arbitrarily select as cast position 1.

The prop.test function applies a continuity correction by default. chisq.test only applies continuity correction in this \(2 \times 2\) case. There seems to be some disagreement on whether or not continuity correction is desirable. From the point of view of this text, we would choose the version that has observed type I error rate closest to the assigned rate of \(\alpha\) . Let’s run some simulations, using \(n_1 = 45\) , \(n_2 = 60\) , and success probability \(p = 50/105\) to match the wrist example.
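A sketch of such a simulation:

```r
p <- 50 / 105
results <- replicate(10000, {
  x1 <- rbinom(1, 45, p)
  x2 <- rbinom(1, 60, p)
  c(
    corrected = prop.test(c(x1, x2), c(45, 60))$p.value,
    uncorrected = prop.test(c(x1, x2), c(45, 60), correct = FALSE)$p.value
  )
})
rowMeans(results < 0.05)    # observed type I error rates at alpha = 0.05
```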

We see that for this sample size and common probability of success, correct = FALSE comes closer to the desired type I error rate of 0.05, though it is still a bit too high. This pattern holds across a wide range of \(p\), \(n_1\), and \(n_2\): continuity correction tends to produce effective type I error rates below the designed rate, while correct = FALSE stays closer to it.

Consider the babynames data set in the babynames package. Is there a statistically significant difference in the proportion of girls named “Bella” 83 in 2007 and the proportion of girls named “Bella” in 2009?

We will need to do some data wrangling on this data and define a binomial variable bella :
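One way to do the wrangling:

```r
library(dplyr)
library(babynames)
bella <- babynames %>%
  filter(sex == "F", year %in% c(2007, 2009)) %>%
  group_by(year) %>%
  summarize(bella = sum(n[name == "Bella"]), total = sum(n))
bella
prop.test(x = bella$bella, n = bella$total)
```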

We see that the number of girls named “Bella” nearly doubled from 2007 to 2009. The two sample proportions test shows that this was highly significant.

10.6 Exact and Monte Carlo methods

The \(\chi^2\) methods of the previous sections all approximate discrete variables with continuous (normal) variables. Exact and Monte Carlo methods are very general approaches to testing tabular data, and neither method requires assumptions of normality.

Exact methods produce exact \(p\) -values by examining all possible ways the \(N\) outcomes could fill the table. The first step of an exact method is to compute the test statistic associated to the observed data, often \(\chi^2\) . Then for each possible table, compute the test statistic and the probability of that table occurring, assuming the null hypothesis. The \(p\) -value is the sum of the probabilities of the tables whose associated test statistics are as extreme or more extreme than the observed test statistic. This \(p\) -value is exact because (assuming the null hypothesis) it is exactly the probability of obtaining a test statistic as or more extreme than the one coming from the data.

Unfortunately, the number of ways to fill out a table grows exponentially with the number of cells in the table (or more precisely, exponentially in the degrees of freedom). This makes exact methods unreasonably slow when \(N\) is large or the table has many cells. Monte Carlo methods present a compromise that avoids assumptions but stays computationally tractable. Rather than investigate every possible way to fill the table, we randomly create many tables according to the null hypothesis. For each, the \(\chi^2\) statistic is computed. The \(p\) -value is taken to be the proportion of generated tables that have a larger \(\chi^2\) statistic than the observed data. Though we compute the \(\chi^2\) statistic for the observed and simulated tables, we do not rely on assumptions about its distribution – it may not have a \(\chi^2\) distribution at all.

Return to the data on age cohorts in soccer, introduced in Example 10.6 . There were three relative age groups in each cohort year: old, middle, and young. Our null hypothesis is that each age group should be equally likely for an elite soccer player in a given cohort. The data has \(N = 1107\) boys, with 631, 321, and 155 in the old, middle, and young groups.

To apply Monte Carlo methods, we need to generate simulated \(3\times 1\) tables. We use the R function rmultinom , which generates multinomially distributed random number vectors. As with all random variable generation functions in R, the first argument to rmultinom is the number of simulations we want. Then there are two required parameters, the number of observations \(N\) in each table and the null hypothesis probability distribution. Here are ten tables that might result from the experiment, one in each column.
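The call looks like this; the actual numbers vary from run to run:

```r
rmultinom(10, size = 1107, prob = c(1/3, 1/3, 1/3))
```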

From the first column, one possible outcome of the soccer study would be to find 355, 394, and 358 boys in the old, middle, and young groups. The next nine columns are also possible outcomes, each with \(N = 1107\) observations. It is apparent that the observed value of 631 boys in the “old” group is exceptionally large under \(H_0\).

To get a \(p\) -value, we first compute the \(\chi^2\) statistic for the observed data:
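With expected counts of 369 in each group:

```r
observed <- c(631, 321, 155)
expected <- rep(1107 / 3, 3)
chi2_obs <- sum((observed - expected)^2 / expected)
chi2_obs
```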

The \(\chi^2\) statistic is a measure of how far our observed group sizes are from the expected group sizes. For the observed boys, \(\chi^2\) is 316.3794. Next compute the \(\chi^2\) statistic for each set of simulated group sizes:
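Applying the same computation to each simulated column:

```r
sim_tables <- rmultinom(10, size = 1107, prob = rep(1/3, 3))
apply(sim_tables, 2, function(x) sum((x - expected)^2 / expected))
```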

Again, it is clear that the observed data is quite different than the data that was simulated under \(H_0\) . We should use more than 10 simulations, of course, but for this particular data you will never see a value as large as 316 in the simulations. The true \(p\) -value for this experiment is essentially zero.

R can carry out the Monte Carlo method within the chisq.test function:
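For the soccer data:

```r
chisq.test(c(631, 321, 155), simulate.p.value = TRUE)
```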

The function performed 2000 simulations and none of them had a higher \(\chi^2\) value than the observed data. The \(p\) -value was reported as 1/2001, because R always includes the actual data set in addition to the 2000 simulated values. This is a common technique that makes a relatively small absolute difference in estimates.

Continuing with the boys elite soccer age data, we show how to apply the exact multinomial test .

The idea of the exact test is to sum the probabilities of all tables that lead to test statistics that are as extreme or more extreme than the observed test statistic. The table of boys is \(3 \times 1\), and we need the three values in the table to sum to 1107. In Exercise 10.29 , you are asked to show that there are 614,386 possible ways to fill a \(3 \times 1\) table with numbers that sum to 1107.

The multinomial.test function in the EMT package carries out this process.
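A sketch of the call; because the test enumerates all 614,386 tables, it can take a moment to run:

```r
EMT::multinomial.test(c(631, 321, 155), c(1/3, 1/3, 1/3))   # observed counts, null probabilities
```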

As before, the \(p\) -value is 0. The EMT::multinomial.test function can also run Monte Carlo tests using the parameter MonteCarlo = TRUE .

Vignette: Tables

Tables are an often overlooked part of data visualization and presentation. They can also be difficult to do well! In this vignette, we introduce the knitr::kable function, which produces tables compatible with .pdf, .docx and .html output inside of your R Markdown documents.

To make a table using knitr::kable , create a data frame and apply kable to it.
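For example, the snail cross table can be converted to a data frame and passed to kable. This is a sketch: if the raw Banding variable does not already contain an Others level, an extra recoding step would be needed to match the table below exactly.

```r
library(knitr)
snail_tab <- xtabs(Count ~ Banding + Color, data = fosdata::snails)
kable(as.data.frame.matrix(snail_tab))
```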

          Brown   Pink   Yellow
X00000      339    433      126
X00300       48    421      222
X12345       16    395      352
Others       23    373      156

Suppose you are studying the palmerpenguins::penguins data set, and you want to report the mean, standard deviation, range, and number of samples of bill length in each species type. The dplyr package helps to produce the data frame, and we use kable options to create a caption and better column headings. The table is displayed as Table 10.1 .
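A sketch of the pipeline:

```r
library(dplyr)
library(palmerpenguins)
penguins %>%
  filter(!is.na(bill_length_mm)) %>%
  group_by(species) %>%
  summarize(
    mean = mean(bill_length_mm),
    sd = sd(bill_length_mm),
    range = paste(min(bill_length_mm), "-", max(bill_length_mm)),
    birds = n()
  ) %>%
  knitr::kable(
    digits = 2,
    caption = "Bill lengths (mm) for penguins.",
    col.names = c("Species", "Mean", "SD", "Range", "# Birds")
  )
```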

Table 10.1: Bill lengths (mm) for penguins.
Species Mean SD Range # Birds
Adelie 38.79 2.66 32.1 – 46 151
Chinstrap 48.83 3.34 40.9 – 58 68
Gentoo 47.50 3.08 40.9 – 59.6 123

The kable function provides only basic table styles. To adjust the width and other features of table style, use the kableExtra package.

Another interesting use of tables is in combination with broom::tidy , which converts the outputs of many common statistical tests into data frames. Let’s see how it works with t.test .

Display the results of a \(t\) -test of the body temperature data from fosdata::normtemp in a table.
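A sketch, testing against the conventional value of 98.6°F; the body temperature column is assumed to be named temp (see ?fosdata::normtemp):

```r
library(dplyr)
library(broom)
tidy(t.test(fosdata::normtemp$temp, mu = 98.6)) %>%   # temp is an assumed column name
  select(1:6) %>%
  knitr::kable(digits = 3)
```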

estimate statistic p.value parameter conf.low conf.high
98.249 -5.455 0 129 98.122 98.376

We selected only the first six variables so that the table would better fit the page.

As a final example, let’s test groups of cars from mtcars to see if their mean mpg is different from 25. The groups we want are the four possible combinations of transmission ( am ) and engine ( vs ). This requires four \(t\) -tests, and could be a huge hassle! But, check this out:
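A sketch using dplyr::group_modify together with broom::tidy:

```r
library(dplyr)
library(broom)
mtcars %>%
  group_by(am, vs) %>%
  group_modify(~ tidy(t.test(.x$mpg, mu = 25))) %>%
  select(am, vs, estimate, statistic, p.value, parameter, conf.low, conf.high) %>%
  knitr::kable(digits = 4)
```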

Table 10.2: Is mean mpg 25 for combinations of trans and engine? A two-sided one sample \(t\)-test.
am vs estimate statistic p.value parameter conf.low conf.high
0 0 15.0500 -12.4235 0.0000 11 13.2872 16.8128
0 1 20.7429 -4.5581 0.0039 6 18.4575 23.0282
1 0 19.7500 -3.2078 0.0238 5 15.5430 23.9570
1 1 28.3714 1.8748 0.1099 6 23.9713 32.7716

Exercises 10.1 – 10.2 require material through Section 10.1 .


Consider the cern data set in the fosdata package. Create a figure similar to Figure 10.1 which illustrates the total number of likes for each type of post, colored by the platform. French Twitter may not show up because it has so few likes.

Exercises 10.3 – 10.8 require material through Section 10.2 .

Suppose you are testing \(H_0: p = 0.4\) versus \(H_a: p \not= 0.4\) . You collect 20 pieces of data and observe 12 successes. Use dbinom to compute the \(p\) -value associated with the exact binomial test, and check using binom.test .

Suppose you are testing \(H_0: p = 0.4\) versus \(H_a: p \not= 0.4\) . You collect 100 pieces of data and observe 33 successes. Use the normal approximation to the binomial to find an approximate \(p\) -value associated with the hypothesis test.

Shaquille O’Neal (Shaq) was an NBA basketball player from 1992–2011. He was a notoriously bad free throw shooter 85 . Shaq always claimed, however, that the true probability of him making a free throw was greater than 50%. Throughout his career, Shaq made 5,935 out of 11,252 free throws attempted. Is there sufficient evidence to conclude that Shaq indeed had a better than 50/50 chance of making a free throw?

Diaconis, Holmes and Montgomery 86 claim that vigorously flipped coins tend to come up the same way they started. In a real coin tossing experiment 87 , two UC Berkeley students tossed coins a total of 40 thousand times in order to assess whether this is true. Out of the 40,000 tosses, 20,245 landed on the same side as they were tossed from.

  • Find a (two-sided) 99% confidence interval for \(p\) , the true proportion of times a coin will land on the same side it is tossed from.
  • Clearly state the null and alternative hypotheses, defining any parameters that you use.
  • Is there sufficient evidence to reject the null hypothesis at the \(\alpha = .05\) level based on this experiment? What is the \(p\) -value?

This exercise requires material from Section 6.7 or knowledge of loops. The curious case of the dishonest statistician – suppose a statistician wants to “prove” that a coin is not a fair coin. They decide to start flipping the coin, and after 10 tosses they will run a hypothesis test on \(H_0: p = 1/2\) versus \(H_a: p \not= 1/2\) . If they reject at the \(\alpha = .05\) level, they stop. Otherwise, they toss the coin one more time and run the test again. They repeatedly toss and run the test until either they reject \(H_0\) or they toss the coin 100 times (hey, they’re dishonest and lazy). Estimate using simulation the probability that the dishonest statistician will reject \(H_0\) .

Suppose you wish to test whether a die truly comes up “6” 1/6 of the time. You decide to roll the die until you observe 100 sixes. You do this, and it takes 560 rolls to observe 100 sixes.

  • State the appropriate null and alternative hypotheses.
  • Explain why prop.test and binom.test are not formally valid to do a hypothesis test.
  • Use reasoning similar to that in the explanation of binom.test above and the function dnbinom to compute a \(p\) -value.
  • Should you accept or reject the null hypothesis?

Exercises 10.9 – 10.12 require material through Section 10.3 .

Suppose you are collecting categorical data that comes in three levels. You wish to test whether the levels are equally likely using a \(\chi^2\) test. You collect 150 items and obtain a test statistic of 4.32. What is the \(p\) -value associated with this experiment?

Recall that the colors of M&M’s supposedly follow this distribution:

\[ \begin{array}{cccccc} Yellow & Red & Orange & Brown & Green & Blue \\ 0.14 & 0.13 & 0.20 & 0.12 & 0.20 & 0.21 \end{array} \]

Imagine you bought 10,000 M&M’s and got the following color counts:

\[ \begin{array}{cccccc} Yellow & Red & Orange & Brown & Green & Blue \\ 1357 & 1321 & 1946 & 1182 & 2052 & 2142 \end{array} \]

Does your sample appear to follow the known color distribution? Perform the appropriate \(\chi^2\) test at the \(\alpha = .05\) level and interpret.

The data set fosdata::bechdel has information on budget and earnings for many popular movies.

  • Is the budget data consistent with Benford’s Law?
  • Is the intgross data consistent with Benford’s Law?
  • Is the domgross data consistent with Benford’s Law? (Hint: one movie had no domestic gross. Bonus: which one was it?)

The United States Census Bureau produces estimates of population for all cities and towns in the U.S. On the census website http://www.census.gov , find population estimates for all incorporated places (cities and towns) for any one state. Import that data into R. Do the values for city and town population numbers follow Benford’s Law? Report your results with a plot and a \(p\) -value as in Example 10.7 .

Exercises 10.13 – 10.17 require material through Section 10.4 .

Did the goals scored by each team in each game of the 2014 FIFA Men’s World Cup soccer final follow a Poisson distribution? Perform a \(\chi^2\) goodness of fit test at the \(\alpha = 0.05\) level, binning values 4 and above. Data is in fosdata::world_cup .

Consider the austen data set in the fosdata package. In this exercise, we are testing to see whether the number of times that words are repeated after their first occurrence is Poisson. Restrict to the first chapter of Pride and Prejudice , and count the number of times that each word is repeated, and see that we obtain the following table:

Use a \(\chi^2\) goodness of fit test with \(\alpha = .05\) to test whether the distribution of repetitions of words is consistent with a Poisson distribution.

Powerball is a lottery game in which players try to guess the numbers on six balls drawn randomly. The first five are white balls and the sixth is a special red ball called the powerball. The results of all Powerball drawings from February 2010 to July 2020 are available in fosdata::powerball .

  • Plot the numbers drawn over time. Use color to distinguish the six balls. What do you observe? You will need pivot_longer to tidy the data.
  • Use a \(\chi^2\) test of uniformity to check if all numbers ever drawn fit a uniform distribution.
  • Restrict to draws after October 4, 2015, and only consider the white balls drawn, Ball1 - Ball5 . Do they fit a uniform distribution?
  • Restrict to draws after October 4, 2015, and only consider Ball1. Check that it is not uniform. Explain why not.

In this exercise, we explore doing \(\chi^2\) goodness of fit tests for continuous variables. Consider the hdl variable in the adipose data set in fosdata . We wish to test whether the data is normal using a \(\chi^2\) goodness of fit test and 7 bins.

  • Estimate the mean \(\mu\) and the standard deviation \(\sigma\) of the HDL.
  • Use qnorm(seq(0, 1, length.out = 8), mu, sigma) to create the dividing points ( breaks ) between 7 equally likely regions. The first region is \((-\infty, 0.8988)\) .
  • Use table(cut(aa, breaks = breaks)) to obtain the observed distribution of values in bins. The expected number in each bin is the number of data points over 7, since each bin is equally likely.
  • Compute the \(\chi^2\) test statistic as the difference between observed and expected squared, divided by the expected.
  • Compute the probability of getting this test-statistic or larger using pchisq . The degrees of freedom is the number of bins minus 3, one because the sum has to be 71 and the other because you are estimating two parameters from the data.
  • Is there evidence to conclude that HDL is not normally distributed?

Consider the fosdata::normtemp data set. Use a goodness of fit test with 10 bins, all with equal probabilities, to test the normality of the temperature data set. Note that in this case, you will need to estimate two parameters, so the degrees of freedom will need to be adjusted appropriately.

Exercises 10.18 – 10.28 require material through Section 10.5 .

Clark and Westerberg 88 investigated whether people can learn to toss heads more often than tails. The participants were told to start with a heads up coin, toss the coin from the same height every time, and catch it at the same height, while trying to get the number of revolutions to work out so as to obtain heads. After training, the first participant got 162 heads and 138 tails.

  • Find a 95% confidence interval for \(p\) , the proportion of times this participant will get heads.
  • Clearly state the null and alternative hypotheses, defining any parameters.
  • Is there sufficient evidence to reject the null hypothesis at the \(\alpha = .01\) level based on this experiment? What is the \(p\) -value?
  • The second participant got 175 heads and 125 tails. Is there sufficient evidence to conclude that the probability of getting heads is different for the two participants at the \(\alpha = .05\) level?

Left digit bias is when people attribute a difference to two numbers based on the first digit of the number, when there is not really a large difference between the numbers. In an article 89 , researchers studied left digit bias in the context of treatment choices for patients who were just over or just under 80 years old.

Researchers found that 265 of 5036 patients admitted with acute myocardial infarction who were admitted in the two weeks after their 80th birthday underwent Coronary-Artery Bypass Graft (CABG) surgery, while 308 out of 4426 patients with the same diagnosis admitted in the two weeks before their 80th birthday underwent CABG. There is no recommendation in clinical guidelines to reduce CABG use at the age of 80. Is there a statistically significant difference in the percentage of patients receiving CABG in the two groups?

Exercises 10.20 and 10.21 consider the psychology of randomness, as studied in Bar-Hillel et al. 90

The researchers considered whether people are good at creating random sequences of heads and tails in a unique way. The researchers recruited 175 people and asked them to create a random sequence of 10 heads and tails, though the researchers were only interested in the first guess. Of the 175 people, 143 predicted heads on the first toss. Let \(p\) be the probability that a randomly selected person will predict heads on the first toss. Perform a hypothesis test of \(p = 0.5\) versus \(p \not= 0.5\) at the \(\alpha = 0.05\) level.

The researchers also considered whether the linguistic convention of naming heads before tails impacts participants’ choice for their first imaginary coin toss. The authors recruited 54 people and told them to create a sample of size 10 by entering H for heads and T for tails. They recruited 51 people and told them to create a sample of size 10 by entering T for tails and H for heads. A total of 47 of the 54 people in Group 1 chose heads first, while 16 of the 51 people in Group 2 chose heads first. Perform a hypothesis test of \(p_1 = p_2\) versus \(p_1 \not= p_2\) at the \(\alpha = .05\) level, where \(p_i\) is the percentage of heads that people given instructions in Group \(i\) would create as their first guess.

If someone offered you either one really great marble and three mediocre ones, or four mediocre marbles, which would you choose?

Third-grade children in Rijen, the Netherlands, were split into two groups. 91 In group 1, 43 out of 48 children preferred a blue and white striped marble to a solid red marble. In group 2, 12 out of 44 children preferred four solid red marbles to three solid red marbles and one blue and white striped marble. Let \(p_1\) be the proportion of children who would prefer a blue and white marble to a red marble, and let \(p_2\) be the proportion of children who would prefer three red marbles and one blue and white striped marble to four red marbles. Perform a hypothesis test of \(p_1 = p_2\) versus \(p_1 \not= p_2\) at the \(\alpha = .05\) level.

A 2017 study 92 considered the care of patients with burns. A patient who stayed in the hospital for seven or more days past the last surgery for a burn is considered an extended postoperative stay. The researchers examined records and found that for patients with scalds, 30 did not have extended stays while 16 did have extended stays. For patients with flame burns, 51 did not have extended stays while 78 did have extended stays. Test whether the proportion of extended stays is the same for scald patients as for flame burn patients at the \(\alpha = .05\) level.

Ronald Reagan became president of the United States in 1980. The babynames::babynames data set contains information on babies named “Reagan” born in the United States. Is there a statistically significant difference in the percentage of babies (of either sex) named “Reagan” in the United States in 1982 and in 1978? If so, which direction was the change?

Consider the dogs data set in the fosdata package. For dogs in trial 1 that were shown a single dog going around the wall in the “wrong” direction three times, is there a statistically significant difference in the proportion that stay and the proportion that switch depending on their start direction?

Consider the sharks data set in the fosdata package. Participants were assigned to listen to either silence, ominous music, or uplifting music while watching a video about sharks. They then ranked sharks on various scales.

  • Create a cross table of the type of music listened to and the response to dangerous ; “how well does dangerous describe sharks.”
  • Perform a \(\chi^2\) test of homogeneity to test whether the ranking of how well “dangerous” describes sharks has the same distribution across the type of music heard.

Police sergeants in the Boston Police Department take an exam for promotion to lieutenant. In 2008, 91 sergeants took the lieutenant promotion test. Of them, 65 were white and 26 were Black or Hispanic. 93 The passing rate for white officers was 94%, while the passing rate for minorities was 69%. Was there a significant difference in the passing rates for whites and for minority test takers?
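A sketch of one approach: reconstruct approximate passing counts from the reported rates (rounding to whole officers) and run a two-sample proportion test.

```r
# Approximate counts from the reported passing rates: 94% of 65 and 69% of 26.
passed <- c(round(0.94 * 65), round(0.69 * 26))   # 61 and 18
took   <- c(65, 26)
prop.test(x = passed, n = took)
```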

Figure 10.6: Bicycle signage. (Image credit: Hess and Peterson.)

Hess and Peterson 94 studied whether bicycle signage can affect an automobile driver’s perception of bicycle rights and safety. Load the fosdata::bicycle_signage data, and see the help page for descriptions of the variables.

  • Create a contingency table of the variables bike_move_right2 and treatment .
  • Calculate the proportion of participants who agreed and disagreed for each type of sign treatment. Which sign was most likely to lead participants to disagree?
  • Perform a \(\chi^2\) test of independence on the variables bike_move_right2 and treatment at the \(\alpha = .05\) level, as sketched below. Interpret your answer.
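A possible sketch in R (the variable names come from the exercise; check ?bicycle_signage for how the agreement levels are coded):

```r
library(fosdata)

tab <- table(bicycle_signage$bike_move_right2, bicycle_signage$treatment)
tab                            # contingency table of response by sign treatment
proportions(tab, margin = 2)   # proportion of each response within each treatment
chisq.test(tab)                # chi-squared test of independence, alpha = 0.05
```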

Exercise 10.29 requires material through Section 10.6.

In Example 10.15, we stated that the number of possible ways to fill a \(3 \times 1\) table with non-negative integers that sum to 1107 is 614,386. Explain why this is the case. (Hint: if you know the first two values, then the third one is determined.)
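A quick numerical check of the stated count (treating the first two cell values as free and the third as determined, per the hint):

```r
# Pairs (a, b) of non-negative integers with a + b <= 1107;
# the third cell is then forced to be 1107 - a - b.
sum((0:1107) + 1)    # 614386
choose(1109, 2)      # 614386, the same count in closed form
```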

Raittio et al., “Two Casting Methods Compared in Patients with Colles’ Fracture.” ↩︎

A J Cain and P M Sheppard, “Selection in the Polymorphic Land Snail Cepaea Nemoralis,” Heredity 4, no. 3 (1950): 275–94. ↩︎

The R function proportions is new to R 4.0.1 and is recommended as a drop-in replacement for the unfortunately named prop.table . ↩︎

M Papadatou-Pastou et al., “Human Handedness: A Meta-Analysis.” Psychological Bulletin 146, no. 6 (2020): 481–524, https://doi.org/10.1037/bul0000229 . ↩︎

John R Doyle, Paul A Bottomley, and Rob Angell, “Tails of the Travelling Gaussian Model and the Relative Age Effect: Tales of Age Discrimination and Wasted Talent,” PLOS One 12, no. 4 (April 2017): 1–22, https://doi.org/10.1371/journal.pone.0176206 . ↩︎

Markus Germar et al., “Dogs (Canis Familiaris) Stick to What They Have Learned Rather Than Conform to Their Conspecifics’ Behavior,” PLOS One 13, no. 3 (March 2018): 1–16, https://doi.org/10.1371/journal.pone.0194808 . ↩︎

Andrew P Nosal et al., “The Effect of Background Music in Shark Documentaries on Viewers’ Perceptions of Sharks.” PLOS One 11, no. 8 (2016): e0159279, https://doi.org/10.1371/journal.pone.0159279 . ↩︎

“Bella” was the name of the character played by Kristen Stewart in the movie Twilight , released in 2008. Fun fact, one of the authors has a family member who appeared in The Twilight Saga: Breaking Dawn - Part 2 . ↩︎

Kate Kahle, Aviv J Sharon, and Ayelet Baram-Tsabari, “Footprints of Fascination: Digital Traces of Public Engagement with Particle Physics on CERN’s Social Media Platforms.” PLOS One 11, no. 5 (2016): e0156409. ↩︎

Shaq is reported to have said, “Me shooting 40 percent at the foul line is just God’s way of saying that nobody’s perfect. If I shot 90 percent from the line, it just wouldn’t be right.” ↩︎

Persi Diaconis, Susan Holmes, and Richard Montgomery, “Dynamical Bias in the Coin Toss,” SIAM Review 49, no. 2 (2007): 211–35. ↩︎

Priscilla Ku and Janet Larwood, “40,000 Coin Tosses Yield Ambiguous Evidence for Dynamical Bias,” 2009, https://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html . ↩︎

Matthew P A Clark and Brian D Westerberg, “Holiday Review. How Random Is the Toss of a Coin?” Canadian Medical Association Journal 181, no. 12 (December 2009): E306–8. ↩︎

Andrew R Olenski et al., “Behavioral Heuristics in Coronary-Artery Bypass Graft Surgery.” N Engl J Med 382, no. 8 (February 2020): 778–79. ↩︎

M Bar-Hillel, E Peer, and A Acquisti, “‘Heads or Tails?’ – a Reachability Bias in Binary Choice,” Journal of Experimental Psychology: Learning, Memory, and Cognition 40, no. 6 (2014): 1656–63, https://doi.org/10.1037/xlm0000005 . ↩︎

Ellen R K Evers, Yoel Inbar, and Marcel Zeelenberg, “Set-Fit Effects in Choice.” J Exp Psychol Gen 143, no. 2 (April 2014): 504–9. ↩︎

Islam Abdelrahman et al., “Division of Overall Duration of Stay into Operative Stay and Postoperative Stay Improves the Overall Estimate as a Measure of Quality of Outcome in Burn Care,” PLOS One 12, no. 3 (March 2017): e0174579–79. ↩︎

Zack Huffman, “Boston Police Promotion Exam Deemed Biased” (Courthouse News Service, November 18, 2015), https://www.courthousenews.com/boston-police-promotion-exam-deemed-biased/ . ↩︎

George Hess and M Nils Peterson, “"Bicycles May Use Full Lane" Signage Communicates U.S. Roadway Rules and Increases Perception of Safety,” PLOS One 10, no. 8 (August 2015): e0136973. ↩︎


Understanding tabular representation of statistical data

Statistical data refers to the aggregate of numerical facts collected for interpretation and analysis; quantifying information in this way supports research and statistical operations. In a tabular presentation, the data is arranged in rows and columns, and this positioning makes the data easier to read and understand. Logical and statistical conclusions can then be drawn directly from the presented data.

Objectives of Tabular Data Presentation

The objectives of tabular data presentation are as follows.

The tabular data presentation helps in simplifying the complex data.

It also helps to compare different data sets thereby bringing out the important aspects.

The tabular presentation provides the foundation for statistical analysis.

The tabular data presentation further helps in the formation of graphs, as well as diagrams for the purpose of advanced data analysis.

Parts of the Table that are Used in the Tabulation

Some of the parts that are used in the table of tabular data presentation are as follows.

Table number: This is included for the purpose of identification and it provides for easy reference. 

Title: It indicates the nature of the information included in the table. The title is placed adjacent to the table number.

Stub: This is provided on the left side of the table. The row headings listed in the stub describe the items presented in the horizontal rows.

Caption: The caption is put on the top of columns within the table. The columns come with the specific unit within which figures are noted down.

Body: This is the most significant part of the table, located in its middle or centre. It is made up of the numerical contents.

Footnote: The footnote provides any further explanation that might be required for an item included in the table, helping to clarify the data presented.

Information source: The source is included at the bottom of the table. It identifies where the information comes from, and the authenticity of the cited sources contributes to the credibility of the data.

A sample illustration of the tabular presentation of data is provided below. The main forms of tabular analysis are quantitative, qualitative, spatial, and temporal analysis. The main limitations of the tabular presentation of data are the lack of focus on individual items, the absence of scope for description, and the need for expert knowledge to interpret the figures.

Illustration Of A Tabular Representation of Data 

Tabular presentation of data example is shown below. 

| Age group (in years) | Children (Female): Residents | Children (Female): Non-Residents | Total (X) | Children (Male): Residents | Children (Male): Non-Residents | Total (Y) | Grand Total (X+Y) |
|---|---|---|---|---|---|---|---|
| 3-5   | 8  | 4  | 12 | 4 | 4  | 8  | 20 |
| 5-8   | 3  | 3  | 6  | 1 | 2  | 3  | 9  |
| 8-10  | 3  | 3  | 6  | 2 | 2  | 4  | 10 |
| 10-12 | 0  | 4  | 4  | 1 | 2  | 3  | 7  |
| 12-15 | 1  | 3  | 4  | 0 | 0  | 0  | 4  |
| Total | 15 | 17 | 32 | 8 | 10 | 18 | 50 |

Test Your Knowledge –

1. Where Is A “Headnote” Placed In A Table?

A headnote comprises the main title

It follows the primary title within a small bracket

A headnote can be placed anywhere in the table

2. Which Of The Following is Used for Explanation of Column Figures?

Caption 

Title 

Forms of Tabular Analysis 

Quantitative

The quantitative tabular analysis provides a description and interpretation of items based on statistics. Such analysis is undertaken through numeric variables as well as statistical methods. 

Qualitative 

Qualitative analysis is done, taking into account various attributes that are non-numerical. For instance, it may include social status, nationality, and physical specifications, among others. In such classification, the attributes that are taken into consideration cannot be subjected to quantitative measurement. 

Spatial 

Categorisation done on the basis of location, such as a state, country, block, or district, is called spatial analysis.

Temporal 

In this analysis method, time becomes a variable for data analysis. Such consideration of time may be in the form of hours, days, weeks, and months among others. 

Limitations of A Tabular Presentation 

There are certain drawbacks to a table presentation of data that have been mentioned below. 

Lack of Focus on Individual Items 

Individual items are not presented distinctly. A tabular presentation shows data in an aggregated manner.

No Scope for Description 

Only figures are indicated in a tabular presentation; the attributes and qualitative aspects of those figures cannot be described in the table.

Requires Expert Knowledge 

A layperson will not be able to decipher the intricacies that are mentioned in the figures within a tabular presentation. Its interpretation and analysis can only be undertaken by a person with the requisite expertise. 



FAQs on Tabular Presentation of Data

1. What is tabular data presentation?

The specific methods used for presenting statistical data in rows and columns are known as the tabular presentation of data. The data is systematically and logically arranged within the rows and columns with regard to its specific characteristics. Tabular presentation makes interpretation straightforward and the data set easy to comprehend, which is why this format is widely used wherever data needs to be organised and analysed.

2. What are the objectives related to data tabulation?

There are specific and well-defined objectives associated with data tabulation. Tabulation converts data into a simple and comprehensible form, and besides the convenience of arrangement, it also creates the foundation for statistical analysis, such as the calculation of averages, dispersion, and correlation. These objectives are the primary reason tabular data presentation is used.

3. What are the primary benefits of using tabular presentation of data?

The tabular presentation of data helps with the organisation of data that is easy to understand and analyse. It also helps with the comparison of data. The data is presented in such a way that it helps reduce the time and effort of the user through the organisation as well as the simplicity of the data presentation. The easy organisation plus presentation of data in tabular form is one of the reasons why it is widely used in data analysis.

4. Can I rely on the tabular presentation of data notes from Vedantu?

Yes, you can rely on the Vedantu note for tabular presentation of data. These notes and chapters are compiled by well-qualified teachers or experts who have distinguished knowledge in the subject and who understand the comprehension skills of the students. These notes are carefully created to provide the best explanation of the topic and help students understand the concept in detail through text and illustrations wherever essential.

5. How can I access the tabular presentation of data notes provided by Vedantu?

If you want access to the Vedantu notes on tabular presentation of data then you can download it from the Vedantu app or website. These notes are available for download in the PDF file format for free. Once you are on the relevant section of the website, you will find the “Download PDF” button and when you click on that option, the file will be downloaded on your device. Now you can access the Vedantu notes even offline as per your convenience.

Presentation of Data


Statistics deals with the collection, presentation and analysis of the data, as well as drawing meaningful conclusions from the given data. Generally, the data can be classified into two different types, namely primary data and secondary data. If the information is collected by the investigator with a definite objective in their mind, then the data obtained is called the primary data. If the information is gathered from a source, which already had the information stored, then the data obtained is called secondary data. Once the data is collected, the presentation of data plays a major role in concluding the result. Here, we will discuss how to present the data with many solved examples.

What is Meant by Presentation of Data?

As soon as the data collection is over, the investigator needs to find a way of presenting the data in a meaningful, efficient and easily understood way to identify the main features of the data at a glance using a suitable presentation method. Generally, the data in the statistics can be presented in three different forms, such as textual method, tabular method and graphical method.

Presentation of Data Examples

Now, let us discuss how to present the data in a meaningful way with the help of examples.

Consider the marks given below, which are obtained by 10 students in Mathematics:

36, 55, 73, 95, 42, 60, 78, 25, 62, 75.

Find the range for the given data.

Given Data: 36, 55, 73, 95, 42, 60, 78, 25, 62, 75.

The data given is called the raw data.

First, arrange the data in ascending order: 25, 36, 42, 55, 60, 62, 73, 75, 78, 95.

Therefore, the lowest mark is 25 and the highest mark is 95.

We know that the range of the data is the difference between the highest and the lowest value in the dataset.

Therefore, Range = 95-25 = 70.

Note: Presentation of data in ascending or descending order can be time-consuming if we have a larger number of observations in an experiment.
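A minimal R sketch of the range computation above:

```r
marks <- c(36, 55, 73, 95, 42, 60, 78, 25, 62, 75)
sort(marks)               # 25 36 42 55 60 62 73 75 78 95
max(marks) - min(marks)   # range = 95 - 25 = 70
```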

Now, let us discuss how to present the data if we have a comparatively larger number of observations in an experiment.

Consider the marks obtained by 30 students in Mathematics subject (out of 100 marks)

10, 20, 36, 92, 95, 40, 50, 56, 60, 70, 92, 88, 80, 70, 72, 70, 36, 40, 36, 40, 92, 40, 50, 50, 56, 60, 70, 60, 60, 88.

In this example, the number of observations is larger than in example 1, so presenting the data in ascending or descending order would be time-consuming. Hence, we can use an ungrouped frequency distribution table (or simply a frequency distribution table), in which the data is arranged in tabular form in terms of frequency.

For example, 3 students scored 50 marks. Hence, the frequency of 50 marks is 3. Now, let us construct the frequency distribution table for the given data.

Therefore, the presentation of data is given as below:

| Marks | Number of students (frequency) |
|---|---|
| 10 | 1 |
| 20 | 1 |
| 36 | 3 |
| 40 | 4 |
| 50 | 3 |
| 56 | 2 |
| 60 | 4 |
| 70 | 4 |
| 72 | 1 |
| 80 | 1 |
| 88 | 2 |
| 92 | 3 |
| 95 | 1 |
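For reference, the same frequency distribution can be produced in one line in R (a sketch, not part of the original example):

```r
marks <- c(10, 20, 36, 92, 95, 40, 50, 56, 60, 70, 92, 88, 80, 70, 72,
           70, 36, 40, 36, 40, 92, 40, 50, 50, 56, 60, 70, 60, 60, 88)
table(marks)   # ungrouped frequency distribution of the 30 scores
```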

The following example shows the presentation of data for the larger number of observations in an experiment.

Consider the marks obtained by 100 students in a Mathematics subject (out of 100 marks)

95, 67, 28, 32, 65, 65, 69, 33, 98, 96,76, 42, 32, 38, 42, 40, 40, 69, 95, 92, 75, 83, 76, 83, 85, 62, 37, 65, 63, 42, 89, 65, 73, 81, 49, 52, 64, 76, 83, 92, 93, 68, 52, 79, 81, 83, 59, 82, 75, 82, 86, 90, 44, 62, 31, 36, 38, 42, 39, 83, 87, 56, 58, 23, 35, 76, 83, 85, 30, 68, 69, 83, 86, 43, 45, 39, 83, 75, 66, 83, 92, 75, 89, 66, 91, 27, 88, 89, 93, 42, 53, 69, 90, 55, 66, 49, 52, 83, 34, 36.

Now, we have 100 observations to present. In this case, we have more data than in examples 1 and 2, so the data can be arranged in a tabular form called the grouped frequency table. Hence, we group the given data into classes such as 20-29, 30-39, 40-49, …, 90-99 (as our data ranges from 23 to 98). Each grouping is called a “class interval” (or “class”), and the size of the class is called the “class size” or “class width”.

In this case, the class size is 10. In each class, we have a lower-class limit and an upper-class limit. For example, if the class interval is 30-39, the lower-class limit is 30, and the upper-class limit is 39. Therefore, the least number in the class interval is called the lower-class limit and the greatest limit in the class interval is called upper-class limit.

Hence, the presentation of data in the grouped frequency table is given below:

| Class interval (marks) | Number of students (frequency) |
|---|---|
| 20 – 29 | 3 |
| 30 – 39 | 14 |
| 40 – 49 | 12 |
| 50 – 59 | 8 |
| 60 – 69 | 18 |
| 70 – 79 | 10 |
| 80 – 89 | 23 |
| 90 – 99 | 12 |
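As a hedged aside, the grouped frequency table can be reproduced in R with cut() and table():

```r
scores <- c(95, 67, 28, 32, 65, 65, 69, 33, 98, 96, 76, 42, 32, 38, 42, 40,
            40, 69, 95, 92, 75, 83, 76, 83, 85, 62, 37, 65, 63, 42, 89, 65,
            73, 81, 49, 52, 64, 76, 83, 92, 93, 68, 52, 79, 81, 83, 59, 82,
            75, 82, 86, 90, 44, 62, 31, 36, 38, 42, 39, 83, 87, 56, 58, 23,
            35, 76, 83, 85, 30, 68, 69, 83, 86, 43, 45, 39, 83, 75, 66, 83,
            92, 75, 89, 66, 91, 27, 88, 89, 93, 42, 53, 69, 90, 55, 66, 49,
            52, 83, 34, 36)

# Class intervals 20-29, 30-39, ..., 90-99 (left-closed intervals of width 10)
table(cut(scores, breaks = seq(20, 100, by = 10), right = FALSE))
```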

Hence, presenting the data in this form simplifies it and enables the observer to understand its main features at a glance.

Practice Problems

  • The heights of 50 students (in cms) are given below. Present the data using the grouped frequency table by taking the class intervals as 160 -165, 165 -170, and so on.  Data: 161, 150, 154, 165, 168, 161, 154, 162, 150, 151, 162, 164, 171, 165, 158, 154, 156, 172, 160, 170, 153, 159, 161, 170, 162, 165, 166, 168, 165, 164, 154, 152, 153, 156, 158, 162, 160, 161, 173, 166, 161, 159, 162, 167, 168, 159, 158, 153, 154, 159.
  • Three coins are tossed simultaneously and each time the number of heads occurring is noted and it is given below. Present the data using the frequency distribution table. Data: 0, 1, 2, 2, 1, 2, 3, 1, 3, 0, 1, 3, 1, 1, 2, 2, 0, 1, 2, 1, 3, 0, 0, 1, 1, 2, 3, 2, 2, 0.


It is the simplest form of data presentation, often used in schools and universities to give students a clearer picture, since simple data is easier to grasp when presented pictorially.

2. Column chart


It is a simplified version of the pictorial presentation that can handle a larger amount of data while still giving the audience clear insight into the figures being shared.

3. Pie Charts


Pie charts provide a descriptive, two-dimensional depiction of data and are well suited to showing how a whole is divided among categories and comparing their relative shares.

4. Bar charts


A bar chart shows data using rectangular bars whose lengths are directly proportional to the values they represent. The bars can be placed either vertically or horizontally, depending on the data being represented.

5. Histograms


It is well suited to presenting the spread of numerical data. The main feature that separates bar graphs from histograms is the gaps between the bars: histogram bars touch, because they represent adjacent intervals of a continuous variable.

6. Box plots


A box plot (or box-and-whisker plot) represents groups of numerical data through their quartiles. This style of graph makes it easy to compare distributions and to spot even small differences between groups.


Map data graphs present data over a geographic area, highlighting regions of concern. Map graphs are useful for depicting data accurately across a wide geographic scope.

All these visual presentations share a common goal of creating meaningful insights and a platform to understand and manage the data in relation to the growth and expansion of one’s in-depth understanding of data & details to plan or execute future decisions or actions.

Importance of Data Presentation

Data presentation can be either a deal maker or a deal breaker, depending on how the content is delivered visually.

Data presentation tools are powerful communication aids: they simplify data, making it easily readable and understandable, hold the reader's interest, and can showcase large amounts of complex data in a simplified manner.

If the user can create an insightful presentation of the data in hand with the same sets of facts and figures, then the results promise to be impressive.

There have been situations where the user has had a great amount of data and vision for expansion but the presentation drowned his/her vision.

To impress the higher management and top brass of a firm, effective presentation of data is needed.

Good data presentation saves the clients or the audience time in grasping the concept and the future direction of the business, and helps convince them to invest in the company and make it profitable for both the investors and the company.

Although data presentation has a lot to offer, the following are some of the major reasons why effective presentation matters:

  • Many consumers and senior stakeholders are interested in the interpretation of data, not the raw data itself. After analysing the data, present it visually so it is easier to understand and absorb.
  • Do not overwhelm the audience with a large number of text-heavy slides; use visuals that speak for themselves.
  • Data presentation often happens in a nutshell, with each department showcasing its contribution to company growth through a graph or a histogram.
  • A brief description helps capture attention quickly while informing the audience about the context of the presentation.
  • Including pictures, charts, graphs, and tables in the presentation helps the audience better understand the potential outcomes.
  • An effective presentation allows an organisation to benchmark itself against peer organisations and acknowledge its flaws; comparing data in this way assists decision-making.



Understanding Data Presentations (Guide + Examples)


In this age of overwhelming information, the skill to effectively convey data has become extremely valuable. Initiating a discussion on data presentation types involves thoughtful consideration of the nature of your data and the message you aim to convey. Different types of visualizations serve distinct purposes. Whether you’re dealing with how to develop a report or simply trying to communicate complex information, how you present data influences how well your audience understands and engages with it. This extensive guide leads you through the different ways of data presentation.

Table of Contents

  • What is a Data Presentation?
  • What Should a Data Presentation Include?
  • Line Graphs
  • Treemap Chart
  • Scatter Plot
  • How to Choose a Data Presentation Type
  • Recommended Data Presentation Templates
  • Common Mistakes Done in Data Presentation

A data presentation is a slide deck that aims to disclose quantitative information to an audience through the use of visual formats and narrative techniques derived from data analysis, making complex data understandable and actionable. This process requires a series of tools, such as charts, graphs, tables, infographics, dashboards, and so on, supported by concise textual explanations to improve understanding and boost retention rate.

Data presentations require us to cull data in a format that allows the presenter to highlight trends, patterns, and insights so that the audience can act upon the shared information. In a few words, the goal of data presentations is to enable viewers to grasp complicated concepts or trends quickly, facilitating informed decision-making or deeper analysis.

Data presentations go beyond the mere usage of graphical elements. Seasoned presenters encompass visuals with the art of data storytelling , so the speech skillfully connects the points through a narrative that resonates with the audience. Depending on the purpose – inspire, persuade, inform, support decision-making processes, etc. – is the data presentation format that is better suited to help us in this journey.

To nail your upcoming data presentation, make sure to include the following elements:

  • Clear Objectives: Understand the intent of your presentation before selecting the graphical layout and metaphors to make content easier to grasp.
  • Engaging introduction: Use a powerful hook from the get-go. For instance, you can ask a big question or present a problem that your data will answer. Take a look at our guide on how to start a presentation for tips & insights.
  • Structured Narrative: Your data presentation must tell a coherent story. This means a beginning where you present the context, a middle section in which you present the data, and an ending that uses a call-to-action. Check our guide on presentation structure for further information.
  • Visual Elements: These are the charts, graphs, and other elements of visual communication we ought to use to present data. This article will cover one by one the different types of data representation methods we can use, and provide further guidance on choosing between them.
  • Insights and Analysis: This is not just showcasing a graph and letting people get an idea about it. A proper data presentation includes the interpretation of that data, the reason why it’s included, and why it matters to your research.
  • Conclusion & CTA: Ending your presentation with a call to action is necessary. Whether you intend to wow your audience into acquiring your services, inspire them to change the world, or whatever the purpose of your presentation, there must be a stage in which you convey all that you shared and show the path to staying in touch. Plan ahead whether you want to use a thank-you slide, a video presentation, or which method is apt and tailored to the kind of presentation you deliver.
  • Q&A Session: After your speech is concluded, allocate 3-5 minutes for the audience to raise any questions about the information you disclosed. This is an extra chance to establish your authority on the topic. Check our guide on questions and answer sessions in presentations here.

Bar charts are a graphical representation of data using rectangular bars to show quantities or frequencies in an established category. They make it easy for readers to spot patterns or trends. Bar charts can be horizontal or vertical, although the vertical format is commonly known as a column chart. They display categorical, discrete, or continuous variables grouped in class intervals [1] . They include an axis and a set of labeled bars horizontally or vertically. These bars represent the frequencies of variable values or the values themselves. Numbers on the y-axis of a vertical bar chart or the x-axis of a horizontal bar chart are called the scale.

Presentation of the data through bar charts

Real-Life Application of Bar Charts

Let’s say a sales manager is presenting sales to their audience. Using a bar chart, he follows these steps.

Step 1: Selecting Data

The first step is to identify the specific data you will present to your audience.

The sales manager has highlighted these products for the presentation.

  • Product A: Men’s Shoes
  • Product B: Women’s Apparel
  • Product C: Electronics
  • Product D: Home Decor

Step 2: Choosing Orientation

Opt for a vertical layout for simplicity. Vertical bar charts help compare different categories in case there are not too many categories [1] . They can also help show different trends. A vertical bar chart is used where each bar represents one of the four chosen products. After plotting the data, it is seen that the height of each bar directly represents the sales performance of the respective product.

It is visible that the tallest bar (Electronics – Product C) is showing the highest sales. However, the shorter bars (Women’s Apparel – Product B and Home Decor – Product D) need attention. It indicates areas that require further analysis or strategies for improvement.

Step 3: Colorful Insights

Different colors are used to differentiate each product. It is essential to show a color-coded chart where the audience can distinguish between products.

  • Men’s Shoes (Product A): Yellow
  • Women’s Apparel (Product B): Orange
  • Electronics (Product C): Violet
  • Home Decor (Product D): Blue

Accurate bar chart representation of data with a color coded legend

Bar charts are straightforward and easily understandable for presenting data. They are versatile when comparing products or any categorical data [2] . Bar charts adapt seamlessly to retail scenarios. Despite that, bar charts have a few shortcomings. They cannot illustrate data trends over time. Besides, overloading the chart with numerous products can lead to visual clutter, diminishing its effectiveness.
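To make the example concrete, here is a hedged R sketch of such a chart; the sales figures below are hypothetical, since the text does not give exact numbers:

```r
# Hypothetical sales figures for the four products discussed above.
sales <- c("Men's Shoes" = 42000, "Women's Apparel" = 28000,
           "Electronics" = 65000, "Home Decor" = 24000)

barplot(sales,
        col  = c("yellow", "orange", "violet", "blue"),
        main = "Sales by product (hypothetical figures)",
        ylab = "Sales (USD)")
```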

For more information, check our collection of bar chart templates for PowerPoint .

Line graphs help illustrate data trends, progressions, or fluctuations by connecting a series of data points called ‘markers’ with straight line segments. This provides a straightforward representation of how values change [5] . Their versatility makes them invaluable for scenarios requiring a visual understanding of continuous data. In addition, line graphs are also useful for comparing multiple datasets over the same timeline. Using multiple line graphs allows us to compare more than one data set. They simplify complex information so the audience can quickly grasp the ups and downs of values. From tracking stock prices to analyzing experimental results, you can use line graphs to show how data changes over a continuous timeline. They show trends with simplicity and clarity.

Real-life Application of Line Graphs

To understand line graphs thoroughly, we will use a real case. Imagine you’re a financial analyst presenting a tech company’s monthly sales for a licensed product over the past year. Investors want insights into sales behavior by month, how market trends may have influenced sales performance and reception to the new pricing strategy. To present data via a line graph, you will complete these steps.

First, you need to gather the data. In this case, your data will be the sales numbers. For example:

  • January: $45,000
  • February: $55,000
  • March: $45,000
  • April: $60,000
  • May: $ 70,000
  • June: $65,000
  • July: $62,000
  • August: $68,000
  • September: $81,000
  • October: $76,000
  • November: $87,000
  • December: $91,000

After choosing the data, the next step is to select the orientation. Like bar charts, you can use vertical or horizontal line graphs. However, we want to keep this simple, so we will keep the timeline (x-axis) horizontal while the sales numbers (y-axis) vertical.

Step 3: Connecting Trends

After adding the data to your preferred software, you will plot a line graph. In the graph, each month’s sales are represented by data points connected by a line.

Line graph in data presentation
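A minimal R sketch of the same plot, using the monthly figures listed above:

```r
sales <- c(45000, 55000, 45000, 60000, 70000, 65000,
           62000, 68000, 81000, 76000, 87000, 91000)

plot(1:12, sales, type = "o", xaxt = "n",
     xlab = "Month", ylab = "Monthly sales (USD)",
     main = "Monthly sales of the licensed product")
axis(1, at = 1:12, labels = month.abb)   # Jan ... Dec
```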

Step 4: Adding Clarity with Color

If there are multiple lines, you can also add colors to highlight each one, making it easier to follow.

Line graphs excel at visually presenting trends over time. These presentation aids identify patterns, like upward or downward trends. However, too many data points can clutter the graph, making it harder to interpret. Line graphs work best with continuous data but are not suitable for categories.

For more information, check our collection of line chart templates for PowerPoint and our article about how to make a presentation graph .

A data dashboard is a visual tool for analyzing information. Different graphs, charts, and tables are consolidated in a layout to showcase the information required to achieve one or more objectives. Dashboards help quickly see Key Performance Indicators (KPIs). You don’t make new visuals in the dashboard; instead, you use it to display visuals you’ve already made in worksheets [3] .

Keeping the number of visuals on a dashboard to three or four is recommended. Adding too many can make it hard to see the main points [4]. Dashboards can be used for business analytics to analyze sales, revenue, and marketing metrics at a time. They are also used in the manufacturing industry, as they allow users to grasp the entire production scenario at the moment while tracking the core KPIs for each line.

Real-Life Application of a Dashboard

Consider a project manager presenting a software development project’s progress to a tech company’s leadership team. He follows the following steps.

Step 1: Defining Key Metrics

To effectively communicate the project’s status, identify key metrics such as completion status, budget, and bug resolution rates. Then, choose measurable metrics aligned with project objectives.

Step 2: Choosing Visualization Widgets

After finalizing the data, presentation aids that align with each metric are selected. For this project, the project manager chooses a progress bar for the completion status and uses bar charts for budget allocation. Likewise, he implements line charts for bug resolution rates.

Data analysis presentation example

Step 3: Dashboard Layout

Key metrics are prominently placed in the dashboard for easy visibility, and the manager ensures that it appears clean and organized.

Dashboards provide a comprehensive view of key project metrics. Users can interact with data, customize views, and drill down for detailed analysis. However, creating an effective dashboard requires careful planning to avoid clutter. Besides, dashboards rely on the availability and accuracy of underlying data sources.

For more information, check our article on how to design a dashboard presentation , and discover our collection of dashboard PowerPoint templates .

Treemap charts represent hierarchical data structured in a series of nested rectangles [6] . As each branch of the ‘tree’ is given a rectangle, smaller tiles can be seen representing sub-branches, meaning elements on a lower hierarchical level than the parent rectangle. Each one of those rectangular nodes is built by representing an area proportional to the specified data dimension.

Treemaps are useful for visualizing large datasets in compact space. It is easy to identify patterns, such as which categories are dominant. Common applications of the treemap chart are seen in the IT industry, such as resource allocation, disk space management, website analytics, etc. Also, they can be used in multiple industries like healthcare data analysis, market share across different product categories, or even in finance to visualize portfolios.

Real-Life Application of a Treemap Chart

Let’s consider a financial scenario where a financial team wants to represent the budget allocation of a company. There is a hierarchy in the process, so it is helpful to use a treemap chart. In the chart, the top-level rectangle could represent the total budget, and it would be subdivided into smaller rectangles, each denoting a specific department. Further subdivisions within these smaller rectangles might represent individual projects or cost categories.

Step 1: Define Your Data Hierarchy

While presenting data on the budget allocation, start by outlining the hierarchical structure. The sequence will be like the overall budget at the top, followed by departments, projects within each department, and finally, individual cost categories for each project.

  • Top-level rectangle: Total Budget
  • Second-level rectangles: Departments (Engineering, Marketing, Sales)
  • Third-level rectangles: Projects within each department
  • Fourth-level rectangles: Cost categories for each project (Personnel, Marketing Expenses, Equipment)

Step 2: Choose a Suitable Tool

It’s time to select a data visualization tool supporting Treemaps. Popular choices include Tableau, Microsoft Power BI, PowerPoint, or even coding with libraries like D3.js. It is vital to ensure that the chosen tool provides customization options for colors, labels, and hierarchical structures.

Here, the team uses PowerPoint for this guide because of its user-friendly interface and robust Treemap capabilities.

Step 3: Make a Treemap Chart with PowerPoint

After opening the PowerPoint presentation, they chose “SmartArt” to form the chart. The SmartArt Graphic window has a “Hierarchy” category on the left.  Here, you will see multiple options. You can choose any layout that resembles a Treemap. The “Table Hierarchy” or “Organization Chart” options can be adapted. The team selects the Table Hierarchy as it looks close to a Treemap.

Step 5: Input Your Data

After that, a new window will open with a basic structure. They add the data one by one by clicking on the text boxes. They start with the top-level rectangle, representing the total budget.  

Treemap used for presenting data

Step 6: Customize the Treemap

By clicking on each shape, they customize its color, size, and label. At the same time, they can adjust the font size, style, and color of labels by using the options in the “Format” tab in PowerPoint. Using different colors for each level enhances the visual difference.

Treemaps excel at illustrating hierarchical structures. These charts make it easy to understand relationships and dependencies. They efficiently use space, compactly displaying a large amount of data, reducing the need for excessive scrolling or navigation. Additionally, using colors enhances the understanding of data by representing different variables or categories.

In some cases, treemaps might become complex, especially with deep hierarchies.  It becomes challenging for some users to interpret the chart. At the same time, displaying detailed information within each rectangle might be constrained by space. It potentially limits the amount of data that can be shown clearly. Without proper labeling and color coding, there’s a risk of misinterpretation.

A heatmap is a data visualization tool that uses color coding to represent values across a two-dimensional surface. In these, colors replace numbers to indicate the magnitude of each cell. This color-shaded matrix display is valuable for summarizing and understanding data sets with a glance [7] . The intensity of the color corresponds to the value it represents, making it easy to identify patterns, trends, and variations in the data.

As a tool, heatmaps help businesses analyze website interactions, revealing user behavior patterns and preferences to enhance overall user experience. In addition, companies use heatmaps to assess content engagement, identifying popular sections and areas of improvement for more effective communication. They excel at highlighting patterns and trends in large datasets, making it easy to identify areas of interest.

We can implement heatmaps to express multiple data types, such as numerical values, percentages, or even categorical data. Heatmaps help us easily spot areas with lots of activity, making them helpful in figuring out clusters [8] . When making these maps, it is important to pick colors carefully. The colors need to show the differences between groups or levels of something. And it is good to use colors that people with colorblindness can easily see.

Check our detailed guide on how to create a heatmap here. Also discover our collection of heatmap PowerPoint templates .

Pie charts are circular statistical graphics divided into slices to illustrate numerical proportions. Each slice represents a proportionate part of the whole, making it easy to visualize the contribution of each component to the total.

The size of the pie charts is influenced by the value of data points within each pie. The total of all data points in a pie determines its size. The pie with the highest data points appears as the largest, whereas the others are proportionally smaller. However, you can present all pies of the same size if proportional representation is not required [9] . Sometimes, pie charts are difficult to read, or additional information is required. A variation of this tool can be used instead, known as the donut chart , which has the same structure but a blank center, creating a ring shape. Presenters can add extra information, and the ring shape helps to declutter the graph.

Pie charts are used in business to show percentage distribution, compare relative sizes of categories, or present straightforward data sets where visualizing ratios is essential.

Real-Life Application of Pie Charts

Consider a scenario where you want to represent the distribution of the data. Each slice of the pie chart would represent a different category, and the size of each slice would indicate the percentage of the total portion allocated to that category.

Step 1: Define Your Data Structure

Imagine you are presenting the distribution of a project budget among different expense categories.

  • Column A: Expense Categories (Personnel, Equipment, Marketing, Miscellaneous)
  • Column B: Budget Amounts ($40,000, $30,000, $20,000, $10,000). Column B holds the values for the categories in Column A.

Step 2: Insert a Pie Chart

Using any of the accessible tools, you can create a pie chart. The most convenient tools for forming a pie chart in a presentation are presentation tools such as PowerPoint or Google Slides.  You will notice that the pie chart assigns each expense category a percentage of the total budget by dividing it by the total budget.

For instance:

  • Personnel: $40,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 40%
  • Equipment: $30,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 30%
  • Marketing: $20,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 20%
  • Miscellaneous: $10,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 10%

You can make a chart out of this or just pull out the pie chart from the data.

Pie chart template in data presentation
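A short R sketch that computes these shares and draws the pie chart (an illustration, not the only tool for the job):

```r
budget <- c(Personnel = 40000, Equipment = 30000,
            Marketing = 20000, Miscellaneous = 10000)

shares <- round(100 * budget / sum(budget))          # 40, 30, 20, 10 (%)
pie(budget, labels = paste0(names(budget), " (", shares, "%)"),
    main = "Project budget allocation")
```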

3D pie charts and 3D donut charts are quite popular among the audience. They stand out as visual elements in any presentation slide, so let’s take a look at how our pie chart example would look in 3D pie chart format.

3D pie chart in data presentation

Step 3: Results Interpretation

The pie chart visually illustrates the distribution of the project budget among different expense categories. Personnel constitutes the largest portion at 40%, followed by equipment at 30%, marketing at 20%, and miscellaneous at 10%. This breakdown provides a clear overview of where the project funds are allocated, which helps in informed decision-making and resource management. It is evident that personnel are a significant investment, emphasizing their importance in the overall project budget.

Pie charts provide a straightforward way to represent proportions and percentages. They are easy to understand, even for individuals with limited data analysis experience. These charts work well for small datasets with a limited number of categories.

However, a pie chart can become cluttered and less effective in situations with many categories. Accurate interpretation may be challenging, especially when dealing with slight differences in slice sizes. In addition, these charts are static and do not effectively convey trends over time.

For more information, check our collection of pie chart templates for PowerPoint .

Histograms present the distribution of numerical variables. Unlike a bar chart that records each unique response separately, histograms organize numeric responses into bins and show the frequency of reactions within each bin [10] . The x-axis of a histogram shows the range of values for a numeric variable. At the same time, the y-axis indicates the relative frequencies (percentage of the total counts) for that range of values.

Whenever you want to understand the distribution of your data, check which values are more common, or identify outliers, histograms are your go-to. Think of them as a spotlight on the story your data is telling. A histogram can provide a quick and insightful overview if you’re curious about exam scores, sales figures, or any numerical data distribution.

Real-Life Application of a Histogram

In the histogram data analysis presentation example, imagine an instructor analyzing a class’s grades to identify the most common score range. A histogram could effectively display the distribution. It will show whether most students scored in the average range or if there are significant outliers.

Step 1: Gather Data

He begins by gathering the data. The scores of each student in class are gathered to analyze exam scores.

| Name | Score |
|---|---|
| Alice | 78 |
| Bob | 85 |
| Clara | 92 |
| David | 65 |
| Emma | 72 |
| Frank | 88 |
| Grace | 76 |
| Henry | 95 |
| Isabel | 81 |
| Jack | 70 |
| Kate | 60 |
| Liam | 89 |
| Mia | 75 |
| Noah | 84 |
| Olivia | 92 |

After arranging the scores in ascending order, bin ranges are set.

Step 2: Define Bins

Bins are like categories that group similar values. Think of them as buckets that organize your data. The presenter decides how wide each bin should be based on the range of the values. For instance, the instructor sets the bin ranges based on score intervals: 60-69, 70-79, 80-89, and 90-100.

Step 3: Count Frequency

Now, he counts how many data points fall into each bin. This step is crucial because it tells you how often specific ranges of values occur. The result is the frequency distribution, showing the occurrences of each group.

Here, the instructor counts the number of students in each category.

  • 60-69: 2 students (David, Kate)
  • 70-79: 5 students (Alice, Emma, Grace, Jack, Mia)
  • 80-89: 5 students (Bob, Frank, Isabel, Liam, Noah)
  • 90-100: 3 students (Clara, Henry, Olivia)

Step 4: Create the Histogram

It’s time to turn the data into a visual representation. Draw a bar for each bin on a graph. The width of the bar should correspond to the range of the bin, and the height should correspond to the frequency.  To make your histogram understandable, label the X and Y axes.

In this case, the X-axis should represent the bins (e.g., test score ranges), and the Y-axis represents the frequency.

Histogram in Data Presentation
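For comparison, a hedged R sketch that reproduces the binning and the histogram from the scores listed above:

```r
scores <- c(78, 85, 92, 65, 72, 88, 76, 95, 81, 70, 60, 89, 75, 84, 92)

# Frequencies for the bins 60-69, 70-79, 80-89, 90-100
table(cut(scores, breaks = c(60, 70, 80, 90, 100),
          right = FALSE, include.lowest = TRUE))

hist(scores, breaks = c(60, 70, 80, 90, 100), right = FALSE,
     main = "Class exam scores", xlab = "Score", ylab = "Frequency")
```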

The histogram of the class grades reveals insightful patterns in the distribution. Most students fall in the 70-89 range, with five students in each of the 70-79 and 80-89 bins. The histogram provides a clear visualization of the class's performance. It showcases a concentration of grades in the upper-middle range, with fewer students at the low and high ends. This analysis helps in understanding the overall academic standing of the class. It also identifies areas for potential improvement or recognition.

Thus, histograms provide a clear visual representation of data distribution. They are easy to interpret, even for those without a statistical background, and they apply to various types of data, including continuous and discrete variables. One weak point is that histograms hide the detail within each bin, so they capture less fine-grained structure than some other visualization methods.

A scatter plot is a graphical representation of the relationship between two variables. It consists of individual data points on a two-dimensional plane. This plane plots one variable on the x-axis and the other on the y-axis. Each point represents a unique observation. It visualizes patterns, trends, or correlations between the two variables.

Scatter plots are also effective in revealing the strength and direction of relationships. They identify outliers and assess the overall distribution of data points. The points’ dispersion and clustering reflect the relationship’s nature, whether it is positive, negative, or lacks a discernible pattern. In business, scatter plots assess relationships between variables such as marketing cost and sales revenue. They help present data correlations and decision-making.

Real-Life Application of Scatter Plot

A group of scientists is conducting a study on the relationship between daily hours of screen time and sleep quality. After reviewing the data, they managed to create this table to help them build a scatter plot graph:

| Participant ID | Daily Hours of Screen Time | Sleep Quality Rating |
|---|---|---|
| 1 | 9 | 3 |
| 2 | 2 | 8 |
| 3 | 1 | 9 |
| 4 | 0 | 10 |
| 5 | 1 | 9 |
| 6 | 3 | 7 |
| 7 | 4 | 7 |
| 8 | 5 | 6 |
| 9 | 5 | 6 |
| 10 | 7 | 3 |
| 11 | 10 | 1 |
| 12 | 6 | 5 |
| 13 | 7 | 3 |
| 14 | 8 | 2 |
| 15 | 9 | 2 |
| 16 | 4 | 7 |
| 17 | 5 | 6 |
| 18 | 4 | 7 |
| 19 | 9 | 2 |
| 20 | 6 | 4 |
| 21 | 3 | 7 |
| 22 | 10 | 1 |
| 23 | 2 | 8 |
| 24 | 5 | 6 |
| 25 | 3 | 7 |
| 26 | 1 | 9 |
| 27 | 8 | 2 |
| 28 | 4 | 6 |
| 29 | 7 | 3 |
| 30 | 2 | 8 |
| 31 | 7 | 4 |
| 32 | 9 | 2 |
| 33 | 10 | 1 |
| 34 | 10 | 1 |
| 35 | 10 | 1 |

In the provided example, the x-axis represents Daily Hours of Screen Time, and the y-axis represents the Sleep Quality Rating.

Scatter plot in data presentation

The scientists observe a negative correlation between the amount of screen time and the quality of sleep. This is consistent with their hypothesis that blue light, especially before bedtime, has a significant impact on sleep quality and metabolic processes.
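A hedged R sketch of the plot and the correlation, using the values as read from the table above:

```r
screen_time   <- c(9, 2, 1, 0, 1, 3, 4, 5, 5, 7, 10, 6, 7, 8, 9, 4, 5, 4,
                   9, 6, 3, 10, 2, 5, 3, 1, 8, 4, 7, 2, 7, 9, 10, 10, 10)
sleep_quality <- c(3, 8, 9, 10, 9, 7, 6, 6, 6, 3, 1, 5, 3, 2, 2, 7, 6, 7,
                   2, 4, 7, 1, 8, 6, 7, 9, 2, 6, 3, 8, 4, 2, 1, 1, 1)

plot(screen_time, sleep_quality,
     xlab = "Daily hours of screen time",
     ylab = "Sleep quality rating",
     main = "Screen time vs. sleep quality")
cor(screen_time, sleep_quality)   # strongly negative correlation
```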

There are a few things to remember when using a scatter plot. Even when a scatter diagram indicates a relationship, it doesn’t mean one variable affects the other. A third factor can influence both variables. The more the plot resembles a straight line, the stronger the relationship is perceived [11] . If it suggests no ties, the observed pattern might be due to random fluctuations in data. When the scatter diagram depicts no correlation, whether the data might be stratified is worth considering.

Choosing the appropriate data presentation type is crucial when making a presentation . Understanding the nature of your data and the message you intend to convey will guide this selection process. For instance, when showcasing quantitative relationships, scatter plots become instrumental in revealing correlations between variables. If the focus is on emphasizing parts of a whole, pie charts offer a concise display of proportions. Histograms, on the other hand, prove valuable for illustrating distributions and frequency patterns. 

Bar charts provide a clear visual comparison of different categories. Likewise, line charts excel in showcasing trends over time, while tables are ideal for detailed data examination. Starting a presentation on data presentation types involves evaluating the specific information you want to communicate and selecting the format that aligns with your message. This ensures clarity and resonance with your audience from the beginning of your presentation.

1. Fact Sheet Dashboard for Data Presentation


Convey all the data you need to present in this one-pager format, an ideal solution tailored for users looking for presentation aids. Global maps, donut charts, column graphs, and text neatly arranged in a clean layout presented in light and dark themes.


2. 3D Column Chart Infographic PPT Template


Represent column charts in a highly visual 3D format with this PPT template. A creative way to present data, this template is entirely editable, and we can craft either a one-page infographic or a series of slides explaining what we intend to disclose point by point.

3. Data Circles Infographic PowerPoint Template


An alternative to the pie chart and donut chart diagrams, this template features a series of curved shapes with bubble callouts as ways of presenting data. Expand the information for each arch in the text placeholder areas.

4. Colorful Metrics Dashboard for Data Presentation


This versatile dashboard template helps us in the presentation of the data by offering several graphs and methods to convert numbers into graphics. Implement it for e-commerce projects, financial projections, project development, and more.

5. Animated Data Presentation Tools for PowerPoint & Google Slides


A slide deck filled with most of the tools mentioned in this article: bar charts, column charts, treemap graphs, pie charts, histograms, and more. Animated effects make each slide look dynamic when sharing data with stakeholders.

6. Statistics Waffle Charts PPT Template for Data Presentations


This PPT template shows how to present data beyond the typical pie chart representation. It is widely used for demographics, so it’s a great fit for marketing teams, data science professionals, HR personnel, and more.

7. Data Presentation Dashboard Template for Google Slides


A compendium of tools in dashboard format featuring line graphs, bar charts, column charts, and neatly arranged placeholder text areas. 

8. Weather Dashboard for Data Presentation


Share weather data for agricultural presentation topics, environmental studies, or any kind of presentation that requires a highly visual layout for weather forecasting on a single day. Two color themes are available.

9. Social Media Marketing Dashboard Data Presentation Template


Intended for marketing professionals, this dashboard template for data presentation is a tool for presenting data analytics from social media channels. Two slide layouts featuring line graphs and column charts.

10. Project Management Summary Dashboard Template


A tool crafted for project managers to deliver highly visual reports on a project’s completion, the profits it delivered for the company, and expenses/time required to execute it. 4 different color layouts are available.

11. Profit & Loss Dashboard for PowerPoint and Google Slides


A must-have for finance professionals. This typical profit & loss dashboard includes progress bars, donut charts, column charts, line graphs, and everything that’s required to deliver a comprehensive report about a company’s financial situation.

Overwhelming visuals

One of the mistakes related to using data-presenting methods is including too much data or using overly complex visualizations. They can confuse the audience and dilute the key message.

Inappropriate chart types

Choosing the wrong type of chart for the data at hand can lead to misinterpretation. For example, using a pie chart for data that doesn’t represent parts of a whole is not right.

Lack of context

Failing to provide context or sufficient labeling can make it challenging for the audience to understand the significance of the presented data.

Inconsistency in design

Using inconsistent design elements and color schemes across different visualizations can create confusion and visual disarray.

Failure to provide details

Simply presenting raw data without offering clear insights or takeaways can leave the audience without a meaningful conclusion.

Lack of focus

Not having a clear focus on the key message or main takeaway can result in a presentation that lacks a central theme.

Visual accessibility issues

Overlooking the visual accessibility of charts and graphs can exclude certain audience members who may have difficulty interpreting visual information.

In order to avoid these mistakes in data presentation, presenters can benefit from using presentation templates . These templates provide a structured framework. They ensure consistency, clarity, and an aesthetically pleasing design, enhancing data communication’s overall impact.

Understanding and choosing data presentation types are pivotal in effective communication. Each method serves a unique purpose, so selecting the appropriate one depends on the nature of the data and the message to be conveyed. The diverse array of presentation types offers versatility in visually representing information, from bar charts showing values to pie charts illustrating proportions. 

Using the proper method enhances clarity, engages the audience, and ensures that data sets are not just presented but comprehensively understood. By appreciating the strengths and limitations of different presentation types, communicators can tailor their approach to convey information accurately, developing a deeper connection between data and audience understanding.



Korean J Anesthesiol. 2017 Jun; 70(3).

Statistical data presentation

1 Department of Anesthesiology and Pain Medicine, Dongguk University Ilsan Hospital, Goyang, Korea.

Sangseok Lee

2 Department of Anesthesiology and Pain Medicine, Sanggye Paik Hospital, Inje University College of Medicine, Seoul, Korea.

Data are usually collected in a raw format and thus the inherent information is difficult to understand. Therefore, raw data need to be summarized, processed, and analyzed. However, no matter how well manipulated, the information derived from the raw data should be presented in an effective format, otherwise, it would be a great loss for both authors and readers. In this article, the techniques of data and information presentation in textual, tabular, and graphical forms are introduced. Text is the principal method for explaining findings, outlining trends, and providing contextual information. A table is best suited for representing individual information and represents both quantitative and qualitative information. A graph is a very effective visual tool as it displays data at a glance, facilitates comparison, and can reveal trends and relationships within the data such as changes over time, frequency distribution, and correlation or relative share of a whole. Text, tables, and graphs for data and information presentation are very powerful communication tools. They can make an article easy to understand, attract and sustain the interest of readers, and efficiently present large amounts of complex information. Moreover, as journal editors and reviewers glance at these presentations before reading the whole article, their importance cannot be ignored.

Introduction

Data are a set of facts, and provide a partial picture of reality. Whether data are being collected with a certain purpose or collected data are being utilized, questions regarding what information the data are conveying, how the data can be used, and what must be done to include more useful information must constantly be kept in mind.

Since most data are available to researchers in a raw format, they must be summarized, organized, and analyzed to usefully derive information from them. Furthermore, each data set needs to be presented in a certain way depending on what it is used for. Planning how the data will be presented is essential before appropriately processing raw data.

First, a question for which an answer is desired must be clearly defined. The more detailed the question is, the more detailed and clearer the results are. A broad question results in vague answers and results that are hard to interpret. In other words, a well-defined question is crucial for the data to be well-understood later. Once a detailed question is ready, the raw data must be prepared before processing. These days, data are often summarized, organized, and analyzed with statistical packages or graphics software. Data must be prepared in such a way they are properly recognized by the program being used. The present study does not discuss this data preparation process, which involves creating a data frame, creating/changing rows and columns, changing the level of a factor, categorical variable, coding, dummy variables, variable transformation, data transformation, missing value, outlier treatment, and noise removal.

We describe the roles and appropriate use of text, tables, and graphs (graphs, plots, or charts), all of which are commonly used in reports, articles, posters, and presentations. Furthermore, we discuss the issues that must be addressed when presenting various kinds of information, and effective methods of presenting data, which are the end products of research, and of emphasizing specific information.

Data Presentation

Data can be presented in one of three ways:

–as text;

–in tabular form; or

–in graphical form.

Methods of presentation must be determined according to the data format, the method of analysis to be used, and the information to be emphasized. Inappropriately presented data fail to clearly convey information to readers and reviewers. Even when the same information is being conveyed, different methods of presentation must be employed depending on what specific information is going to be emphasized. A method of presentation must be chosen after carefully weighing the advantages and disadvantages of different methods of presentation. For easy comparison of different methods of presentation, let us look at a table ( Table 1 ) and a line graph ( Fig. 1 ) that present the same information [ 1 ]. If one wishes to compare or introduce two values at a certain time point, it is appropriate to use text or the written language. However, a table is the most appropriate when all information requires equal attention, and it allows readers to selectively look at information of their own interest. Graphs allow readers to understand the overall trend in data, and intuitively understand the comparison results between two groups. One thing to always bear in mind regardless of what method is used, however, is the simplicity of presentation.

Fig. 1. Line graph of the systolic blood pressure data from Table 1, plotted over time.

Table 1.
Variable | Group | Baseline | After drug | 1 min | 3 min | 5 min
SBP | C | 135.1 ± 13.4 | 139.2 ± 17.1 | 186.0 ± 26.6 | 160.1 ± 23.2 | 140.7 ± 18.3
SBP | D | 135.4 ± 23.8 | 131.9 ± 13.5 | 165.2 ± 16.2 | 127.9 ± 17.5 | 108.4 ± 12.6
DBP | C | 79.7 ± 9.8 | 79.4 ± 15.8 | 104.8 ± 14.9 | 87.9 ± 15.5 | 78.9 ± 11.6
DBP | D | 76.7 ± 8.3 | 78.4 ± 6.3 | 97.0 ± 14.5 | 74.1 ± 8.3 | 66.5 ± 7.2
MBP | C | 100.3 ± 11.9 | 103.5 ± 16.8 | 137.2 ± 18.3 | 116.9 ± 16.2 | 103.9 ± 13.3
MBP | D | 97.7 ± 14.9 | 98.1 ± 8.7 | 123.4 ± 13.8 | 95.4 ± 11.7 | 83.4 ± 8.4

Values are expressed as mean ± SD. Group C: normal saline, Group D: dexmedetomidine. SBP: systolic blood pressure, DBP: diastolic blood pressure, MBP: mean blood pressure, HR: heart rate. * P < 0.05 indicates a significant increase in each group, compared with the baseline values. † P < 0.05 indicates a significant decrease noted in Group D, compared with the baseline values. ‡ P < 0.05 indicates a significant difference between the groups.

Text presentation

Text is the main method of conveying information as it is used to explain results and trends, and provide contextual information. Data are fundamentally presented in paragraphs or sentences. Text can be used to provide interpretation or emphasize certain data. If quantitative information to be conveyed consists of one or two numbers, it is more appropriate to use written language than tables or graphs. For instance, information about the incidence rates of delirium following anesthesia in 2016–2017 can be presented with the use of a few numbers: “The incidence rate of delirium following anesthesia was 11% in 2016 and 15% in 2017; no significant difference of incidence rates was found between the two years.” If this information were to be presented in a graph or a table, it would occupy an unnecessarily large space on the page, without enhancing the readers' understanding of the data. If more data are to be presented, or other information such as that regarding data trends are to be conveyed, a table or a graph would be more appropriate. By nature, data take longer to read when presented as text, and when the main text includes a long list of information, readers and reviewers may have difficulty understanding the information.

Table presentation

Tables, which convey information that has been converted into words or numbers in rows and columns, have been used for nearly 2,000 years. Anyone with a sufficient level of literacy can easily understand the information presented in a table. Tables are the most appropriate for presenting individual information, and can present both quantitative and qualitative information. Examples of qualitative information are the level of sedation [ 2 ], statistical methods/functions [ 3 , 4 ], and intubation conditions [ 5 ].

The strength of tables is that they can accurately present information that cannot be presented with a graph. A number such as “132.145852” can be accurately expressed in a table. Another strength is that information with different units can be presented together. For instance, blood pressure, heart rate, number of drugs administered, and anesthesia time can be presented together in one table. Finally, tables are useful for summarizing and comparing quantitative information of different variables. However, the interpretation of information takes longer in tables than in graphs, and tables are not appropriate for studying data trends. Furthermore, since all data are of equal importance in a table, it is not easy to identify and selectively choose the information required.

For a general guideline for creating tables, refer to the journal submission requirements 1) .

Heat maps for better visualization of information than tables

Heat maps help to further visualize the information presented in a table by applying colors to the background of cells. By adjusting the colors or color saturation, information is conveyed in a more visible manner, and readers can quickly identify the information of interest ( Table 2 ). Software such as Excel (in Microsoft Office, Microsoft, WA, USA) has features that enable easy creation of heat maps through the options available on the “conditional formatting” menu.

Table 2. Example of a regular table and the same values rendered as a heat map (color shading applied to the cells)
SBP | DBP | MBP | HR
128 | 66 | 87 | 87
125 | 43 | 70 | 85
114 | 52 | 68 | 103
111 | 44 | 66 | 79
139 | 61 | 81 | 90
103 | 44 | 61 | 96
94 | 47 | 61 | 83

All numbers were created by the author. SBP: systolic blood pressure, DBP: diastolic blood pressure, MBP: mean blood pressure, HR: heart rate.
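The same conditional-formatting idea can also be reproduced programmatically. The sketch below is a minimal example in Python, assuming pandas (and matplotlib, which pandas' background_gradient uses internally) is available; it is not part of the original article. It colors the cells of the Table 2 values and writes the result to an HTML file.

import pandas as pd

# The numbers from Table 2 above.
vitals = pd.DataFrame({
    "SBP": [128, 125, 114, 111, 139, 103, 94],
    "DBP": [66, 43, 52, 44, 61, 44, 47],
    "MBP": [87, 70, 68, 66, 81, 61, 61],
    "HR":  [87, 85, 103, 79, 90, 96, 83],
})

# Shade each column from light to dark, mimicking Excel's "conditional formatting".
styled = vitals.style.background_gradient(cmap="Blues")

# Write the colored table to an HTML file that can be opened in a browser.
with open("heatmap_table.html", "w") as f:
    f.write(styled.to_html())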

Graph presentation

Whereas tables can be used for presenting all the information, graphs simplify complex information by using images and emphasizing data patterns or trends, and are useful for summarizing, explaining, or exploring quantitative data. While graphs are effective for presenting large amounts of data, they can be used in place of tables to present small sets of data. A graph format that best presents information must be chosen so that readers and reviewers can easily understand the information. In the following, we describe frequently used graph formats and the types of data that are appropriately presented with each format with examples.

Scatter plot

Scatter plots present data on the x - and y -axes and are used to investigate an association between two variables. A point represents each individual or object, and an association between two variables can be studied by analyzing patterns across multiple points. A regression line is added to a graph to determine whether the association between two variables can be explained or not. Fig. 2 illustrates correlations between pain scoring systems that are currently used (PSQ, Pain Sensitivity Questionnaire; PASS, Pain Anxiety Symptoms Scale; PCS, Pain Catastrophizing Scale) and Geop-Pain Questionnaire (GPQ) with the correlation coefficient, R, and regression line indicated on the scatter plot [ 6 ]. If multiple points exist at an identical location as in this example ( Fig. 2 ), the correlation level may not be clear. In this case, a correlation coefficient or regression line can be added to further elucidate the correlation.

Fig. 2. Scatter plots of GPQ scores against PSQ, PASS, and PCS scores, with correlation coefficients and regression lines.
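For readers who build such figures themselves, here is a minimal sketch in Python with NumPy and matplotlib (assumed to be available; the data are made up for illustration and are not the values behind Fig. 2) of a scatter plot with a correlation coefficient and a fitted regression line.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)               # scores on one hypothetical questionnaire
y = 2.0 * x + rng.normal(0, 3, 50)       # scores on a second, correlated questionnaire

slope, intercept = np.polyfit(x, y, 1)   # least-squares regression line
r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient

plt.scatter(x, y, s=15)
plt.plot(np.sort(x), slope * np.sort(x) + intercept, color="red",
         label=f"regression line, R = {r:.2f}")
plt.xlabel("Questionnaire A score")
plt.ylabel("Questionnaire B score")
plt.legend()
plt.savefig("scatter.png", dpi=300)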

Bar graph and histogram

A bar graph is used to indicate and compare values in a discrete category or group, and the frequency or other measurement parameters (e.g., the mean). Depending on the number of categories, and the size or complexity of each category, bars may be created vertically or horizontally. The height (or length) of a bar represents the amount of information in a category. Bar graphs are flexible, and can be used in a grouped or subdivided bar format in cases of two or more data sets in each category. Fig. 3 is a representative example of a vertical bar graph, with the x -axis representing the length of recovery room stay and drug-treated group, and the y -axis representing the visual analog scale (VAS) score. The mean and standard deviation of the VAS scores are expressed as whiskers on the bars ( Fig. 3 ) [ 7 ].

Fig. 3. Vertical bar graph of VAS scores by length of recovery room stay and drug-treated group, with means and standard deviations shown as whiskers.

By comparing the endpoints of bars, one can identify the largest and the smallest categories, and understand gradual differences between each category. It is advised to start the x - and y -axes from 0; axes that do not start from 0 can deceive readers' eyes and lead to overrepresentation of the results.
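A grouped vertical bar graph of this kind can be sketched as follows in Python with matplotlib (illustrative numbers only, not the data of Fig. 3); note that the y-axis starts at 0, as recommended above.

import numpy as np
import matplotlib.pyplot as plt

times = ["30 min", "60 min"]                           # hypothetical recovery-room time points
means = {"Control": [4.2, 3.8], "Drug": [2.9, 2.1]}    # mean VAS scores (made up)
sds   = {"Control": [1.1, 1.0], "Drug": [0.9, 0.8]}    # standard deviations (made up)

x = np.arange(len(times))
width = 0.35
for i, group in enumerate(means):
    plt.bar(x + i * width, means[group], width, yerr=sds[group],
            capsize=4, label=group)

plt.xticks(x + width / 2, times)
plt.ylabel("VAS score")
plt.ylim(bottom=0)        # start the y-axis at 0 to avoid exaggerating differences
plt.legend()
plt.savefig("bar_graph.png", dpi=300)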

One form of vertical bar graph is the stacked vertical bar graph. A stacked vertical bar graph is used to compare the sum of each category, and to analyze parts of a category. While stacked vertical bar graphs are excellent from the aspect of visualization, they lack a common reference line, which makes comparing parts across different categories challenging ( Fig. 4 ) [ 8 ].

Fig. 4. Example of a stacked vertical bar graph.

Pie chart

A pie chart, which is used to represent nominal data (in other words, data classified into different categories), visually represents a distribution of categories. It is generally the most appropriate format for representing information grouped into a small number of categories. It is also used for data that have no other way of being represented aside from a table (i.e. frequency table). Fig. 5 illustrates the distribution of regular waste from operation rooms by their weight [ 8 ]. A pie chart is also commonly used to illustrate the number of votes each candidate won in an election.

Fig. 5. Pie chart of the distribution of regular waste from operation rooms by weight.

Line plot with whiskers

A line plot is useful for representing time-series data such as monthly precipitation and yearly unemployment rates; in other words, it is used to study variables that are observed over time. Line graphs are especially useful for studying patterns and trends across data that include climatic influence, large changes or turning points, and are also appropriate for representing not only time-series data, but also data measured over the progression of a continuous variable such as distance. As can be seen in Fig. 1 , mean and standard deviation of systolic blood pressure are indicated for each time point, which enables readers to easily understand changes of systolic pressure over time [ 1 ]. If data are collected at a regular interval, values in between the measurements can be estimated. In a line graph, the x-axis represents the continuous variable, while the y-axis represents the scale and measurement values. It is also useful to represent multiple data sets on a single line graph to compare and analyze patterns across different data sets.

Box and whisker chart

A box and whisker chart does not make any assumptions about the underlying statistical distribution, and represents variations in samples of a population; therefore, it is appropriate for representing nonparametric data. A box and whisker chart consists of boxes that represent the interquartile range (from the first to the third quartile), the median and the mean of the data, and whiskers presented as lines outside of the boxes. Whiskers can be used to present the largest and smallest values in a set of data or only a part of the data (i.e. 95% of all the data). Data that are excluded from the data set are presented as individual points and are called outliers. The spacing at both ends of the box indicates dispersion in the data. The relative location of the median demonstrated within the box indicates skewness ( Fig. 6 ). The box and whisker chart provided as an example represents calculated volumes of an anesthetic, desflurane, consumed over the course of the observation period ( Fig. 7 ) [ 9 ].

Fig. 6. Box and whisker chart, showing the box, median, whiskers, outliers, and skewness.
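A box and whisker chart of this kind can be sketched in Python with matplotlib as follows (the values are simulated for illustration and are not the desflurane data of Fig. 7).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
group_a = rng.lognormal(mean=3.0, sigma=0.3, size=40)   # skewed, nonparametric-looking data
group_b = rng.lognormal(mean=3.2, sigma=0.4, size=40)

# Boxes span the interquartile range; whiskers extend to 1.5 x IQR by default,
# and points beyond the whiskers are drawn individually as outliers.
plt.boxplot([group_a, group_b], showmeans=True)
plt.xticks([1, 2], ["Group A", "Group B"])
plt.ylabel("Consumed volume (ml)")
plt.savefig("boxplot.png", dpi=300)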

Three-dimensional effects

Most of the recently introduced statistical packages and graphics software have the three-dimensional (3D) effect feature. The 3D effects can add depth and perspective to a graph. However, since they may make reading and interpreting data more difficult, they must only be used after careful consideration. The application of 3D effects on a pie chart makes distinguishing the size of each slice difficult. Even if slices are of similar sizes, slices farther from the front of the pie chart may appear smaller than the slices closer to the front ( Fig. 8 ).

Fig. 8. Pie chart with three-dimensional effects: slices farther from the front appear smaller even when the slices are of similar size.

Drawing a graph: example

Finally, we explain how to create a graph by using a line graph as an example ( Fig. 9 ). In Fig. 9 , the mean values of arterial pressure were randomly produced and assumed to have been measured on an hourly basis. In many graphs, the x- and y-axes meet at the zero point ( Fig. 9A ). In this case, information regarding the mean and standard deviation of mean arterial pressure measurements corresponding to t = 0 cannot be conveyed as the values overlap with the y-axis. The data can be clearly exposed by separating the zero point ( Fig. 9B ). In Fig. 9B , the mean and standard deviation of different groups overlap and cannot be clearly distinguished from each other. Separating the data sets and presenting standard deviations in a single direction prevents overlapping and, therefore, reduces the visual inconvenience. Doing so also reduces the excessive number of ticks on the y-axis, increasing the legibility of the graph ( Fig. 9C ). In the last graph, different shapes were used for the lines connecting different time points to further allow the data to be distinguished, and the y-axis was shortened to get rid of the unnecessary empty space present in the previous graphs ( Fig. 9D ). A graph can be made easier to interpret by assigning each group to a different color, changing the shape of a point, or including graphs of different formats [ 10 ]. The use of random settings for the scale in a graph may lead to inappropriate presentation or presentation of data that can deceive readers' eyes ( Fig. 10 ).

Fig. 9. Step-by-step refinement of a line graph of mean arterial pressure over time (panels A to D).
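The adjustments described for Fig. 9 can be approximated with a short Python/matplotlib sketch. The values are random and illustrative, and the one-sided error bars and trimmed axes are one possible way to implement the advice, not the authors' own code.

import numpy as np
import matplotlib.pyplot as plt

hours  = np.arange(0, 6)                       # hourly measurements
mean_a = np.array([92, 95, 98, 96, 94, 93])    # mean arterial pressure, group A (made up)
sd_a   = np.array([5, 6, 5, 4, 5, 6])
mean_b = np.array([90, 88, 86, 87, 89, 90])    # mean arterial pressure, group B (made up)
sd_b   = np.array([4, 5, 4, 5, 4, 4])

# Draw the SD whiskers in one direction only (up for A, down for B) so they do not overlap.
plt.errorbar(hours, mean_a, yerr=[np.zeros_like(sd_a), sd_a], fmt="-o", capsize=3, label="Group A")
plt.errorbar(hours, mean_b, yerr=[sd_b, np.zeros_like(sd_b)], fmt="--s", capsize=3, label="Group B")

plt.xlim(-0.3, 5.3)     # shift the first point away from the y-axis so it is not hidden
plt.ylim(75, 110)       # trim the empty space below the data instead of starting at 0
plt.xlabel("Time (h)")
plt.ylabel("Mean arterial pressure (mmHg)")
plt.legend()
plt.savefig("line_graph.png", dpi=300)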

Owing to the lack of space, we could not discuss all types of graphs, but have focused on describing graphs that are frequently used in scholarly articles. We have summarized the commonly used types of graphs according to the method of data analysis in Table 3 . For general guidelines on graph designs, please refer to the journal submission requirements 2) .

Table 3.
Analysis | Subgroup | Number of variables | Type
Comparison | Among items | Two per item | Variable width column chart
Comparison | Among items | One per item | Bar/column chart
Comparison | Over time | Many periods | Circular area/line chart
Comparison | Over time | Few periods | Column/line chart
Relationship | | Two | Scatter chart
Relationship | | Three | Bubble chart
Distribution | | Single | Column/line histogram
Distribution | | Two | Scatter chart
Distribution | | Three | Three-dimensional area chart
Comparison | Changing over time | Only relative differences matter | Stacked 100% column chart
Comparison | Changing over time | Relative and absolute differences matter | Stacked column chart
Comparison | Static | Simple share of total | Pie chart
Comparison | Static | Accumulation | Waterfall chart
Comparison | Static | Components of components | Stacked 100% column chart with subcomponents

Conclusions

Text, tables, and graphs are effective communication media that present and convey data and information. They aid readers in understanding the content of research, sustain their interest, and effectively present large quantities of complex information. As journal editors and reviewers will scan through these presentations before reading the entire text, their importance cannot be disregarded. For this reason, authors must pay as close attention to selecting appropriate methods of data presentation as when they were collecting data of good quality and analyzing them. In addition, having a well-established understanding of different methods of data presentation and their appropriate use will enable one to develop the ability to recognize and interpret inappropriately presented data or data presented in such a way that it deceives readers' eyes [ 11 ].

<Appendix>

Output for presentation.

Discovery and communication are the two objectives of data visualization. In the discovery phase, various types of graphs must be tried to understand the rough and overall information the data are conveying. The communication phase is focused on presenting the discovered information in a summarized form. During this phase, it is necessary to polish images including graphs, pictures, and videos, and consider the fact that the images may look different when printed than how they appear on a computer screen. In this appendix, we discuss important concepts that one must be familiar with to print graphs appropriately.

The KJA asks that pictures and images meet the following requirement before submission 3)

“Figures and photographs should be submitted as ‘TIFF’ files. Submit files of figures and photographs separately from the text of the paper. Width of figure should be 84 mm (one column). Contrast of photos or graphs should be at least 600 dpi. Contrast of line drawings should be at least 1,200 dpi. The Powerpoint file (ppt, pptx) is also acceptable.”

Unfortunately, without sufficient knowledge of computer graphics, it is not easy to understand the submission requirement above. Therefore, it is necessary to develop an understanding of image resolution, image format (bitmap and vector images), and the corresponding file specifications.

Resolution is often mentioned to describe the quality of images containing graphs or CT/MRI scans, and video files. The higher the resolution, the clearer and closer to reality the image is, while the opposite is true for low resolutions. The most representative unit used to describe a resolution is “dpi” (dots per inch): this literally translates to the number of dots required to constitute 1 inch. The greater the number of dots, the higher the resolution. The KJA submission requirements recommend 600 dpi for images, and 1,200 dpi 4) for graphs. In other words, resolutions in which 600 or 1,200 dots constitute one inch are required for submission.

There are requirements for the horizontal length of an image in addition to the resolution requirements. While there are no requirements for the vertical length of an image, it must not exceed the vertical length of a page. The width of a column on one side of a printed page is 84 mm, or 3.3 inches (84/25.4 mm ≒ 3.3 inches). Therefore, a graph must have a resolution in which 1,200 dots constitute 1 inch, and have a width of 3.3 inches.
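As a rough illustration of these numbers, the following Python/matplotlib sketch saves a figure at the column width and resolution quoted above. The use of matplotlib is an assumption for illustration (the article does not prescribe a tool), and TIFF output additionally requires the Pillow package.

import matplotlib.pyplot as plt

width_in = 84 / 25.4        # 84 mm column width is roughly 3.3 inches
dpi = 1200                  # resolution requested for line drawings
# At these settings the saved image is about 3.3 x 1,200 ≈ 3,960 dots wide.

fig, ax = plt.subplots(figsize=(width_in, width_in * 0.75))
ax.plot([0, 1, 2, 3], [1, 3, 2, 4])
fig.savefig("figure.tif", dpi=dpi)    # use "figure.png" instead if Pillow is not installed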

Bitmap and Vector

Methods of image construction are important. Bitmap images can be considered as images drawn on section paper. Enlarging the image will enlarge the picture along with the grid, resulting in a lower resolution; in other words, aliasing occurs. On the other hand, reducing the size of the image will reduce the size of the picture, while increasing the resolution. In other words, resolution and the size of an image are inversely proportionate to one another in bitmap images, and it is a drawback of bitmap images that resolution must be considered when adjusting the size of an image. To enlarge an image while maintaining the same resolution, the size and resolution of the image must be determined before saving the image. An image that has already been created cannot avoid changes to its resolution according to changes in size. Enlarging an image while maintaining the same resolution will increase the number of horizontal and vertical dots, ultimately increasing the number of pixels 5) of the image, and the file size. In other words, the file size of a bitmap image is affected by the size and resolution of the image (file extensions include JPG [JPEG] 6) , PNG 7) , GIF 8) , and TIF [TIFF] 9) . To avoid this complexity, the width of an image can be set to 4 inches and its resolution to 900 dpi to satisfy the submission requirements of most journals [ 12 ].

Vector images overcome the shortcomings of bitmap images. Vector images are created based on mathematical operations of line segments and areas between different points, and are not affected by aliasing or pixelation. Furthermore, they result in a smaller file size that is not affected by the size of the image. They are commonly used for drawings and illustrations (file extensions include EPS 10) , CGM 11) , and SVG 12) ).

Finally, the PDF 13) is a file format developed by Adobe Systems (Adobe Systems, CA, USA) for electronic documents, and can contain general documents, text, drawings, images, and fonts. They can also contain bitmap and vector images. While vector images are used by researchers when working in Powerpoint, they are saved as 960 × 720 dots when saved in TIFF format in Powerpoint. This results in a resolution that is inappropriate for printing on a paper medium. To save high-resolution bitmap images, the image must be saved as a PDF file instead of a TIFF, and the saved PDF file must be imported into an imaging processing program such as Photoshop™(Adobe Systems, CA, USA) to be saved in TIFF format [ 12 ].

1) Instructions to authors in KJA; section 5-(9) Table; https://ekja.org/index.php?body=instruction

2) Instructions to Authors in KJA; section 6-1)-(10) Figures and illustrations in Manuscript preparation; https://ekja.org/index.php?body=instruction

3) Instructions to Authors in KJA; section 6-1)-(10) Figures and illustrations in Manuscript preparation; https://ekja.org/index.php?body=instruction

4) Resolution; in KJA, it is represented by “contrast.”

5) Pixel is a minimum unit of an image and contains information of a dot and color. It is derived by multiplying the number of vertical and horizontal dots regardless of image size. For example, Full High Definition (FHD) monitor has 1920 × 1080 dots ≒ 2.07 million pixel.

6) Joint Photographic Experts Group.

7) Portable Network Graphics.

8) Graphics Interchange Format

9) Tagged Image File Format; TIFF

10) Encapsulated PostScript.

11) Computer Graphics Metafile.

12) Scalable Vector Graphics.

13) Portable Document Format.

Textual and Tabular Presentation of Data

Think about a scenario where your report cards are printed in a textual format. Your grades and remarks about you are presented in a paragraph format instead of data tables. Would be very confusing right? This is why data must be presented correctly and clearly. Let us take a look.


Presentation of Data

Presentation of data is of utmost importance nowadays. After all, everything that’s pleasing to our eyes never fails to grab our attention. Presentation of data refers to exhibiting or putting up data in an attractive and useful manner such that it can be easily interpreted. The three main forms of presentation of data are:

  • Textual presentation
  • Data tables
  • Diagrammatic presentation

Here we will be studying only the textual and tabular presentation, i.e. data tables in some detail.

Textual Presentation

The discussion about the presentation of data starts off with its most raw and vague form, which is the textual presentation. In such a form of presentation, data is simply mentioned as mere text, generally in a paragraph. This is commonly used when the data is not very large.

This kind of representation is useful when we are looking to supplement qualitative statements with some data. For this purpose, the data should not be voluminously represented in tables or diagrams. It just has to be a statement that serves as fitting evidence for our qualitative statements and helps the reader to get an idea of the scale of a phenomenon.

For example, “the 2002 earthquake proved to be a mass murderer of humans. As many as 10,000 citizens have been reported dead.” The textual representation of data simply requires some intensive reading. This is because the quantitative statement just serves as evidence for the qualitative statements, and one has to go through the entire text before concluding anything.

Further, if the data under consideration is large then the text matter increases substantially. As a result, the reading process becomes more intensive, time-consuming and cumbersome.

Data Tables or Tabular Presentation

A table facilitates representation of even large amounts of data in an attractive, easy to read and organized manner. The data is organized in rows and columns. This is one of the most widely used forms of presentation of data since data tables are easy to construct and read.

Components of  Data Tables

  • Table Number : Each table should have a specific table number for ease of access and locating. This number can be readily mentioned anywhere which serves as a reference and leads us directly to the data mentioned in that particular table.
  • Title:  A table must contain a title that clearly tells the readers about the data it contains, time period of study, place of study and the nature of classification of data .
  • Headnotes:  A headnote further aids in the purpose of a title and displays more information about the table. Generally, headnotes present the units of data in brackets at the end of a table title.
  • Stubs:  These are the titles of the rows in a table. Thus, a stub displays information about the data contained in a particular row.
  • Caption:  A caption is the title of a column in the data table. In fact, it is the counterpart of a stub and indicates the information contained in a column.
  • Body or field:  The body of a table is the content of a table in its entirety. Each item in a body is known as a ‘cell’.
  • Footnotes:  Footnotes are rarely used. In effect, they supplement the title of a table if required.
  • Source:  When using data obtained from a secondary source, this source has to be mentioned below the footnote. (The sketch after this list shows how these components can map onto a table built in code.)
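For readers who build tables programmatically, here is a rough sketch in Python with pandas; the library choice, the row and column labels, and the table number are illustrative assumptions, not part of the original text. Stubs become the row index, captions become the column names, and the title, headnote, and source are printed around the body.

import pandas as pd

# Body/field: the cells; captions: the column titles; stubs: the row titles (the index).
table = pd.DataFrame(
    {"Boys": [200, 167], "Girls": [390, 100]},   # hypothetical counts
    index=["Rural", "Urban"])
table.index.name = "Area"

print("Table 1: Number of students by area and gender")    # title
print("(figures in number of students)")                    # headnote with units
print(table)                                                 # body
print("Source: hypothetical data for illustration")         # source note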

Construction of Data Tables

There are many ways for construction of a good table. However, some basic ideas are:

  • The title should be in accordance with the objective of study:  The title of a table should provide a quick insight into the table.
  • Comparison:  If a need might arise to compare any two rows or columns, these should be kept close to each other.
  • Alternative location of stubs:  If the rows in a data table are lengthy, then the stubs can be placed on the right-hand side of the table.
  • Headings:  Headings should be written in a singular form. For example, ‘good’ must be used instead of ‘goods’.
  • Footnote:  A footnote should be given only if needed.
  • Size of columns:  Size of columns must be uniform and symmetrical.
  • Use of abbreviations:  Headings and sub-headings should be free of abbreviations.
  • Units: There should be a clear specification of units above the columns.

The Advantages of Tabular Presentation

  • Ease of representation:  A large amount of data can easily be accommodated in a data table. Evidently, it is the simplest form of data presentation.
  • Ease of analysis:  Data tables are frequently used for statistical analysis like calculation of central tendency, dispersion etc.
  • Helps in comparison:  In a data table, the rows and columns which are required to be compared can be placed next to each other. To point out, this facilitates comparison as it becomes easy to compare each value.
  • Economical:  Construction of a data table is fairly easy and presents the data in a manner which is really easy on the eyes of a reader. Moreover, it saves time as well as space.

Classification of Data and Tabular Presentation

Qualitative classification.

In this classification, data in a table is classified on the basis of qualitative attributes. In other words, if the data contained attributes that cannot be quantified like rural-urban, boys-girls etc. it can be identified as a qualitative classification of data.

200 390
167 100

Quantitative Classification

In quantitative classification, data is classified on basis of quantitative attributes.

0-50 29
51-100 64

Temporal Classification

Here data is classified according to time. Thus when data is mentioned with respect to different time frames, we term such a classification as temporal.

2016 10,000
2017 12,500

Spatial Classification

When data is classified according to a location, it becomes a spatial classification.

India 139,000
Russia 43,000

A Solved Example for You

Q:  The classification in which data in a table is classified according to time is known as:

  • Qualitative
  • Quantitative
  • Temporal

Ans:  The form of classification in which data is classified based on time frames is known as the temporal classification of data and tabular presentation.



4   Introduction to Tabular Data


An email inbox is a list of messages. For each message, your inbox stores a bunch of information: its sender, the subject line, the conversation it’s part of, the body, and quite a bit more.


A music playlist. For each song, your music player maintains a bunch of information: its name, the singer, its length, its genre, and so on.


A filesystem folder or directory. For each file, your filesystem records a name, a modification date, size, and other information.


Do Now! Can you come up with more examples?

Responses to a party invitation.

A gradebook.

A calendar agenda.

These lists consist of rows and columns. For instance, each song or email message or file is a row. Each of their characteristics— the song title, the message subject, the filename— is a column.

Each row has the same columns as the other rows, in the same order.

A given column has the same type, but different columns can have different types. For instance, an email message has a sender’s name, which is a string; a subject line, which is a string; a sent date, which is a date; whether it’s been read, which is a Boolean; and so on.

The rows are usually in some particular order. For instance, the emails are ordered by which was most recently sent.

Exercise Find the characteristics of tabular data in the other examples described above, as well as in the ones you described.

We will now learn how to program with tables and how to think about decomposing tasks involving them. You can also look up the full Pyret documentation for table operations.

4.1   Creating Tabular Data

table: name, age
  row: "Alice", 30
  row: "Bob", 40
  row: "Carol", 25
end

Exercise Change different parts of the above example— e.g., remove a necessary value from a row, add an extraneous one, remove a comma, add an extra comma, leave an extra comma at the end of a row— and see what errors you get.

check:
  table: name, age
    row: "Alice", 30
    row: "Bob", 40
    row: "Carol", 25
  end
  is-not
  table: age, name
    row: 30, "Alice"
    row: 40, "Bob"
    row: 25, "Carol"
  end
end

create the sheet on your own,

create a sheet collaboratively with friends,

find data on the Web that you can import into a sheet,

create a Google Form that you get others to fill out, and obtain a sheet out of their responses

4.2   Processing Rows

Let’s now learn how we can actually process a table. Pyret offers a variety of built-in operations that make it quite easy to perform interesting computations over tables. In addition, as we will see later [REF], if we don’t find these sufficient, we can write our own. For now, we’ll focus on the operations Pyret provides.

Which emails were sent by a particular user?

Which songs were sung by a particular artist?

Which are the most frequently played songs in a playlist?

Which are the least frequently played songs in a playlist?

4.2.1   Keeping

sieve email using sender:
  sender == 'Matthias Felleisen'
end

sieve playlist using artist:
  (artist == 'Deep Purple') or (artist == 'Van Halen')
end

4.2.2   Ordering

order playlist:
  play-count ascending
end

Note that what goes between the : and end is not an expression. Therefore, we cannot write arbitrary code here. We can only name columns and indicate which way they should be ordered.

4.2.3   Combining Keeping and Ordering

Of the emails from a particular person, which is the oldest?

Of the songs by a particular artist, which have we played the least often?

Do Now! Take a moment to think about how you would write these with what you have seen so far.

mf-emails = sieve email using sender:
  sender == 'Matthias Felleisen'
end

order mf-emails:
  sent-date ascending
end

Exercise Write the second example as a composition of keep and order operations on a playlist table.

4.2.4   Extending

extend employees using hourly-wage, hours-worked:
  total-wage: hourly-wage * hours-worked
end

ext-email = extend email using subject:
  subject-length: string-length(subject)
end

order ext-email:
  subject-length descending
end

4.2.5   Transforming, Cleansing, and Normalizing

There are times when a table is “almost right”, but requires a little adjusting. For instance, we might have a table of customer requests for a free sample, and want to limit each customer to at most a certain number. We might get temperature readings from different countries in different formats, and want to convert them all to one single format (unit errors can be dangerous!). We might have a gradebook where different graders have used different levels of precision, and want to standardize all of them to have the same level of precision.

transform orders using count:
  count: num-min(count, 3)
end

transform gradebook using total-grade:
  total-grade: num-round(total-grade)
end

transform weather using temp, unit:
  temp: if unit == "F": fahrenheit-to-celsius(temp) else: temp end,
  unit: if unit == "F": "C" else: unit end
end

Do Now! In this example, why do we also transform unit ?

4.2.6   Selecting

select name, total-grade from gradebook end

ss = select artist, song from playlist end

order ss:
  artist ascending
end

4.2.7   Summary of Row-Wise Table Operations

We’ve seen a lot in a short span. Specifically, we have seen several operations that consume a table and produce a new one according to some criterion. It’s worth summarizing the impact each of them has in terms of key table properties (where “-” means the entry is left unchanged):

Operation | Cell contents | Row order | Number of rows | Column order | Number of columns
Keeping | - | - | reduced | - | -
Ordering | - | changed | - | - | -
Extending | existing unchanged, new computed | - | - | - | augmented
Transforming | altered | - | - | - | -
Selecting | - | - | - | changed | reduced

The entries other than “-” reflect how the new table may differ from the old. Note that an entry like “reduced” or “altered” should be read as potentially reduced or altered; depending on the specific operation and the content of the table, there may be no change at all. (For instance, if a table is already sorted according to the criterion given in an order expression, the row order will not change.) However, in general one should expect the kind of change described in the above grid.

Observe that both dimensions of this grid provide interesting information. Unsurprisingly, each row has at least some kind of impact on a table (otherwise the operation would be useless and would not exist). Likewise, each column also has at least one way of impacting it. Furthermore, observe that most entries leave the table unchanged: that means each operation has limited impact on the table, careful to not overstep the bounds of its mandate.

On the one hand, the decision to limit the impact of each operation means that to achieve complex tasks, we may have to compose several operations together. We have already seen examples of this earlier this chapter. However, there is also a much more subtle consequence: it also means that to achieve complex tasks, we can compose several operations and get exactly what we want. If we had fewer operations that each did more, then composing them might have various undesired or (worse) unintended consequences, making it very difficult for us to obtain exactly the answer we want. Instead, the operations above follow the principle of orthogonality : no operation shadows what any other operation does, so they can be composed freely.

As a result of having these operations, we can also think of tables algebraically. Concretely, when given a problem, we should again begin with concrete examples of what we’re starting with and where we want to end. Then we can ask ourselves questions like, “Does the number of columns stay the same, grow, or shrink?”, “Does the number of rows stay the same or shrink?”, and so on. The grid above now provides us a toolkit by which we can start to decompose the task into individual operations. Of course, we still have to think: the order of operations matters, and sometimes we have to perform an operation multiple times. Still, this grid is a useful guide to hint us towards the operations that might help solve our problem.

Data Presentation

Josée Dupuis, PhD, Professor of Biostatistics, Boston University School of Public Health

Wayne LaMorte, MD, PhD, MPH, Professor of Epidemiology, Boston University School of Public Health

Introduction

"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective was to describe, explore, and summarize a set of numbers - even a very large set - is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful."

Edward R. Tufte in the introduction to

"The Visual Display of Quantitative Information"

While graphical summaries of data can certainly be powerful ways of communicating results clearly and unambiguously in a way that facilitates our ability to think about the information, poorly designed graphical displays can be ambiguous, confusing, and downright misleading. The keys to excellence in graphical design and communication are much like the keys to good writing. Adhere to fundamental principles of style and communicate as logically, accurately, and clearly as possible. Excellence in writing is generally achieved by avoiding unnecessary words and paragraphs; it is efficient. In a similar fashion, excellence in graphical presentation is generally achieved by efficient designs that avoid unnecessary ink.

Excellence in graphical presentation depends on:

  • Choosing the best medium for presenting the information
  • Designing the components of the graph in a way that communicates the information as clearly and accurately as possible.

Table or Graph?

  • Tables are generally best if you want to be able to look up specific information or if the values must be reported precisely.
  • Graphics are best for illustrating trends and making comparisons

The side by side illustrations below show the same information, first in table form and then in graphical form. While the information in the table is precise, the real goal is to compare a series of clinical outcomes in subjects taking either a drug or a placebo. The graphical presentation on the right makes it possible to quickly see that for each of the outcomes evaluated, the drug produced relief in a greater proportion of subjects. Moreover, the viewer gets a clear sense of the magnitude of improvement, and the error bars provide a sense of the uncertainty in the data.

Source: Connor JT.  Statistical Graphics in AJG:  Save the Ink for the Information.  Am J of Gastroenterology. 2009; 104:1624-1630.

Principles for Table Display

  • Sort table rows in a meaningful way
  • Avoid alphabetical listing!
  • Use rates, proportions or ratios in addition (or instead of) totals
  • Show more than two time points if available
  • Multiple time points may be better presented in a Figure
  • Similar data should go down columns
  • Highlight important comparisons
  • Show the source of the data

Consider the data in the table below from http://www.cancer.gov/cancertopics/types/commoncancers

Type | Incidence | Proportion
Bladder | 72,570 | 5.7%
Breast | 232,340 | 18.2%
Colon | 142,820 | 11.2%
Kidney | 59,938 | 4.7%
Leukemia | 48,610 | 3.8%
Lung | 228,190 | 17.9%
Melanoma | 76,690 | 6.0%
Lymphoma | 69,740 | 5.5%
Pancreas | 45,220 | 3.5%
Prostate | 238,590 | 18.7%
Thyroid | 60,220 | 4.7%

Our ability to quickly understand the relative frequency of these cancers is hampered by presenting them in alphabetical order. It is much easier for the reader to grasp the relative frequency by listing them from most frequent to least frequent as in the next table.

Type | Incidence | Proportion
Prostate | 238,590 | 18.7%
Breast | 232,340 | 18.2%
Lung | 228,340 | 17.9%
Colon | 142,820 | 11.2%
Melanoma | 76,690 | 6.0%
Bladder | 72,570 | 5.7%
Lymphoma | 69,740 | 5.5%
Thyroid | 60,220 | 4.7%
Kidney | 59,938 | 4.7%
Leukemia | 48,610 | 3.8%
Pancreas | 45,220 | 3.5%

However, the same information might be presented more effectively with a dot plot, as shown below.


Data from http://www.cancer.gov/cancertopics/types/commoncancers
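A dot plot like the one described can be sketched in a few lines of Python with matplotlib, using the incidence figures from the table above; the plotting code itself is an illustrative assumption and is not the page's original figure.

import matplotlib.pyplot as plt

cancers = ["Pancreas", "Leukemia", "Kidney", "Thyroid", "Lymphoma", "Bladder",
           "Melanoma", "Colon", "Lung", "Breast", "Prostate"]
incidence = [45220, 48610, 59938, 60220, 69740, 72570,
             76690, 142820, 228190, 232340, 238590]

# One dot per cancer type, sorted from least frequent (bottom) to most frequent (top).
plt.plot(incidence, range(len(cancers)), "o")
plt.yticks(range(len(cancers)), cancers)
plt.xlabel("Estimated new cases")
plt.tight_layout()
plt.savefig("dotplot.png", dpi=300)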

Principles of Graphical Excellence from E.R. Tufte

 

From E. R. Tufte. The Visual Display of Quantitative Information, 2nd Edition.  Graphics Press, Cheshire, Connecticut, 2001.

 

Pattern Perception

Pattern perception is done by

  • Detection: recognition of geometry encoding physical values
  • Assembly: grouping of detected symbol elements; discerning overall patterns in data
  • Estimation: assessment of relative magnitudes of two physical values

Geographic Variation in Cancer

As an example, Tufte offers a series of maps that summarize the age-adjusted mortality rates for various types of cancer in the 3,056 counties in the United States. The maps showing the geographic variation in stomach cancer are shown below.

Adapted from Atlas of Cancer Mortality for U.S. Counties: 1950-1969,

TJ Mason et al, PHS, NIH, 1975

 

These maps summarize an enormous amount of information and present it efficiently, coherently, and effectively, in a way that invites the viewer to make comparisons and to think about the substance of the findings. Consider, for example, that the region to the west of the Great Lakes was settled largely by immigrants from Germany and Scandinavia, where traditional methods of preserving food included pickling and curing of fish by smoking. Could these methods be associated with an increased risk of stomach cancer?

John Snow's Spot Map of Cholera Cases

Consider also the spot map that John Snow presented after the cholera outbreak in the Broad Street section of London in September 1854. Snow ascertained the place of residence or work of the victims and represented them on a map of the area using a small black disk to represent each victim and stacking them when more than one occurred at a particular location. Snow reasoned that cholera was probably caused by something that was ingested, because of the intense diarrhea and vomiting of the victims, and he noted that the vast majority of cholera deaths occurred in people who lived or worked in the immediate vicinity of the Broad Street pump (shown with a red dot that we added for clarity). He further ascertained that most of the victims drank water from the Broad Street pump, and it was this evidence that persuaded the authorities to remove the handle from the pump in order to prevent more deaths.

Map of the Broad Street area of London showing stacks of black disks to represent the number of cholera cases that occurred at various locations. The cases seem to be clustered around the Broad Street water pump.

Humans can readily perceive differences like this when presented effectively as in the two previous examples. However, humans are not good at estimating differences without directly seeing them (especially for steep curves), and we are particularly bad at perceiving relative angles (the principal perception task used in a pie chart).

The use of pie charts is generally discouraged. Consider the pie chart on the left below. It is difficult to accurately assess the relative size of the components in the pie chart, because the human eye has difficulty judging angles. The dot plot on the right shows the same data, but it is much easier to quickly assess the relative size of the components and how they changed from Fiscal Year 2000 to Fiscal Year 2007.

Adapted from Wainer H.:Improving data displays: Ours and the media's. Chance, 2007;20:8-15.

Data from http://www.taxpolicycenter.org/taxfacts/displayafact.cfm?Docid=203

Consider the information in the two pie charts below (showing the same information). The 3-dimensional pie chart on the left distorts the relative proportions. In contrast, the 2-dimensional pie chart on the right makes it much easier to compare the relative sizes of the various components.

Adapted from Cawley S, et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116:499-509, Figure 1

More Principles of Graphical Excellence

 

Adapted from Frank E. Harrell Jr. on graphics:  http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf

Exclude Unneeded Dimensions

 

 

 

 

Source: Cotter DJ, et al. (2004) Hematocrit was not validated as a surrogate endpoint for survival among epoetin-treated hemodialysis patients. Journal of Clinical Epidemiology 57:1086-1095, Figure 2.

 

Source: Roeder K (1994) DNA fingerprinting: A review of the controversy (with discussion). Statistical Science 9:222-278, Figure 4.

These 3-dimensional techniques distort the data and actually interfere with our ability to make accurate comparisons. The distortion caused by 3-dimensional elements can be particularly severe when the graphic is slanted at an angle or when the viewer ends up unwittingly comparing the areas of ink rather than the heights of the bars.

It is much easier to make comparisons with a chart like the one below.


Source: Huang, C, Guo C, Nichols C, Chen S, Martorell R. Elevated levels of protein in urine in adulthood after exposure to

the Chinese famine of 1959–61 during gestation and the early postnatal period. Int. J. Epidemiol. (2014) 43 (6): 1806-1814 .

Omit "Chart Junk"

Consider these two examples.

Hash lines are what E.R. Tufte refers to as "chart junk."

 

This graphic uses unnecessary bar graphs, pointless and annoying cross-hatching, and labels with incomplete abbreviations. The cluttered legend expands the inadequate bar labels, but it is difficult to go back and forth from the legend to the bar graph, and the use of all uppercase letters is visually unappealing.

This presentation would have been greatly enhanced by simply using a horizontal dot plot that rank-ordered the categories in a logical way. This approach would have been clearer and would have completely avoided the need for a legend.

This grey background is a waste of ink, and it actually detracts from the readability of the graph by reducing contrast between the data points and other elements of the graph. Also, the axis labels are too small to be read easily.

 Source: Miller AH, Goldenberg EN, Erbring L.  (1979)  Type-Set Politics: Impact of Newspapers on Public Confidence. American Political Science Review, 73:67-84.

 

 

Source: Jorgenson E, et al. (2005) Ethnicity and human genetic linkage maps. 76:276-290, Figure 2

Here is a simple enumeration of the number of pets in a neighborhood. There is absolutely no reason to connect these counts with lines. This is, in fact, confusing and inappropriate and nothing more than "chart junk."


Source: http://www.go-education.com/free-graph-maker.html

Moiré Vibration

Moiré effects are sometimes used in modern art to produce the appearance of vibration and movement. However, when these effects are applied to statistical presentations, they are distracting and add clutter because the visual noise interferes with the interpretation of the data.

Tufte presents the example shown below from Instituto de Expansao Commercial, Brasil, Graphicos Estatisticas (Rio de Janeiro, 1929, p. 15).

 While the intention is to present quantitative information about the textile industry, the moiré effects do not add anything, and they are distracting, if not visually annoying.

Present Data to Facilitate Comparisons

Tips

 

Here is an attempt to compare catches of cod fish and crab across regions and to relate the variation to changes in water temperature. The problem here is that the Y-axes are vastly different, making it hard to sort out what's really going on. Even the Y-axes for temperature are vastly different.


http://seananderson.ca/courses/11-multipanel/multipanel.pdf1

The ability to make comparisons is greatly facilitated by using the same scales for axes, as illustrated below.


Data source: Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951;41(3):279-81. PMID: 14819398
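If panels like these were drawn in Python with matplotlib (a minimal sketch with simulated series, not the fisheries or Framingham data), forcing a common scale across the panels is a single argument:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(1990, 2011)
# Simulated series for three regions (arbitrary, made-up numbers)
regions = {"North": 50 + rng.normal(0, 5, years.size),
           "Central": 30 + rng.normal(0, 5, years.size),
           "South": 20 + rng.normal(0, 5, years.size)}

# sharey=True gives every panel the same Y-axis, so panels are directly comparable
fig, axes = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
for ax, (name, series) in zip(axes, regions.items()):
    ax.plot(years, series, color="black")
    ax.set_title(name)
    ax.set_xlabel("Year")
axes[0].set_ylabel("Catch (arbitrary units)")
plt.tight_layout()
plt.show()
```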

It is also important to avoid distorting the X-axis. Note in the example below that the space between 0.05 and 0.1 is the same as the space between 0.1 and 0.2.


Source: Park JH, Gail MH, Weinberg CR, et al. Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants. Proc Natl Acad Sci U S A. 2011; 108:18026-31.

Consider the range of the Y-axis. In the examples below there is no relevant information below $40,000, so it is not necessary to begin the Y-axis at 0. The graph on the right makes more sense.

Data from http://www.myplan.com/careers/registered-nurses/salary-29-1111.00.html
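A minimal sketch of this point, using hypothetical salary figures (Python with matplotlib); the only difference between the two panels is the Y-axis limits:

```python
import matplotlib.pyplot as plt

years = [2006, 2007, 2008, 2009, 2010]
salary = [58000, 60000, 62500, 64000, 66500]   # hypothetical values

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
left.plot(years, salary, marker="o")
left.set_ylim(0, 70000)             # starting at 0 wastes most of the panel
left.set_title("Y-axis from 0")

right.plot(years, salary, marker="o")
right.set_ylim(40000, 70000)        # nothing of interest below $40,000
right.set_title("Y-axis from $40,000")
plt.tight_layout()
plt.show()
```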

Also, consider using a log scale. This can be particularly useful when presenting ratios, as in the example below.


Source: Broman KW, Murray JC, Sheffield VC, White RL, Weber JL (1998) Comprehensive human genetic maps: Individual and sex-specific variation in recombination. American Journal of Human Genetics 63:861-869, Figure 1
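On a log scale, a ratio of 2 and a ratio of 1/2 sit the same distance from 1, which a linear axis cannot do. A minimal sketch with hypothetical ratios (Python with matplotlib):

```python
import matplotlib.pyplot as plt

groups = ["A", "B", "C", "D"]
ratios = [0.25, 0.5, 2.0, 4.0]      # hypothetical ratios

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(groups, ratios, "o", color="black")
ax.set_yscale("log")                            # 0.5 and 2.0 are now equally far from 1.0
ax.axhline(1.0, linestyle="--", color="grey")   # reference line at ratio = 1
ax.set_ylabel("Ratio (log scale)")
plt.tight_layout()
plt.show()
```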

We noted earlier that pie charts make it difficult to see differences within a single pie chart, but this is particularly difficult when data is presented with multiple pie charts, as in the example below.


Source: Bell ML, et al. (2007) Spatial and temporal variation in PM2.5 chemical composition in the United States for health effects studies. Environmental Health Perspectives 115:989-995, Figure 3

When multiple comparisons are being made, it is essential to use colors and symbols in a consistent way, as in this example.


Source: Manning AK, LaValley M, Liu CT, et al. Meta-Analysis of Gene-Environment Interaction: Joint Estimation of SNP and SNP x Environment Regression Coefficients. Genet Epidemiol 2011, 35(1):11-8.
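One simple way to enforce this in code is to define the color/marker mapping once and reuse it in every panel, rather than letting each panel pick its own defaults. A minimal sketch with invented group names and simulated points (Python with matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# One shared style per group, defined once and reused in every panel
styles = {"Cohort 1": dict(color="tab:blue", marker="o"),
          "Cohort 2": dict(color="tab:orange", marker="s"),
          "Cohort 3": dict(color="tab:green", marker="^")}

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, panel in zip(axes, ["Outcome A", "Outcome B"]):
    for group, style in styles.items():
        x = rng.normal(size=20)
        y = rng.normal(size=20)
        ax.scatter(x, y, label=group, **style)   # same color/marker in both panels
    ax.set_title(panel)
axes[0].legend(frameon=False)
plt.tight_layout()
plt.show()
```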

Avoid putting too many lines on the same chart. In the example below, the only thing that is readily apparent is that 1980 was a very hot summer.


Data from National Weather Service Weather Forecast Office at http://www.srh.noaa.gov/tsa/?n=climo_tulyeartemp

Make Efficient Use of Space

 

More Tips:

Reduce the Ratio of Ink to Information

The example below is not an efficient use of space or ink, because the graphic conveys almost no information.


Source: Mykland P, Tierney L, Yu B (1995) Regeneration in Markov chain samplers.  Journal of the American Statistical Association 90:233-241, Figure 1

Bar charts are not appropriate for indicating means ± SEs. The only important information is the mean and the variation about the mean. By representing a mean with a filled bar that has width, the same single number is displayed over and over: by the height of the bar, by its top edge, and by its inked area.

Bar graphs add ink without conveying any additional information, and they are distracting. The graph below on the left inappropriately uses bars, which clutter the graph without adding anything. The graph on the right displays the same data, but does so more clearly and with less clutter.

Source: Cornford EM, Huot ME. Glucose transfer from male to female schistosomes. Science. 1981; 213:1269-71
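A minimal sketch of the preferred style, assuming Python with matplotlib and invented means and standard errors; a point with a symmetric error bar carries the same information as a filled bar, with far less ink:

```python
import matplotlib.pyplot as plt

groups = ["Control", "Low dose", "High dose"]   # hypothetical groups
means = [4.2, 5.1, 6.8]                         # hypothetical means
sems = [0.3, 0.4, 0.5]                          # hypothetical standard errors

fig, ax = plt.subplots(figsize=(4, 3))
# Points with symmetric error bars instead of filled bars
ax.errorbar(range(len(groups)), means, yerr=sems, fmt="o", color="black", capsize=3)
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups)
ax.set_xlim(-0.5, len(groups) - 0.5)
ax.set_ylabel("Response (arbitrary units)")
plt.tight_layout()
plt.show()
```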

 

"Just as a good editor of prose ruthlessly prunes unnecessary words, so a designer of statistical graphics should prune out ink that fails to present fresh data-information. Although nothing can replace a good graphical idea applied to an interesting set of numbers, editing and revision are as essential to sound graphical design work as they are to writing."

Edward R. Tufte, "The Visual Display of Quantitative Information"

Multiple Types of Information on the Same Figure

Choosing the Best Graph Type

Adapted from Frank E. Harrell Jr. on graphics: http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf

 

Bar Charts, Error Bars and Dot Plots

As noted previously, bar charts can be problematic. Here is another one presenting means and error bars, but the error bars are misleading because they only extend in one direction. A better alternative would have been to use full error bars with a scatter plot, as illustrated previously (right).

Source: Hummer BT, Li XL, Hassel BA (2001) Role for p53 in gene induction by double-stranded RNA. J Virol 75:7774-7777, Figure 4

 

Consider the four graphs below presenting the incidence of cancer by type. The upper left graph unnecessarily uses bars, which take up a lot of ink. This layout also ends up making the fonts for the types of cancer too small. Small font is also a problem for the dot plot at the upper right, and this one also has unnecessary grid lines across the entire width.

The graph at the lower left has more readable labels and uses a simple dot plot, but the rank order is difficult to figure out.

The graph at the lower right is clearly the best, since the labels are readable, the magnitude of incidence is shown clearly by the dot plots, and the cancers are sorted by frequency.


Single Continuous Numeric Variable

In this situation a cumulative distribution function conveys the most information and requires no grouping of the variable. A box plot will show selected quantiles effectively, and box plots are especially useful when stratifying by multiple categories of another variable.

Histograms are also possible. Consider the examples below.

(Examples: density plot, histogram, box plot.)
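As a concrete illustration, here is a minimal Python/matplotlib sketch (using simulated data) that draws an empirical cumulative distribution function, a histogram, and a box plot of the same variable:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0, sigma=0.5, size=500)   # simulated, right-skewed variable

fig, (a1, a2, a3) = plt.subplots(1, 3, figsize=(10, 3))

# Empirical cumulative distribution function: no grouping of the variable required
xs = np.sort(x)
a1.step(xs, np.arange(1, xs.size + 1) / xs.size, where="post")
a1.set_title("ECDF")

# Histogram: requires a choice of bins
a2.hist(x, bins=30, color="grey", edgecolor="white")
a2.set_title("Histogram")

# Box plot: shows selected quantiles compactly
a3.boxplot(x)
a3.set_title("Box plot")

plt.tight_layout()
plt.show()
```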

Two Variables

Adapted from Frank E. Harrell Jr. on graphics: http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf

 The two graphs below summarize BMI (Body Mass Index) measurements in four categories, i.e., younger and older men and women. The graph on the left shows the means and 95% confidence interval for the mean in each of the four groups. This is easy to interpret, but the viewer cannot see that the data is actually quite skewed. The graph on the right shows the same information presented as a box plot. With this presentation method one gets a better understanding of the skewed distribution and how the groups compare.
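A minimal sketch of the box-plot alternative, assuming Python with matplotlib and simulated, right-skewed values standing in for the BMI data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Simulated, right-skewed BMI-like values for four groups (not the real data)
groups = {"Younger men": 25 + rng.lognormal(0, 0.4, 200),
          "Older men": 27 + rng.lognormal(0, 0.4, 200),
          "Younger women": 24 + rng.lognormal(0, 0.5, 200),
          "Older women": 26 + rng.lognormal(0, 0.5, 200)}

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.boxplot(list(groups.values()))                 # one box per group reveals the skew
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()), rotation=15)
ax.set_ylabel("BMI (simulated)")
plt.tight_layout()
plt.show()
```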

The next example is a scatter plot with a superimposed smoothed line of prediction. The shaded region embracing the blue line is a representation of the 95% confidence limits for the estimated prediction. This was created using "ggplot" in the R programming language.


Source: Frank E. Harrell Jr. on graphics:  http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf (page 121)
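The original figure was produced with ggplot in R; the sketch below is a simplified Python/matplotlib analogue that uses a straight-line fit rather than a smoother, with the shaded band computed from the usual standard error of the fitted mean (simulated data throughout):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 80)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 80)       # simulated data

# Ordinary least-squares line (a straight-line stand-in for a smoother)
b1, b0 = np.polyfit(x, y, 1)
grid = np.linspace(x.min(), x.max(), 200)
fit = b0 + b1 * grid

# 95% band for the fitted mean: s * sqrt(1/n + (x0 - xbar)^2 / Sxx)
n = x.size
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))
sxx = np.sum((x - x.mean())**2)
half_width = 1.96 * s * np.sqrt(1.0 / n + (grid - x.mean())**2 / sxx)

fig, ax = plt.subplots(figsize=(5, 3.5))
ax.scatter(x, y, s=12, color="black")
ax.plot(grid, fit, color="tab:blue")
ax.fill_between(grid, fit - half_width, fit + half_width, alpha=0.3)  # shaded 95% band
plt.tight_layout()
plt.show()
```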

Multivariate Data

The example below shows the use of multiple panels.


Source: Cleveland WS. The Elements of Graphing Data. Hobart Press, Summit, NJ, 1994.

Displaying Uncertainty

Uncertainty can be displayed in several ways, including:

  • Error bars showing confidence limits
  • Confidence bands drawn using two lines
  • Shaded confidence bands
  • Bayesian credible intervals
  • Bayesian posterior densities

Confidence Limits

Shaded Confidence Bands


Source: Frank E. Harrell Jr. on graphics:  http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf


Source: Tweedie RL and Mengersen KL. (1992) Br. J. Cancer 66: 700-705

Forest Plot

This is a forest plot summarizing 26 studies of the effect of cigarette smoke exposure on the risk of lung cancer. The black boxes indicate the estimated odds ratios, and their sizes are proportional to the sample size of each study.


Data from Tweedie RL and Mengersen KL. (1992) Br. J. Cancer 66: 700-705
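A minimal sketch of a forest plot in Python/matplotlib, with entirely hypothetical odds ratios, confidence intervals, and sample sizes (not the Tweedie and Mengersen data); the marker area scales with sample size, and the axis is logarithmic so that ratios above and below 1 are treated symmetrically:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical study-level results
studies = ["Study A", "Study B", "Study C", "Study D", "Study E"]
or_est = np.array([1.8, 1.2, 2.5, 0.9, 1.5])
ci_low = np.array([1.1, 0.8, 1.4, 0.6, 1.0])
ci_high = np.array([2.9, 1.8, 4.5, 1.4, 2.2])
n = np.array([120, 450, 60, 300, 200])           # hypothetical sample sizes

y = np.arange(len(studies))
fig, ax = plt.subplots(figsize=(5, 3.5))
ax.hlines(y, ci_low, ci_high, color="black")     # horizontal confidence intervals
ax.scatter(or_est, y, s=n * 0.5, marker="s", color="black")  # marker area ~ sample size
ax.axvline(1.0, linestyle="--", color="grey")    # reference line: no effect
ax.set_xscale("log")
ax.set_yticks(y)
ax.set_yticklabels(studies)
ax.set_xlabel("Odds ratio (log scale)")
ax.invert_yaxis()
plt.tight_layout()
plt.show()
```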

Summary Recommendations

  • In general, avoid bar plots
  • Avoid chart junk and the use of too much ink relative to the information you are displaying. Keep it simple and clear.
  • Avoid pie charts, because humans have difficulty perceiving relative angles.
  • Pay attention to scale, and make scales consistent.
  • Explore several ways to display the data!

12 Tips on How to Display Data Badly

Adapted from Wainer H.  How to Display Data Badly.  The American Statistician 1984; 38: 137-147. 

  • Show as few data as possible
  • Hide what data you do show; minimize the data-ink ratio
  • Ignore the visual metaphor altogether
  • Only order matters
  • Graph data out of context
  • Change scales in mid-axis
  • Emphasize the trivial;  ignore the important
  • Jiggle the baseline
  • Alphabetize everything.
  • Make your labels illegible, incomplete, incorrect, and ambiguous.
  • More is murkier: use a lot of decimal places and make your graphs three dimensional whenever possible.
  • If it has been done well in the past, think of another way to do it

Additional Resources

  • Stephen Few: Designing Effective Tables and Graphs. http://www.perceptualedge.com/images/Effective_Chart_Design.pdf
  • Gary Klass: Presenting Data: Tabular and graphic display of social indicators. Illinois State University, 2002. http://lilt.ilstu.edu/gmklass/pos138/datadisplay/sections/goodcharts.htm (Note: This web site will be discontinued and replaced by the Just Plain Data Analysis site.)

TABULAR PRESENTATION OF DATA

Tabulation may be defined as systematic presentation of data with the help of a statistical table having a number of rows and columns and complete with reference number, title, description of rows as well as columns and foot notes, if any.

We may consider the following guidelines for tabulation :

1.  A statistical table should be allotted a serial number along with a self-explanatory title.

2. The table under consideration should be divided into Caption, Box-head, Stub, and Body.

The Caption is the upper part of the table, describing the columns and sub-columns, if any.

The Box-head is the entire upper part of the table, which includes the column and sub-column numbers and the unit(s) of measurement, along with the caption.

Stub is the left part of the table providing the description of the rows.

The body is the main part of the table that contains the numerical figures.

3. The table should be well-balanced in length and breadth.

4. The data must be arranged in the table in such a way that comparisons between different figures can be made without much labor and time.

Also, the row totals, column totals, and the units of measurement must be shown.

5. The data should be arranged intelligently in a well-balanced sequence and the presentation of data in the table should be appealing to the eyes as far as practicable.

6. Notes describing the source of the data and, where necessary, clarifying particular rows or columns (known as footnotes) should be shown at the bottom of the table.

As an illustration, data relating to the workers of a factory can be presented in the following table.

Status of the workers of the factory on the basis of their trade union membership for 1999 and 2000.


Below the table, we must state the source from which the data were obtained.

TU, M, F and T stand for trade union, male, female and total respectively.
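Since the table itself is not reproduced here, the skeleton below (with the figures left blank) shows how such a table might be laid out, with the title at the top, the caption and box-head describing the columns, the stub describing the rows, and the body holding the counts:

Table 1: Status of the workers of the factory on the basis of their trade union membership for 1999 and 2000

                          1999              2000
Status               M     F     T     M     F     T
-----------------------------------------------------
Member of TU         ..    ..    ..    ..    ..    ..
Not a member of TU   ..    ..    ..    ..    ..    ..
Total                ..    ..    ..    ..    ..    ..

Source: (state the source of the data here)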

The tabulation method is usually preferred to textual presentation because:

(i)  It facilitates comparison between rows and columns.

(ii) Complicated data can also be represented using tabulation.

(iii)  It is a must for diagrammatic representation.

(iv)  Without tabulation, statistical analysis of data is not possible.

