The goal of this tutorial is to construct appropriate summary measures, tables, and graphs. There are numerous ways to present data:
- A variety of graphs, including bar charts, pie charts, histograms, scatter charts, and time series graphs
- Numerical summary measures such as counts, percentages, averages and measures of variability
- Tables of summary measures such as totals, averages, and counts, grouped by categories
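As a rough illustration of the last two items, the sketch below computes a few numerical summary measures and then a small table grouped by category. The records, the department names, and the salary figures are made up for this example, and it uses only the Python standard library.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (department, salary in thousands); values are made up.
records = [
    ("Sales", 50),
    ("Sales", 60),
    ("IT", 70),
    ("IT", 90),
    ("IT", 80),
]

salaries = [salary for _, salary in records]

# Numerical summary measures for the whole data set.
count = len(salaries)
average = mean(salaries)
spread = stdev(salaries)  # sample standard deviation, one measure of variability

# Table of summary measures grouped by category.
groups = defaultdict(list)
for department, salary in records:
    groups[department].append(salary)

summary = {
    department: {
        "count": len(values),
        "percent": 100 * len(values) / count,
        "total": sum(values),
        "average": mean(values),
    }
    for department, values in groups.items()
}

for department, row in summary.items():
    print(department, row)
```

Each row of `summary` corresponds to one category, which is the same shape a grouped table of totals, averages, and counts would have in a spreadsheet.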
Data sets, Variables, and Observations
A data set is usually a rectangular array of data, with variables in columns and observations in rows. A variable is a characteristic of members of a population. An observation is a list of all variable values for a single member of a population.
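One way to picture these definitions in code is a list of records, where each record is an observation and each key is a variable. This is only a sketch; the column names and values are invented for illustration.

```python
# Hypothetical data set: each dict is one observation (row),
# each key is one variable (column).
data_set = [
    {"person": 1, "age": 35, "state": "Texas"},
    {"person": 2, "age": 61, "state": "Ohio"},
    {"person": 3, "age": 35, "state": "Texas"},
]

# An observation: all variable values for a single member.
observation = data_set[1]

# A variable: one characteristic, collected across all members.
ages = [row["age"] for row in data_set]

print(observation)
print(ages)
```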
Types of data
The basic distinction is between numerical and categorical data. A third type of data is dates.
Categorical data can be classified further. If the categories have a natural ordering, the variable is ordinal. Otherwise, it is nominal.
If a column contains text labels and each label represents a range of numbers, the variable has been binned. This is called binning.
Numerical data can be classified as discrete or continuous. If the values come from counting, the data are discrete. Otherwise, the variable is continuous.
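Binning can be sketched as a function that maps each number to the text label of the range it falls in. The cut points, labels, and ages below are assumptions chosen only for illustration.

```python
# Hypothetical bin edges and labels; the cut points are chosen for illustration.
BINS = [
    (0, 30, "Under 30"),
    (30, 60, "30 to 59"),
    (60, float("inf"), "60 and over"),
]


def bin_age(age):
    """Return the text label of the bin whose range [low, high) contains age."""
    for low, high, label in BINS:
        if low <= age < high:
            return label
    raise ValueError(f"age {age} is outside every bin")


ages = [23, 35, 61, 59, 30]
binned = [bin_age(age) for age in ages]
print(binned)
```

Because the labels keep the ordering of the underlying ranges, a binned column like this one is ordinal rather than nominal.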
Data sets can also be categorized as cross-sectional or time series. Cross-sectional data are data on a cross section of a population at a distinct point in time. Time series data are data collected over time.
Section 1 discusses methods for describing categorical variables
To present categorical data:
- Look at each column and determine its data type.
- Determine how many categories each categorical column has. (You can use a pivot table to do this.)
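Outside a spreadsheet, the two steps above can be sketched in Python with `collections.Counter`, which tallies values much like a pivot table of counts. The data set and the `column_type` helper are hypothetical, and the type check is only a rule of thumb.

```python
from collections import Counter

# Hypothetical data set; column names and values are made up.
rows = [
    {"age": 35, "state": "Texas", "opinion": "Agree"},
    {"age": 61, "state": "Ohio", "opinion": "Disagree"},
    {"age": 35, "state": "Texas", "opinion": "Neutral"},
    {"age": 48, "state": "Utah", "opinion": "Agree"},
]


def column_type(values):
    """Rough rule of thumb: numbers are numerical, anything else is categorical."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


category_counts = {}
for column in rows[0]:
    values = [row[column] for row in rows]
    if column_type(values) == "categorical":
        # Counter tallies how often each category occurs in the column.
        category_counts[column] = Counter(values)

for column, counts in category_counts.items():
    print(column, len(counts), "categories:", dict(counts))
```

A rule this simple would misclassify categories that are coded as numbers (for example 1 to 5 for an opinion scale), so the data type of such columns still has to be judged by looking at what the values mean.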