Economic data sets come in a variety of types. While some econometric methods can be applied with little or no modification to many kinds of data sets, the special features of some data sets should be accounted for or exploited. The most important data structures are described as follows:
Cross-Sectional Data: A cross-sectional data set consists of a sample of individuals, households, firms, cities, states, countries, or other units, taken at a given point in time. Sometimes the data on all units do not correspond to precisely the same time period. In a pure cross-sectional analysis, we may ignore any minor timing differences in collecting the data.
An important feature of cross-sectional data is that we can often assume that they have been obtained by random sampling from the underlying population. For example, when we collect information, we gather it from people who come from different social backgrounds. Sometimes, however, random sampling is not an appropriate assumption for analyzing cross-sectional data.
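The random-sampling assumption described above can be illustrated with a minimal sketch. The population, the household identifiers, and the income values below are all made up for illustration; the only point is that each unit is equally likely to be drawn and is observed once.

```python
import random

# Hypothetical population of (household_id, income) pairs; values are invented.
random.seed(0)  # fixed seed so the sketch is reproducible
population = [(i, 20000 + 500 * i) for i in range(1000)]

# Simple random sample without replacement: every household is equally likely.
sample = random.sample(population, k=50)

# Each row is one unit observed once; the order of rows carries no information.
mean_income = sum(income for _, income in sample) / len(sample)
```

When the sample is not drawn this way (for instance, if some kinds of households systematically refuse to answer), the sample mean need not be representative of the population, which is the situation the caveat above refers to.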
In economics, the analysis of cross-sectional data is closely related to applied microeconomics fields, such as labor economics, state and local public finance, industrial organization, urban economics, demography, and health economics. Data on individuals, households, firms, and cities at a given point in time are important for testing microeconomic hypotheses and evaluating economic policies.
Time Series Data: A time series data set consists of observations on a variable or several variables over time. Examples of time series data include stock prices, the money supply, the consumer price index, and gross domestic product. Because past events can influence future events and lags in behavior are prevalent in the social sciences, time is an important dimension in a time series data set. Unlike the arrangement of cross-sectional data, the chronological ordering of observations in a time series conveys potentially important information. This type of data set is very important because some economic variables tend to display clear trends over time.
Another feature of time series data that can require special attention is the frequency at which the data are collected. In economics, the most common frequencies are daily, weekly, monthly, quarterly, and annual. Many weekly, monthly, and quarterly economic time series display a strong seasonal pattern, which can be an important factor in a time series analysis. When econometric methods are used to analyze time series data, the data should be stored in chronological order.
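A small sketch may help show why chronological order matters: lagged values and period-to-period changes are only meaningful once the observations are sorted by date. The quarterly index values below are hypothetical.

```python
from datetime import date

# Hypothetical quarterly observations of a price index, deliberately out of order.
observations = [
    (date(2001, 7, 1), 102.0),
    (date(2001, 1, 1), 100.0),
    (date(2001, 10, 1), 103.5),
    (date(2001, 4, 1), 101.0),
]

# Store the series in chronological order before any analysis.
series = sorted(observations, key=lambda row: row[0])

# Pair each observation with the previous period's value (a one-period lag).
# The first period has no predecessor, so its lag is None.
lagged = [
    (t, value, series[i - 1][1] if i > 0 else None)
    for i, (t, value) in enumerate(series)
]

# Period-over-period change, defined from the second observation onward.
changes = [value - lag for _, value, lag in lagged if lag is not None]
```

Computing the same lags on the unsorted list would pair each quarter with whatever row happened to precede it, which is why the ordering is treated as part of the data.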
Pooled Cross Sections: Some data sets have both cross-sectional and time series features. For example, suppose that two cross-sectional household surveys are taken in Moldova, one in 1945 and one in 1965. In 1945, a random sample of households is surveyed for variables such as income, savings, and family size. In 1965, a new sample of households is taken using the same survey questions. In order to increase our sample size, we can form a pooled cross section by combining the two years.
Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy. While the order in which we store the data turns out not to be crucial, keeping track of the year for each observation is usually very important. This is why we enter the year as a separate variable. In fact, in addition to increasing the sample size, the point of a pooled cross-sectional analysis is often to see how a key relationship has changed over time.
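As a rough sketch of the pooling step, the two survey waves can be stacked into one data set with the year recorded on every row. The household identifiers, incomes, and savings below are invented, and the helper names `pool` and `mean_saving_rate` are hypothetical.

```python
# Hypothetical survey rows: (household_id, income, savings). Values are invented.
survey_first = [(1, 900.0, 90.0), (2, 1100.0, 130.0), (3, 1000.0, 110.0)]
survey_second = [(101, 1500.0, 210.0), (102, 1700.0, 250.0)]


def pool(*waves):
    """Stack cross sections, tagging each row with its survey year."""
    pooled = []
    for year, rows in waves:
        for household_id, income, savings in rows:
            pooled.append({"year": year, "id": household_id,
                           "income": income, "savings": savings})
    return pooled


# The two samples contain different households; only the year links a row to its wave.
pooled = pool((1945, survey_first), (1965, survey_second))


def mean_saving_rate(rows, year):
    """Average savings-to-income ratio among rows from one survey year."""
    rates = [r["savings"] / r["income"] for r in rows if r["year"] == year]
    return sum(rates) / len(rates)


rate_by_year = {y: mean_saving_rate(pooled, y) for y in (1945, 1965)}
```

Because the year is stored as its own variable, the rows can be shuffled freely and a relationship such as the saving rate can still be compared across the two survey years.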
Panel or Longitudinal Data: A panel data set consists of a time series for each cross-sectional member in the data set. As an example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period. Or we might collect information, such as investment and financial data, about the same set of firms over a five-year period.
The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, or countries) are followed over a given time period. Because panel data require repeated observation of the same units over time, panel data sets, especially those on individuals, households, and firms, are more difficult to obtain than pooled cross sections. Having more than one observation on each unit can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available. One advantage is that multiple observations on the same units allow us to control for certain unobserved characteristics of individuals and firms. A second advantage of panel data is that they often allow us to study the importance of lags in behavior or the results of decision making.
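One common way to control for unobserved, time-constant characteristics is first differencing, sketched below. This technique is not named in the text above; it is shown here only as an illustration. The panel is hypothetical and is generated without any noise as y = 2x + a_i, where a_i is a made-up unit-specific constant, so the differenced data recover the slope exactly.

```python
# Hypothetical two-period panel generated as y = 2*x + a_i, where a_i is an
# unobserved, time-constant characteristic of unit i (all values are invented).
true_slope = 2.0
unobserved = {"A": 5.0, "B": -3.0, "C": 10.0}
x_values = {"A": {1: 1.0, 2: 2.0}, "B": {1: 4.0, 2: 7.0}, "C": {1: 0.0, 2: 0.5}}

panel = {
    unit: {t: (x, true_slope * x + unobserved[unit]) for t, x in by_year.items()}
    for unit, by_year in x_values.items()
}

# First differences within each unit: a_i appears in both periods and cancels.
dx, dy = [], []
for unit, by_year in panel.items():
    (x1, y1), (x2, y2) = by_year[1], by_year[2]
    dx.append(x2 - x1)
    dy.append(y2 - y1)

# Least-squares slope through the origin on the differenced data.
slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
```

The differencing step is only possible because the same units appear in both periods; with a pooled cross section there would be no second observation on unit A to subtract from the first.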