Data Mining and Cleaning
The first step in analyzing project data is data mining: collecting data from various sources and converting it into a format suitable for modeling and making predictions. This involves collecting and recording data from business processes, including:
- Quantitative data such as time sequences, occurrence rates, and the extent to which processes deviate from the mean.
- Qualitative data such as customer complaints, incident reports, daily status reports, and more.
The next step is data cleaning, or inspecting the collected data to correct errors, remove discrepancies, and eliminate superfluous and unrelated data. The actual data analysis is either exploratory or confirmatory. The exploratory approach discovers something new, such as trends or probabilities. The confirmatory approach validates assumptions and establishes controls.
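The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the record layout (a task id paired with logged hours), the function name `clean_durations`, and the sample values are all hypothetical.

```python
import math

def clean_durations(records):
    """Return cleaned (task_id, hours) pairs from raw time-sheet records."""
    seen = set()
    cleaned = []
    for task_id, hours in records:
        # Correct errors: skip entries whose duration is missing or not numeric.
        try:
            value = float(hours)
        except (TypeError, ValueError):
            continue
        # Remove discrepancies: durations must be finite and positive.
        if not math.isfinite(value) or value <= 0:
            continue
        # Eliminate superfluous data: keep only the first record per task.
        if task_id in seen:
            continue
        seen.add(task_id)
        cleaned.append((task_id, value))
    return cleaned

# Hypothetical raw records with a missing value, a text placeholder,
# a duplicate, and a negative duration.
raw = [
    ("T1", "8"), ("T2", None), ("T3", "n/a"),
    ("T1", "8"), ("T4", "-3"), ("T5", "6.5"),
]
cleaned = clean_durations(raw)
print(cleaned)
```

Which records count as errors or duplicates depends on the business rules of the project; the three checks here only stand in for those rules.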
The basic method of data analysis in projects is statistical analysis in its various forms. Most quality management methods, such as Six Sigma, are statistics-intensive, and apply a variety of statistical techniques to analyze production or operations data and confirm the extent of deviation from the mean. Such techniques aim to uncover the root causes of process failure and provide guidelines for implementing lasting controls for sustained process improvement.
The common statistical data analysis techniques for projects are:
- Correlation Analysis: Correlation analysis shows how strongly one variable relates to another. For instance, it shows whether piece rates are associated with better productivity.
- Regression Analysis: Regression analysis is a quantitative prediction of the value of one variable from the value of another variable.
- t-test: The t-test is a basic test to determine whether the means of two groups of data are statistically different. For instance, use the t-test to determine whether time sheet data from two different projects differ significantly.
- ANOVA or Analysis of Variance: ANOVA makes simultaneous comparisons across several groups and determines whether a significant difference exists between the group means.
- ANCOVA or Analysis of Covariance: ANCOVA is a merger of ANOVA and regression analysis, used to model a linear relationship between one continuous quantitative variable and one or more qualitative variables.
- MANOVA or Multivariate Analysis of Variance: MANOVA is a generalized form of ANOVA, used to make simultaneous comparisons and determine the existence of a significant relationship when there are two or more dependent variables.
- Normality tests: Normality tests determine the extent to which a random variable is distributed normally.
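- Scatter Plots: Scatter plots display the values of two variables as points on a pair of axes, to reveal the relationship between them. 3D variants that use color or marker size can represent multivariate data in up to four dimensions.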
- Histograms: Histograms are diagrams consisting of rectangles, with the area of each rectangle proportionate to the frequency of a variable, and the width of the rectangle equal to the class interval.
- Pareto charts: Pareto charts are a combination of bar and line graphs, to display values of one variable in descending order as bars, and cumulative values of all variables in a category, left to right, as a line graph.
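Three of the techniques listed above (correlation, simple regression, and the t-test) reduce to short formulas. The sketch below computes them with the Python standard library only. The data values and function names are hypothetical, the regression is the ordinary least-squares line for a single predictor, and the t statistic is Welch's unequal-variance form; turning that statistic into a p-value requires the t distribution, which is left out here.

```python
import math
from statistics import mean, variance

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical data: piece rate paid versus units produced per shift.
piece_rate = [1, 2, 3, 4, 5]
units = [2, 4, 5, 4, 5]
r = pearson_r(piece_rate, units)
slope, intercept = fit_line(piece_rate, units)

# Hypothetical daily hours logged on two different projects.
project_a = [8, 9, 7, 8]
project_b = [10, 11, 9, 10]
t_stat = welch_t(project_a, project_b)

print(round(r, 3), round(slope, 2), round(intercept, 2), round(t_stat, 3))
```

A large absolute t statistic suggests the two projects differ in mean logged hours, but the conclusion depends on the sample sizes and the chosen significance level.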
A time series is a set of ordered observations of a quantitative characteristic of a phenomenon, taken at equally spaced time points. Time series analysis uses such observations to identify trends, seasonal variations, and periodic oscillations, and thereby forecast future values of the series. For instance, in a project to set up a call center, time series analysis identifies the seasonal and peak hours of demand for telephone operators, and thereby informs decisions on the number of terminals required.
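The call center example can be illustrated with a small sketch that averages call volume by hour of day to expose the daily pattern. The observations, the `(day, hour, calls)` layout, and the function names are assumptions made for illustration; a real analysis would use far more data and would also separate out trend and longer seasonal cycles.

```python
from collections import defaultdict
from statistics import mean

def hourly_profile(observations):
    """Average call volume per hour of day from (day, hour, calls) tuples."""
    by_hour = defaultdict(list)
    for _day, hour, calls in observations:
        by_hour[hour].append(calls)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}

def peak_hours(profile, top_n=2):
    """Hours with the highest average volume, busiest first."""
    return sorted(profile, key=profile.get, reverse=True)[:top_n]

# Hypothetical call counts sampled at four hours on three consecutive days.
observations = [
    (1, 9, 40), (1, 12, 90), (1, 15, 60), (1, 18, 30),
    (2, 9, 44), (2, 12, 96), (2, 15, 66), (2, 18, 34),
    (3, 9, 42), (3, 12, 93), (3, 15, 63), (3, 18, 32),
]
profile = hourly_profile(observations)
peaks = peak_hours(profile)
print(profile)
print(peaks)
```

The average volume in the busiest hours is one input to sizing the number of terminals; service-level targets and call durations would also matter.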
Time series models can be:
- Single-Equation Regression involving a single time-dependent variable.
- Multi-Equation Simulation involving multiple variables, to forge a better understanding of relationships and structures.
The time series model becomes appropriate when little or no other information is available about the variable under study, and there are enough data points to establish a pattern.
Probabilistic data analysis techniques provide a range of possible outcomes for each set of data, and facilitate decision making when the data involves uncertainty. They find widespread use in marketing projects.
The popular probabilistic techniques include:
- Event History Analysis: Event history analysis involves identifying concurrent events or measurements that influence the event of interest. For example, the arrival of a tourist ship may lead to increased sales in the local shops and services in an island resort.
- Regression Trees: Regression trees are similar to decision trees, and involve using a tree-like graph to list decisions and their possible consequences, including chance outcomes.
- Delphi Analysis: Delphi analysis is a structured form of group forecasting in which several “experts” consider all variables and factors. After each round, a facilitator summarizes all opinions, the experts revise their earlier answers in light of the summary, and the process repeats until the group reaches common ground. It is often used alongside cross-impact analysis.
- System Dynamics Modeling (SD): SD is a tool for scenario analysis or analyzing possible future events by considering alternative possible outcomes. The tools used include differential equations and simulation.
- Markov Chains: A Markov chain is a sequence of random variables in which the probability of the next state depends only on the current state. Applying simulation models to such chains to observe their behavior helps to evaluate performance and decide the best policy.
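As an illustration of the last technique, the sketch below models a project's weekly status as a two-state Markov chain. The states and transition probabilities are invented for the example, not drawn from any real project. It shows both the exact state distribution after a number of weeks and a single simulated path.

```python
import random

# Hypothetical weekly transition probabilities between project states.
TRANSITIONS = {
    "on_track": {"on_track": 0.9, "delayed": 0.1},
    "delayed": {"on_track": 0.5, "delayed": 0.5},
}

def step(dist):
    """Advance a probability distribution over states by one week."""
    new = {state: 0.0 for state in TRANSITIONS}
    for src, p_src in dist.items():
        for dst, p in TRANSITIONS[src].items():
            new[dst] += p_src * p
    return new

def distribution_after(start, weeks):
    """State distribution after the given number of weeks."""
    dist = dict(start)
    for _ in range(weeks):
        dist = step(dist)
    return dist

def simulate(start_state, weeks, rng):
    """Sample one possible path of weekly states."""
    path = [start_state]
    for _ in range(weeks):
        options = TRANSITIONS[path[-1]]
        next_state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(next_state)
    return path

start = {"on_track": 1.0, "delayed": 0.0}
after_two = distribution_after(start, 2)
long_run = distribution_after(start, 50)
path = simulate("on_track", 10, random.Random(0))
print(after_two)
print(long_run)
print(path)
```

With these assumed probabilities, the long-run share of weeks spent on track settles near 5/6. Comparing that figure across alternative transition tables is one way to weigh policies against each other.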
The importance of data analysis for a project manager should never be underestimated. Collecting and analyzing business data serves many purposes, such as facilitating informed and confident decision making, validating the compliance of data with business rules, confirming the proper functioning of existing controls, reconciling data across disparate systems, and enabling quick deployment of system solutions.