“What gets measured, gets managed.”
There are many alternative ways to measure forecast accuracy. It does not matter too much how you define the KPI as long as:
Therefore, choose a set of metrics and stick with them for a while. You may adjust your metrics slightly based on feedback and discussions, but make changes gradually and slowly.
Since metrics should be easy and intuitive to understand, we have found that the metrics below bring the most added value:
Use a condensed set of metrics, not a single one. Choose at least 3 metrics:
Simply visualize the top 10 forecast errors by sales representative. A root cause analysis can be coupled to this in order for sales reps to provide a reason for the error.
When comparing forecast accuracy across different global regions, we observe that customer buying behavior can differ significantly between those regions. In some cases Japan and Korea have proven to be much more stable markets than India and China. This skews the comparison of forecast accuracy metrics across these regions. It’s unfair to state that demand planners in China are doing a worse job than those in Japan and Korea. A metric that can help is the coefficient of variation (COV).

The coefficient of variation is a measure of the volatility in a time series. A scatter plot of different regions, customers or customer/product combinations, with the axes depicting COV and forecast error, can offer an interesting perspective.
Figure 2: Scatter plot of forecast combinations
The result is typically a cloud of dots that is concentrated around a line with a slope of 45°. This is also called the Forecast Value Add (FVA) line. Dots below the line have gained value during the forecasting process; dots above the line have not.
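As a minimal sketch, the coefficient of variation can be computed as the standard deviation of a demand history divided by its mean. The demand histories, forecast errors and function names below are hypothetical. The comparison against the 45° line assumes that COV is on the horizontal axis, forecast error is on the vertical axis, and both are expressed as fractions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(demand):
    """Standard deviation of the demand history divided by its mean."""
    avg = mean(demand)
    if avg == 0:
        raise ValueError("COV is undefined when mean demand is zero")
    return pstdev(demand) / avg

def below_reference_line(cov, forecast_error):
    # Assumption: COV on the x-axis, forecast error on the y-axis,
    # so "below the 45-degree line" means forecast_error < cov.
    return forecast_error < cov

# Hypothetical monthly demand histories and forecast errors (fractions)
history = {
    "stable_region": [100, 102, 98, 101, 99, 100],
    "volatile_region": [100, 160, 40, 130, 70, 100],
}
forecast_error = {"stable_region": 0.02, "volatile_region": 0.30}

cov_by_region = {name: coefficient_of_variation(d) for name, d in history.items()}
below_line = {
    name: below_reference_line(cov_by_region[name], forecast_error[name])
    for name in history
}
```

In this made-up data the volatile region has the larger forecast error but still lands below the line, while the stable region lands above it. That is the kind of nuance a raw accuracy comparison hides.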
The definition of forecast value added is “the change in a forecasting performance metric that can be attributed to a particular step or participant in the forecasting process”. FVA is measured by comparing the performance that you would have reached without performing a specific process step and the performance when including the process step. Performance metrics can e.g. be MAPE, accuracy or bias. FVA can be negative or positive.
A typical forecast value added analysis compares the naive, statistical, and collaborative forecasts. One would typically expect the statistical forecast to outperform the naive forecast. The collaborative forecast, which is created by reviewing the statistical forecast, should add further value and reduce the forecast error even more.
Figure 3: Example of Forecast Value Added (FVA) Analysis
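The comparison described above can be sketched in a few lines, using MAPE as the performance metric. The numbers are invented for illustration and the function names are hypothetical. FVA is expressed here as the reduction in MAPE (in percentage points) between two consecutive process steps.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error in percent; skips periods with zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def forecast_value_added(actuals, before, after):
    """Reduction in MAPE (percentage points) from a process step; positive is good."""
    return mape(actuals, before) - mape(actuals, after)

# Hypothetical data for four periods
actuals       = [100, 120, 80, 110]
naive         = [90, 100, 120, 80]   # previous period's actual carried forward
statistical   = [105, 110, 95, 100]
collaborative = [100, 115, 90, 105]

fva_statistical = forecast_value_added(actuals, naive, statistical)
fva_collaborative = forecast_value_added(actuals, statistical, collaborative)
```

A negative result for either step would indicate that the step made the forecast worse on this metric.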
When one masters FVA analysis, there is another step of sophistication that can be taken by reviewing the overrides and their respective size. This analysis was described in a number of papers by Goodwin & Fildes (2011).
While reviewing the statistical forecast, the different collaborators make what are typically called overrides or judgmental adjustments to the forecast. This type of analysis often leads to insights such as:
These insights have led to a number of guidelines for collaborators of the forecasting process:
Figure 4: Added value by size and sign of the judgmental adjustments (Goodwin & Fildes (2011))
The Pareto rule is a well-known observation in many companies. In the context of supply chain management it describes that often 80% of the volume is represented by only 20% of the products, while the remaining 20% of the volume is represented by 80% of the products. This can be extended from products or SKUs to more general forecast combinations (e.g. ship-to/product combinations). Typically the Pareto analysis drives an ABC classification where the first 80% of the volume defines A products, the next 15% B products and the last 5% C products. The graphical representation is called a Pareto curve and shows the cumulative volume. The closer the Pareto curve is to the top left corner, the more pronounced the Pareto effect.
Figure 5: A typical pareto curve (cumulative percent of volume in function of cumulative # of references)
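The ABC classification described above can be sketched as follows. The product volumes are hypothetical. One convention is assumed here: an item whose cumulative share crosses a threshold falls into the next class.

```python
def abc_classification(volumes, a_cut=0.80, b_cut=0.95):
    """Rank items by volume and label them A, B or C by cumulative volume share."""
    total = sum(volumes.values())
    ranked = sorted(volumes.items(), key=lambda kv: kv[1], reverse=True)
    classes, cumulative = {}, 0.0
    for item, volume in ranked:
        cumulative += volume / total
        # Small tolerance so that a share of exactly 80% or 95% is not
        # pushed into the next class by floating-point rounding.
        if cumulative <= a_cut + 1e-9:
            classes[item] = "A"
        elif cumulative <= b_cut + 1e-9:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes

# Hypothetical volumes per product
volumes = {"P1": 500, "P2": 300, "P3": 100, "P4": 50, "P5": 30, "P6": 20}
classes = abc_classification(volumes)
```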
Once the Pareto rule is well understood, a useful next step is to visualize the cumulative forecast performance. In this visualization the WMAPE is calculated for each cumulative point of the portfolio.
Figure 6: Visualisation of cumulative forecast performance
This type of visualisation is particularly handy for comparing different forecasting techniques. It helps identify where one technique performs better than others. Each line on the cumulative forecast performance graph represents one forecasting technique. When a line decreases, it means that the items just added to the portfolio are more difficult to forecast than the first (biggest volume) items. The general tendency of the cumulative forecast performance lines is typically downward, since it is inherently more difficult to forecast low volume, typically more erratic, combinations than high volume, typically more stable, combinations. When the lines of different techniques cross, one technique performs better on the high volume combinations and another performs better on the low volume combinations. The point where each line ends, at 100% of the portfolio, gives the overall WMAPE for the portfolio. Often the conclusion is that the optimal performance comes from a combination of different techniques, for example capturing the best of both the statistical forecast and the sales forecast.
Figure 7: Example visualisation of cumulative forecast performance and bias
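A minimal sketch of how the cumulative curve can be computed for one forecasting technique is shown below. The portfolio values are invented. WMAPE is taken here as the sum of absolute errors divided by the sum of actuals, and positive volumes are assumed. Plotting 1 minus WMAPE gives an accuracy line that trends downward as smaller combinations are added.

```python
def cumulative_wmape(items):
    """items: list of (actual_volume, forecast_volume), one per combination.
    Returns WMAPE over the top-1, top-2, ... combinations, largest volume first."""
    ranked = sorted(items, key=lambda af: af[0], reverse=True)
    abs_err = total = 0.0
    curve = []
    for actual, forecast in ranked:
        abs_err += abs(actual - forecast)
        total += actual
        curve.append(abs_err / total)
    return curve

# Hypothetical portfolio of four combinations: (actual, forecast)
portfolio = [(1000, 950), (20, 40), (500, 450), (100, 140)]
curve = cumulative_wmape(portfolio)
overall_wmape = curve[-1]  # value at 100% of the portfolio
```

Running the same function once per technique yields the lines to compare on one chart.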
Every manager knows that KPIs should be aligned with the process or business goals. When it comes to forecasting demand, it can be relevant to report forecast accuracy by forecast horizon.
A forecast horizon is the number of periods one forecasts ahead. A forecast prepared in a January S&OP cycle for the month of May represents a forecast horizon of 4 months. Its accuracy is referred to as the lag 4 accuracy and helps in understanding how well one can forecast 4 months ahead. As such, one gains insight into how good the projections for the near future are.
Since uncertainty is higher, forecast accuracy typically decreases when forecasting further ahead. Correspondingly, the forecast error increases when forecasting further ahead.
When reporting forecast accuracy, one needs to understand which lag is being reported. When starting to publish forecast accuracy measurements within the organisation, it is advisable to start with one or two chosen lags, e.g. a 30-day and a 60-day lag. As the maturity in understanding and analyzing forecast accuracy increases, one can add a mid-term (4 to 6 months lag) and a long-term (8 to 12 months lag) forecast accuracy measure.
Figure 8: Example of forecasting accuracy reported per forecast horizon
The canyon or crawl chart is a type of bar chart that shows a bar at each side. For example, the left bar shows the (full year) forecast made in the previous S&OP cycle and the right bar shows the (full year) forecast made in the current S&OP cycle. Management is often interested in understanding why these versions of the forecast differ. Therefore the delta (gap or difference) is visualized incrementally in between the two outer bars. Positive deltas are typically colored green and negative deltas red. Deltas are typically ordered from large positive to large negative to draw attention to the key differences. The axis starts close to the minimum of the two outer bars. This visualization is very powerful in executive slide decks for higher management. Graphs can be automatically generated and exported to PowerPoint. One of the design decisions is which dimension makes the most sense for breaking down the gap.
This could be:
Often one can drill down from one dimension to lower dimensions. This type of visualization is very common in finance departments. For example, in quarterly earnings calls it is often used to report the EBIT evolution between last quarter and the same quarter last year, or per rolling full year. It is also used to visualize the delta between two inventory positions.
Figure 9: Example canyon chart that breaks down the delta between forecast versions per product line
Figure 10: Example canyon chart visualising the delta in inventory evolution
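The data behind a canyon chart can be prepared with a small helper like the one below. This is a sketch under assumptions: the product lines and values are made up, and the function only computes the two outer bars and the ordered deltas, leaving the drawing to whatever charting tool is in use.

```python
def canyon_steps(previous, current):
    """previous/current: {dimension_member: forecast}.
    Returns the left bar, the deltas ordered from largest positive
    to largest negative, and the right bar."""
    members = set(previous) | set(current)
    deltas = {m: current.get(m, 0) - previous.get(m, 0) for m in members}
    # Sort by descending delta; ties are broken by name for a stable order.
    ordered = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))
    return sum(previous.values()), ordered, sum(current.values())

# Hypothetical full-year forecasts per product line in two S&OP cycles
previous_cycle = {"Line A": 400, "Line B": 300, "Line C": 300}
current_cycle = {"Line A": 450, "Line B": 280, "Line C": 310}

left_bar, steps, right_bar = canyon_steps(previous_cycle, current_cycle)
```

The left bar plus the sum of all deltas always equals the right bar, which is a convenient sanity check before exporting the chart.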
The waterfall chart is a bar chart that represents the different forecast versions per time bucket. It is a good way to represent the volatility of the different forecast versions and can help answer the question ‘how did we end up with this forecasted number’.
Figure 11: Example of a forecast waterfall chart
A pattern that is well known among forecast professionals is the hockey stick effect. Sales people usually draw up a budget forecast for the next fiscal year. Once the fiscal year is under way, individuals who have not met their budgeted numbers tend to re-adjust their forecasts to higher numbers in the later months of the fiscal year. This typically shows up as an exponential pattern in cumulative sales for the fiscal year. The adage of the sales people is ‘we will catch up to meet the target’.
Figure 12: Often observed pattern: hockey stick effect
Practical Application of FVA Analysis
A business forecast is normally produced by applying overrides and modifications to an initial statistical forecast. Forecast Value Added (FVA) analysis can be used to identify if certain process steps are improving the forecast accuracy or if they are just adding to the noise. This article identifies some of the key factors that must be considered in applying FVA analysis within your demand management process. The discussion should be read in the context of large industrial suppliers.
In a typical forecasting process, a statistical forecast is generated using historical demand data. This forecast may be produced with statistical software like the Arkieva Demand Planner or SAS, or simply with an Excel worksheet. This initial prediction is then modified by input from the demand planner, the sales organization, marketing, and management before being passed down to the Sales and Operations Planning (S&OP) process.
The question is whether each of these steps in developing a forecast actually improves the result. After all, there does not seem to be any point in gathering a lot of data from folks if their inputs do not improve accuracy. In his article “The Null Hypothesis: Your Forecasting Process Has No Effect”, Michael Gilliland raises the key question:
The typical business forecasting process consumes large amounts of management time – but is it “adding value” by making the forecast more accurate and less biased?
Whether or not a process adds value can only be determined by first selecting a suitable performance metric like MAPE (mean absolute percentage error).
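For reference, a common definition of MAPE over n periods is given below. The symbols are introduced here for illustration: A_t is the actual demand in period t and F_t is the forecast for that period.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|
```

Periods with zero actual demand have to be excluded or handled separately, since the ratio is undefined for them.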

There are numerous articles in the literature that suggest that management overrides typically do not improve a metric like this systematically. Michael Gilliland describes how a company can attempt to determine if a particular step adds value by using control charts to compare the results of a process step versus the null hypothesis that the forecasting step has no effect. While this sort of analysis is useful as an initial exercise, it ignores some practical realities.
The statistical model and data used to generate the initial forecast try to capture any systematic behavior of the underlying demand pattern. The intrinsic assumption is that demand is the sum of an underlying pattern, which can be approximated by a statistical model, and random variability. Past data is used to identify the parameters of the underlying model as well as the extent of the randomness.
The statistical forecast is passed on to collaborators with little or no guidance other than to “improve on it if you can” or “check it for reasonableness”. This sets the folks responsible for improving the forecast in direct competition with statistical forecasting. They see their role as correcting the forecast, rather than improving it. Many of the academic studies that compare forecasts generated manually with statistical forecasts have also added to this perception. Best practice views the statistical forecast as one of the collaborators, not the competition.
The purpose of gathering inputs is not to validate the statistical model or calculations, but to include selective information that may be available but is not reflected in historical data. Input can either just change a forecasted number, or may be used to change the underlying statistical model. Making sure that the statistical model effectively reflects history (to the extent that it is possible) is the role of the demand planner.
The sales person is often the only person who has information such as imminent customer plant outages, an order that is about to be booked with the customer, or a customer being in the process of replacing a product with a competitor’s offering. These are changes that are not reflected in history, and the sales person should use them to modify the forecast because they represent a departure from the past.
Imminent changes should translate into manual overrides, but longer term step changes in demand need to be communicated to the demand planner so that they can change the underlying model. For example, a sudden but temporary surge in demand due to a weather event represents a departure from the forecast, but should not necessarily modify the underlying statistical model. On the other hand if a customer discontinues a product which is the primary consumer of the ingredient that the sales person is selling, then this information needs to be communicated to the demand planner so that this can be incorporated into the underlying model.
Thus it is important to support the process of gathering overrides with the ability to communicate reasons and comments. Our experience is that if the sales persons’ input is directed and intelligently applied, it will improve the short term forecast.
Without proper guidance, sales input often reflects the “hockey stick effect” (a term coined by Jane Lee, who ran the S&OP process for the Ethylene Copolymers Business at DuPont for 10 years in the 90’s). Let’s say that the forecast for a product/customer combination is 90 for the next 6 months, with quarterly targets of 45. This is reflected in the statistical forecast of 15 units per month. Let’s say that for month 2, the sales person gets information that the customer’s plant will be taken off line for the month. The sales person immediately communicates that override, but because their metrics are tied to quarterly and annual quotas, and having the optimistic “can-do” outlook of most sales organizations, they also change the forecast in months 3 through 6, hoping that the customer’s plant will run at an accelerated rate for the rest of the year.

Coming out of the shutdown, the customer has some difficulty getting the plant running and communicates to the sales person that they will only order 10 units in month 3. The sales person modifies the statistical forecast accordingly, but being the perpetual optimist, feels that the customer must surely consume their quota and simply adds any unconsumed forecasts to the end of the budget period. This systematic shifting of the forecast to later periods looks like a “hockey stick” when plotted.
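The mechanics of this example can be sketched numerically. The sketch makes a simplifying assumption that is not in the text: in both steps the whole unconsumed volume is added to the last month, whereas in the story the first revision is spread over months 3 through 6. The function name is hypothetical.

```python
def push_shortfall_to_end(plan, period, revised):
    """Override one period (0-based index) and add the unconsumed volume
    to the last period, so the total still matches the original target."""
    updated = list(plan)
    shortfall = updated[period] - revised
    updated[period] = revised
    updated[-1] += shortfall
    return updated

statistical = [15] * 6  # 90 units over 6 months
after_outage = push_shortfall_to_end(statistical, 1, 0)      # month 2: plant offline
after_restart = push_shortfall_to_end(after_outage, 2, 10)   # month 3: only 10 units
```

The total stays at 90 after every revision while the last month grows from 15 to 30 to 35. That growing final bar is the blade of the hockey stick.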
If the forecast overrides of this salesperson were to be considered in their entirety, it is very likely that any FVA analysis would suggest these inputs be ignored. But this would be like throwing the baby out with the bathwater because the sales person has critical information on what the customer will consume in the immediate future. It is just that their longer term numbers are affected by their metrics and other considerations.
Adjustments to a statistical forecast are not all provided at the same aggregation level in a typical business. The adjustments provided by individual sales persons may be at the product/customer level of detail. However, sales management generally provides input at an aggregated regional level. In some organizations, this is complicated by customers that span different geographical regions so that some sales managers with key customers have responsibilities that intersect with regional managers.

In this example, the initial statistical forecast is 40 and 60 for product A and product B respectively. The sales person changes only product “A” but the manager reduces the combined forecast to 90. There are a number of ways to adjust the forecast, two of which are illustrated on the next page:
It is not possible to determine if a forecast process step which involves the collective inputs of the sales person and the sales manager, has added value without first determining how the overrides or adjustments are applied over the horizon.
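To make the ambiguity concrete, here are two possible reconciliation rules applied to the example above. They are illustrative assumptions, not necessarily the two ways the original illustration showed. The sales person's override of product A to 45 is also a hypothetical value, since the text does not state it.

```python
def scale_proportionally(forecast, target_total):
    """Scale every item by the same factor so the total matches the target."""
    factor = target_total / sum(forecast.values())
    return {k: v * factor for k, v in forecast.items()}

def protect_overrides(forecast, target_total, overridden):
    """Keep overridden items fixed and scale only the remaining items."""
    fixed = sum(v for k, v in forecast.items() if k in overridden)
    free = sum(v for k, v in forecast.items() if k not in overridden)
    factor = (target_total - fixed) / free
    return {k: (v if k in overridden else v * factor) for k, v in forecast.items()}

# Statistical forecast: A = 40, B = 60. Hypothetical sales override: A = 45.
after_sales = {"A": 45, "B": 60}
manager_total = 90

option_1 = scale_proportionally(after_sales, manager_total)
option_2 = protect_overrides(after_sales, manager_total, overridden={"A"})
```

Both results add up to 90, but product A ends up at roughly 38.6 in the first case and 45 in the second. The measured value of the sales person's input therefore depends on the rule chosen.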
Marketing decisions including promotions are normally planned well in advance to allow for the preparation needed to implement the marketing initiative. More often than not, the initial forecast presented to the sales person for adjustment incorporates the marketing overrides that were entered months ago. The goal of marketing is to make certain that the impact of new marketing initiatives is reflected in the tactical forecast. This is information that is not in the historical data because in many statistical forecasting engines, the aberrations caused by previous marketing initiatives are deliberately removed to make the history “forecastable”.
Management changes apply to periods even further out in the future because their purpose is to align strategic business goals with the tactical forecast. They deal more with shaping the overall demand across different segments, industries, and regions. Again, these changes ought to be already incorporated in the forecast presented to the sales organization for change.

The typical forecasting step starts with an input forecast and results in an output forecast which takes into account the adjustments provided. These adjustments are provided at different levels of aggregation and for different time horizons.

In order to determine if a forecasting process step adds value, it is not sufficient to simply look at one type of input (for example, sales overrides); one must look at an intelligent combination of inputs. Extending this further, different inputs (or the same inputs) combined and aggregated differently can be thought of as different forecasts.
In their paper, “Improving Forecast Accuracy by Combination”, Feng Zhang and Robin Roundy propose an analytical framework that intelligently combines multiple forecasts into a single, more accurate combined forecast. The method they propose creates weights for the different forecasts and combines them in an optimal way to generate an improved forecast.
Without going into the mathematics, or into the details of aggregation, let’s see how it would work in practice:

The analysis may indicate that the sales person’s input in the first period is very reliable, but that its reliability progressively decreases in later periods. In weighing the different forecasts for the first period, it would therefore assign a weight of 1.0 to the sales overrides and 0 to the rest. In the other periods, the weights would be assigned based on how the different forecasts have performed in the past. An example assignment of weights might be:

By multiplying the weights and the individual forecasts, we come up with a consensus forecast:
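The multiplication can be sketched as follows. All forecasts and weights below are invented for illustration, apart from the first-period weight of 1.0 on the sales overrides, which follows the text. The weights are assumed to sum to 1 in every period.

```python
def consensus_forecast(forecasts, weights):
    """forecasts/weights: {source: [value per period]}.
    Returns the weighted sum per period; weights must sum to 1 per period."""
    sources = list(forecasts)
    periods = len(forecasts[sources[0]])
    for t in range(periods):
        total = sum(weights[s][t] for s in sources)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights for period {t + 1} sum to {total}, not 1")
    return [
        sum(weights[s][t] * forecasts[s][t] for s in sources)
        for t in range(periods)
    ]

# Hypothetical forecasts for three periods
forecasts = {
    "statistical": [15, 15, 15],
    "sales":       [0, 10, 20],
    "marketing":   [15, 18, 18],
}
# Hypothetical weights; period 1 relies entirely on the sales override
weights = {
    "statistical": [0.0, 0.5, 0.6],
    "sales":       [1.0, 0.3, 0.1],
    "marketing":   [0.0, 0.2, 0.3],
}
consensus = consensus_forecast(forecasts, weights)
```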

The real measure of Forecast Value Added (FVA) is if the consensus forecast proves to be better, and not whether the individual inputs improve the forecast.
In the previous example, we did not focus on the analysis that generated the weights used to combine different forecasts. While there are many schemes in the literature, Zhang and Roundy suggest a simple procedure that produces good results. While the actual method is not that critical, it is important that the weights adapt as business conditions and the forecasting process change. For example, the statistical forecast may improve over time so that sales modifications no longer add value.
This notion that forecast inputs can add value over some portions of the horizon, and not others is called Time Phased Forecast Value Added (TPFVA). Over time, the weights that are calculated to combine the forecasts change. If the weights associated with a particular input get too low, that input can be effectively eliminated. Conversely, an improvement in the process of gathering inputs from say marketing may result in more accurate marketing numbers. If this accuracy continues, the method of calculating the weights will begin to assign a higher weight to the marketing inputs, effectively recognizing the process change.
Using an adaptive weighting scheme like this has an additional advantage. It can distinguish between sales persons that provide accurate information and those that do not. If a particular sales person’s input is highly inaccurate, the weights assigned to it will be small and their overrides will be effectively eliminated. As their forecast improves, the weights will increase and their overrides will have a greater impact on the final forecast.
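One simple way to obtain adaptive weights is inverse-error weighting, sketched below. This is a generic stand-in chosen for illustration, not the procedure proposed by Zhang and Roundy. The error histories, the `min_weight` threshold and the function name are assumptions. The weights would be recomputed per lag as new actuals arrive.

```python
def inverse_error_weights(past_errors, min_weight=0.05):
    """past_errors: {source: [recent forecast errors for one lag]}.
    Weight each source by the inverse of its mean squared error, drop
    sources whose weight falls below min_weight, then renormalize."""
    mse = {s: sum(e * e for e in errs) / len(errs) for s, errs in past_errors.items()}
    inv = {s: 1.0 / max(m, 1e-12) for s, m in mse.items()}  # guard against zero error
    total = sum(inv.values())
    weights = {s: v / total for s, v in inv.items()}
    kept = {s: w for s, w in weights.items() if w >= min_weight}
    kept_total = sum(kept.values())
    return {s: kept.get(s, 0.0) / kept_total for s in weights}

# Hypothetical recent errors at lag 1
lag_1_errors = {
    "statistical": [5, -5, 5],
    "sales":       [1, -1, 1],
    "marketing":   [10, -10, 10],
}
lag_1_weights = inverse_error_weights(lag_1_errors)
```

With these numbers the sales input receives a weight of 1.0 at lag 1 and the other inputs are effectively eliminated. If the sales errors grew in later cycles, the same calculation would shift weight back to the other sources.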
Ultimately our goal is to make a forecast more accurate and reliable so that it adds business value to the planning process. Increasing the accuracy of the forecast is not an end in itself. It is important only if it helps to improve the rest of the planning process.
Our experience indicates that sales inputs should be applied sparingly. A good indication that sales overrides are justified is if there is an accompanying cause or reason. It is also our experience that sales inputs should not be gathered too far into the future. This ends up “overriding the overrides” without any significant improvement.
The right question is not whether each of these inputs adds value, but whether the inputs can be combined in a meaningful way to create a better forecast that effectively integrates analytics with planner expertise. Professor Robin Roundy’s work, now commercialized in Arkieva software, provides a scalable method to accomplish this.