Portcast offers demand forecasting for shipping lines, cargo airlines and freight forwarders. We generate forecasts for tens of thousands of time series per client, split by origin, destination, product and customer segment, and we forecast several metrics such as weight, size, value, package count, price and revenue.

All of that can sound overwhelming, and it is easy to lose focus when you are generating forecasts for this many time series. In this article, we show how we analyse our forecasts and how we use treemaps to prioritise our efforts.

There are many popular accuracy/error measures for time series forecasts. In this article, all plots are based on the weighted absolute percentage error (WAPE). The formula is given below: in layman's terms, it is the sum of all absolute errors divided by the sum of actuals, i.e. a volume-weighted average of the absolute percentage errors, where $p_i$ is the prediction and $a_i$ the actual for period $i$. We chose this measure because it gives more weight to trade lanes that carry more volume, which is in line with business needs.

$$ \text{WAPE} = \frac{\sum_{i=1}^{n}|p_i-a_i|}{\sum_{i=1}^{n}a_i} $$
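To make the formula concrete, here is a minimal sketch of WAPE in Python with NumPy (the toy numbers are ours, purely for illustration):

```python
import numpy as np

def wape(actuals: np.ndarray, predictions: np.ndarray) -> float:
    """Weighted absolute percentage error: sum of absolute errors
    divided by the sum of actuals."""
    return float(np.abs(predictions - actuals).sum() / actuals.sum())

# Toy example: a 7-day forecast against observed volumes.
actuals = np.array([120.0, 80.0, 0.0, 95.0, 110.0, 60.0, 130.0])
predictions = np.array([100.0, 90.0, 10.0, 100.0, 120.0, 55.0, 125.0])
print(f"WAPE: {wape(actuals, predictions):.3f}")  # ~0.109
```

Note that, unlike MAPE, WAPE stays well defined when individual actuals are zero, as long as the total volume in the denominator is positive.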

Now that we have decided on the accuracy measure, we need to decide how to apply it. Let's say we are doing 90-day rolling forecasts, i.e. at any given point we predict how demand varies over the next 90 days. We will look at the measure in three different ways:

  1. We want to know how accuracy deteriorates as we predict further into the future, e.g. our 1-day-ahead prediction accuracy compared to our 90-day-ahead prediction accuracy
  2. We want to know our overall accuracy for different types of time series: high-volume smooth, low-volume smooth, intermittent/sparse, etc. (one simple way to bucket series into these types is sketched after this list)
  3. We also want to know overall accuracy by segment, e.g. our prediction accuracy for Greater China vs our prediction accuracy for Japan
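For the time-series types in point 2, one common approach is the Syntetos-Boylan scheme based on the average inter-demand interval (ADI) and the squared coefficient of variation (CV²). The sketch below is a simplified version of that idea: the cut-offs (1.32 and 0.49) are the standard literature values, while the volume threshold is illustrative, not our production rule.

```python
import numpy as np

def classify_series(demand: np.ndarray, volume_threshold: float = 1000.0) -> str:
    """Bucket a demand series into the three types used in this article.

    Uses the Syntetos-Boylan cut-offs (ADI > 1.32 or CV^2 > 0.49 means
    intermittent/sparse), then splits smooth series by total volume.
    The volume threshold is illustrative, not a production value.
    """
    nonzero = demand[demand > 0]
    if len(nonzero) == 0:
        return "intermittent/sparse"
    adi = len(demand) / len(nonzero)              # average inter-demand interval
    cv2 = (nonzero.std() / nonzero.mean()) ** 2   # squared coefficient of variation
    if adi > 1.32 or cv2 > 0.49:
        return "intermittent/sparse"
    return "high-volume smooth" if demand.sum() >= volume_threshold else "low-volume smooth"

demand = np.array([0, 0, 5, 0, 0, 0, 12, 0, 0, 3])
print(classify_series(demand))  # intermittent/sparse (ADI = 10/3 > 1.32)
```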

For 1, we created heatmaps and looked at days_ahead errors across all forecast dates at the origin-destination level. In our case, errors do not change much with days_ahead. There are certain patterns in the errors caused by the weekly moving window in our cross-validation. The heatmap has its problems, though: it becomes hard to read at more granular levels as the total number of time series grows, and it tells us nothing about the type of each time series, i.e. whether it is important for our business case, intermittent, or low volume.

Sample heatmap for error analysis.
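A sketch of how such a heatmap can be built with pandas and seaborn, using synthetic WAPE values in place of our evaluation table (the column names are illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the evaluation table: one row per
# (forecast_date, days_ahead) cell holding its WAPE.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-06", periods=12, freq="7D")  # weekly moving window
grid = pd.MultiIndex.from_product(
    [dates, range(1, 91)], names=["forecast_date", "days_ahead"])
errors = pd.DataFrame({"wape": rng.uniform(0.1, 0.5, len(grid))},
                      index=grid).reset_index()

pivot = errors.pivot(index="forecast_date", columns="days_ahead", values="wape")
sns.heatmap(pivot, cmap="RdYlGn_r", cbar_kws={"label": "WAPE"})
plt.title("WAPE by forecast date and days ahead")
plt.tight_layout()
plt.show()
```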

For 2, we generated box plots by time-series type (high-volume smooth, low-volume smooth, intermittent/sparse). These give us an overall idea of error levels for a given product/customer category and time-series type. Here, Type-1 is the high-volume smooth type, which matters most for our business case as it covers most of the total volume; its forecast error is lower than that of the other types. Forecasts on Date-3 are worse because they fall into the COVID-19 period. The plot also surfaces the anomalies we have. Although it gives an overall idea of accuracy per time-series type, we need to dig deeper.

Sample box plot for error analysis.
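A sketch of the box plot with seaborn, again on synthetic data; the type labels and error distributions are made up to mimic the pattern described above:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in: one WAPE per time series, tagged with its type
# and the forecast date it was evaluated on. Errors grow (and spread)
# from Type-1 to Type-3 to mimic the pattern in the real plot.
rng = np.random.default_rng(1)
types = ["Type-1", "Type-2", "Type-3"]
frames = []
for date in ["Date-1", "Date-2", "Date-3"]:
    for i, series_type in enumerate(types):
        frames.append(pd.DataFrame({
            "forecast_date": date,
            "series_type": series_type,
            "wape": rng.gamma(shape=2.0, scale=0.1 * (i + 1), size=200),
        }))
errors = pd.concat(frames, ignore_index=True)

sns.boxplot(data=errors, x="series_type", y="wape", hue="forecast_date")
plt.title("WAPE by time-series type and forecast date")
plt.tight_layout()
plt.show()
```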

Problem: we are not able to generate enough actionable insights from 1 and 2. 1 tells us our accuracy does not deteriorate significantly as we predict further into the future. 2 tells us we have more anomalies for sparse time series, which is expected. Neither is actionable. We want to see exactly where we need to improve, where we are already doing well, and whether our forecasts are biased for certain product/customer categories or origins/destinations. We also want to weight the evaluation toward the high-volume (more important) time series.

It is cumbersome to look at individual time-series plots, numbers, and mean statistics. We want something that gives an overall picture from just a few plots. This leads us to 3: using a treemap to look at accuracy by segments.

Treemap
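Before going into the details, here is a minimal sketch of the kind of treemap we mean, built with Plotly Express on made-up lane data: tile area encodes volume (business importance) and colour encodes WAPE, so large red tiles are exactly where effort should go.

```python
import pandas as pd
import plotly.express as px

# Made-up per-lane evaluation results; the region/lane hierarchy and
# the numbers are illustrative only.
lanes = pd.DataFrame({
    "region": ["Greater China", "Greater China", "Japan", "Japan", "SEA"],
    "lane":   ["SHA-LAX", "HKG-FRA", "NRT-ORD", "KIX-SIN", "SIN-AMS"],
    "volume": [5000, 3200, 2100, 800, 1500],
    "wape":   [0.12, 0.18, 0.25, 0.40, 0.22],
})

fig = px.treemap(lanes, path=["region", "lane"], values="volume",
                 color="wape", color_continuous_scale="RdYlGn_r")
fig.show()
```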