Modeling variables improves daily estimates of gas demand.

**Fred J. Connell** is manager, supply planning, in the energy supply services department of NiSource, Inc., parent company of Columbia Gas of Ohio.

On October 14, 2004, the *Columbus Dispatch* quoted Jack Partridge, president of Columbia Gas of Ohio, “I’m proud to say that my furnace has not been turned on yet.” Living in a household with five others, Partridge said, “I cave in when the dog starts shivering.”

Dogs differ. At what daily temperature do customers turn on their furnaces? Or more realistically, given individual behavior, over what range of temperatures do they turn on their furnaces? What is the impact of model error on the daily balancing requirement? How does the seasonal variation in ground temperature affect gas demand? Given that wind speed affects daily gas demand in the winter but not the summer, over what range of temperatures does wind speed begin to have an impact? How many cold days cause customers to “cave in”?

The National Oceanic and Atmospheric Administration (NOAA) “measures heating energy demand” from a base of 65 degrees F. Customer conservation and building thermal efficiency have reduced this base for Columbia Gas of Ohio customers. To estimate the current base for its customers, Columbia used daily demand and temperature data for the three-year period from April 2005 through March 2008 *(see Figure 1)*. Columbia included all customers with annual demand less than 15,000 MCF/year. This group includes all residential customers and the smaller commercial customers.

Columbia Gas of Ohio’s winter demand of about 2 million dekatherms (Dth) at 0 degrees F exceeds the summer daily demand of about 100,000 Dth by a factor of 20. The bend in the curve is gradual because customers don’t turn on their furnaces in unison. Columbia approximated this gradual bend by modeling three portions of the fitted curve: those where none, half, and all of the customers have their furnaces turned on. If the temperature range with only half of the furnaces operating is Base ± Tran, then heating degree days (HDD) is defined as: HDD = max(0, 0.5 * (Base + Tran – Temp), Base – Temp), where Temp is the average of the high and low temperatures for the day. Regression yields best-fit values of Base = 59 degrees F and Tran = 6 degrees F, so that, as an approximation, half of the customers have their furnaces turned on over the temperature range from 53 degrees F to 65 degrees F.
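The three-segment HDD definition can be sketched in a few lines of Python. The function name and default arguments are illustrative; only the formula and the fitted values Base = 59 and Tran = 6 come from the text.

```python
def heating_degree_days(temp_f, base=59.0, tran=6.0):
    """HDD with a transition band, following the article's formula.

    Above base + tran no furnaces run (HDD = 0); between base - tran and
    base + tran half of them run (slope 0.5); below base - tran all run.
    """
    return max(0.0, 0.5 * (base + tran - temp_f), base - temp_f)


# Walk across the three portions of the curve.
for temp in (70, 65, 59, 53, 30):
    print(temp, heating_degree_days(temp))
```

With these values the half-slope segment and the full-slope segment meet at 53 degrees F, where both give 6 HDD, so the curve has no jump.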

Figure 2 shows the model errors or “residuals,” the daily difference of actual less fitted demand. Residuals on a few days have magnitude greater than 200,000 Dth/day. The root mean square error (RMSE) is 61,843 Dth/day. By contrast, the model using NOAA’s Base = 65 degrees F and no transition range has RMSE = 75,835 Dth/day.
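For readers who want to reproduce the error measure, here is a minimal sketch of the residual and RMSE calculation. The demand figures are made up for illustration, and the sketch divides by the number of days; a regression package may instead divide by the degrees of freedom, which the article does not specify.

```python
import math


def rmse(actual, fitted):
    """Root mean square of the residuals (actual less fitted)."""
    residuals = [a - f for a, f in zip(actual, fitted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical daily demands in Dth/day, not Columbia data.
actual_demand = [1_200_000, 950_000, 400_000, 110_000]
fitted_demand = [1_150_000, 990_000, 420_000, 100_000]
print(round(rmse(actual_demand, fitted_demand)))
```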

Columbia uses storage injections and withdrawals to balance the difference between daily gas supply and the demand of its customers. CHOICE marketers, who serve about 45 percent of Columbia customers, deliver gas as a function of temperature according to demand curves provided by Columbia. By reducing model error in those demand curves, Columbia reduces the daily balancing requirement and the required storage capacity. The demand curves have both the historical model error discussed here plus forecast uncertainty: Price changes and other factors may cause future customer load patterns to differ from the historical pattern.

#### Model Error and Daily Balancing

Columbia uses several variables to reduce model error. First is ground temperature. Water companies in Ohio bury the pipes to residences about 3-feet deep to avoid freezing. Colder ground temperature in the winter increases the energy per gallon to heat water. Colder ground temperature also increases the heat loss through basement walls, increasing the load on the furnace.

Ground temperature varies sinusoidally. In a 1976 article, Williams and Gold wrote, “The temperature of the ground surface remains almost in phase with that of the air. Below the surface, however, the maximum or minimum occurs later than the corresponding values at the surface….”^{1} At the 3-foot depth of water pipes, the coldest temperature occurs in February, generally a month after the coldest air temperature.

Columbia added sine and cosine terms to its model to measure the effect of ground temperature. According to the regression, the combined sine/cosine curve has amplitude 51,611 Dth/day and peaks in February, the month of coldest ground temperature. Ground temperature increases daily demand by 51,611 Dth in February, and decreases demand by the same volume six months later, in August *(see Figure 3)*.
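A fitted sine coefficient and cosine coefficient can be combined into a single amplitude and peak date, as sketched below. The period, the function names, and the peak day of 40 (about February 9) are assumptions for illustration; the article reports only the amplitude and the month.

```python
import math

PERIOD_DAYS = 365.25  # assumed length of the annual cycle


def seasonal_term(day, sin_coef, cos_coef):
    """Value of the sine/cosine pair on a given day of the year."""
    angle = 2 * math.pi * day / PERIOD_DAYS
    return sin_coef * math.sin(angle) + cos_coef * math.cos(angle)


def amplitude_and_peak_day(sin_coef, cos_coef):
    """Rewrite a*sin + b*cos as amplitude * cos(angle - phase)."""
    amplitude = math.hypot(sin_coef, cos_coef)
    phase = math.atan2(sin_coef, cos_coef)
    peak_day = (phase * PERIOD_DAYS / (2 * math.pi)) % PERIOD_DAYS
    return amplitude, peak_day


# Hypothetical coefficients built from the article's amplitude and an
# assumed peak on day 40, then recovered from the pair.
angle_40 = 2 * math.pi * 40 / PERIOD_DAYS
a = 51_611 * math.sin(angle_40)
b = 51_611 * math.cos(angle_40)
amp, peak = amplitude_and_peak_day(a, b)
print(round(amp), round(peak))
```

Half a period after the peak, the same term is at its minimum, which matches the equal and opposite effect the article describes for August.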

The second variable is wind speed *(see Figure 4)*. According to a Penn State study, after cold outside temperature, “wind is the second greatest source of heat loss (from buildings) during the winter…. In fact, up to one-third of the annual residential heating energy goes to heat … infiltration air many times each winter day.”^{2} But wind has negligible effect on summer gas usage, largely water heating. Over some range of temperatures, wind begins to have an effect. Columbia used a logistic curve to model this increasing effect *(see Figs. 5 and 6)*. Adding wind to the model reduces the RMSE to 45,571 Dth/day.
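One way a logistic curve can phase in the wind effect as temperature falls is sketched below. The article does not give the functional form or the fitted values, so the midpoint, width, and coefficient here are purely hypothetical.

```python
import math


def wind_weight(temp_f, midpoint_f=50.0, width_f=5.0):
    """Logistic weight: near 1 in cold weather, near 0 in warm weather.

    midpoint_f and width_f are hypothetical, not fitted values.
    """
    return 1.0 / (1.0 + math.exp((temp_f - midpoint_f) / width_f))


def wind_term(temp_f, wind_mph, coef_dth_per_mph=1_000.0):
    """Demand added by wind; the coefficient is a placeholder."""
    return coef_dth_per_mph * wind_mph * wind_weight(temp_f)


for temp in (20, 50, 80):
    print(temp, round(wind_weight(temp), 3))
```

The logistic form gives a smooth transition, so wind contributes almost nothing on warm days and its full effect on cold days, with no abrupt switch in between.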

The third variable is the number of cold days. Successive cold days tend to cause increasing daily demand because the cold builds up in the dwelling structure. And psychologically, residents tire of the cold, cave in, and crank up the thermostat.

To measure this impact, Columbia added HDD from prior days to its model. Columbia tried periods of 1, 5, and 10 prior days. For the 5- and 10-day periods, Columbia assigned half the weight to HDD on the first prior day, and assigned the remainder to the earlier days, with each earlier day getting less weight *(see Figure 7)*. Adding prior day HDD to the model reduces the RMSE *(see Figure 8)*.

Extending the period to 15 prior days provided diminishing returns: the RMSE fell only slightly, to 30,426 Dth/day. Columbia’s results use the 10-day lag period.

The fourth demand variable is the effect of holidays and weekends. Large industrial customers have large demand reductions on holidays and weekends. However, for residential and small commercial customers, the reductions are small: 18,487 Dth/day on holidays, 10,215 on Fridays, 18,019 on Saturdays, and 7,650 on Sundays. Including holidays and weekends in the model reduces the RMSE to 29,679 Dth/day. These reductions apply to gas days, which in the Eastern time zone run from 10 a.m. to 10 a.m.; *e.g.*, the Friday gas day ends at 10 a.m. Saturday.
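Because the day-of-week reductions apply to gas days rather than calendar days, an hourly timestamp has to be mapped to its gas day before the reduction is looked up. The sketch below assumes local Eastern timestamps and hypothetical helper names; holidays are left out because the article does not say how a holiday that falls on a weekend is treated.

```python
from datetime import date, datetime, timedelta

GAS_DAY_START_HOUR = 10  # gas days run 10 a.m. to 10 a.m. Eastern

# Reductions in Dth/day from the article, keyed by date.weekday().
WEEKDAY_REDUCTION = {4: 10_215, 5: 18_019, 6: 7_650}  # Fri, Sat, Sun


def gas_day(local_time):
    """Calendar date of the gas day that contains local_time."""
    return (local_time - timedelta(hours=GAS_DAY_START_HOUR)).date()


def weekday_reduction(local_time):
    """Reduction for the gas day containing local_time (0 on Mon-Thu)."""
    return WEEKDAY_REDUCTION.get(gas_day(local_time).weekday(), 0)


# 9:59 a.m. on Saturday, January 5, 2008 still belongs to Friday's gas day.
print(gas_day(datetime(2008, 1, 5, 9, 59)))
print(weekday_reduction(datetime(2008, 1, 5, 9, 59)))
```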

#### Model With Variables

When the model results include all explanatory variables, the fitted demands, plotted against temperature, are fuzzy *(see Figure 9)*, in contrast to Figure 1, where the fitted demands are crisp. Days with a given temperature have a variety of wind speeds, prior day temperatures, *etc.*, causing these fuzzy results. Figure 10 shows the residuals, most of which have magnitude less than 100,000 Dth/day. The RMSE, 29,679 Dth/day, represents a 52-percent reduction from the 61,843 Dth/day RMSE of the original model that included HDD as the only explanatory variable.

Figure 11 shows the values of the coefficients. The values of Base, Tran, the intercept and the sine amplitude have changed from those reported earlier, the result of adding successive variables to the model.

This approach to modeling effectively reduced model error in Columbia Gas of Ohio’s demand curves, reducing the company’s daily balancing and storage capacity requirements.

**Author’s Technical Note:** Analysts usually use linear regression to determine the best-fit model. However, four of the parameters in the Columbia model (Base, Tran, Bet, and Alp) are non-linear, which requires the use of non-linear regression. Columbia used Statistical Analysis System (SAS) PROC NLIN.

The following example illustrates the non-linearity of Base. Assuming a model with Base = 60 degrees F and two groups of customers:

**Group 1**: Demand = 100,000 + 25,000 * max(0, 60 degrees F – Temp), and

**Group 2**: Demand = 10,000 + 3,000 * max(0, 60 degrees F – Temp), then if all the coefficients were linear, the combined group would have demand as follows:

**Combined Group**: Demand = 110,000 + 28,000 * max(0, 120 degrees F – Temp). The combined group would indeed have an intercept of 110,000 Dth and a temperature response of 28,000 Dth/degree F, but it would have an HDD Base of 60 degrees F, not 120 degrees F. The model is non-linear in the Base.
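A quick numerical check of this example, using the coefficients given above, shows how far off the added Base would be. The function names are illustrative.

```python
def group1(temp_f):
    return 100_000 + 25_000 * max(0, 60 - temp_f)


def group2(temp_f):
    return 10_000 + 3_000 * max(0, 60 - temp_f)


def combined_base_60(temp_f):
    """Intercepts and slopes add; the Base stays at 60 degrees F."""
    return 110_000 + 28_000 * max(0, 60 - temp_f)


def combined_base_120(temp_f):
    """What treating Base as a linear coefficient would give."""
    return 110_000 + 28_000 * max(0, 120 - temp_f)


temp = 50
print(group1(temp) + group2(temp))   # sum of the two groups
print(combined_base_60(temp))        # matches the sum
print(combined_base_120(temp))       # far too large
```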

Columbia used non-linear regression to determine Base and the other non-linear parameters. Alternatively, one could use linear regression with trial and error, trying Base = 65 degrees F, 64 degrees F, 63 degrees F, *etc.*
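The trial-and-error alternative can be sketched as a grid search: fix a candidate Base, fit intercept and slope by ordinary least squares, and keep the Base with the smallest sum of squared errors. The data below are synthetic and noise-free, generated with a Base of 59 and no transition range, and this is not Columbia's SAS procedure.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns SSE too."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


# Synthetic data: hypothetical intercept and slope, true Base of 59.
temps = list(range(0, 90))
demand = [100_000 + 30_000 * max(0, 59 - t) for t in temps]

best_base, best_sse = None, float("inf")
for base in range(65, 49, -1):  # try Base = 65, 64, 63, ... 50
    hdd = [max(0, base - t) for t in temps]
    _, _, sse = fit_line(hdd, demand)
    if sse < best_sse:
        best_base, best_sse = base, sse

print(best_base)
```

A grid of whole degrees is enough for this illustration; a finer grid, or a second loop over Tran, would follow the same pattern.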

#### Endnotes:

1. G.P. Williams and L.W. Gold, “Ground Temperatures,” *Canadian Building Digest*, July 1, 1976.

2. David Meredith, Penn State Institutes of Energy and the Environment, *Intro to Building Environmental Systems*, Chapter 10.