Empir Econ DOI 10.1007/s00181-015-1054-4
Power laws in oil and natural gas production Andrew Balthrop1
Received: 24 March 2014 / Accepted: 3 December 2015 © Springer-Verlag Berlin Heidelberg 2016
Abstract Power-law distributions can be generated by a variety of theoretical processes and have been found in many areas of economics and finance. This paper demonstrates that the distribution of cumulative oil and gas recovery in Texas is best described by a power-law distribution, with exponent approximately equal to 1.1 for oil production and 1.6 for gas production. Estimation is carried out using lease- and well-level data from a cross section of over 600,000 observations gathered by DrillingInfo. The goodness of fit of the hypothesized power law is verified with regression-based and likelihood ratio tests, as well as tests of nested and composite hypotheses. Results are significant because they show that production data are heavy tailed, that empirical variance estimates do not converge, and that 1 % of oil leases are responsible for 70 % of cumulative recovery. The distribution has consequences for efficient management and implications about peak oil because wells close to the mean account for so little of the cumulative distribution of recovery.
Keywords Hydrocarbons · Oil · Natural gas · Power law · Pareto distribution · Scaling distribution · Fractal
1 Introduction
The median well in Texas produces around 8,625 barrels (bbls) of oil over a 3-year span (Kellogg 2010). In comparison, when Spindletop was first tapped in 1901, it produced 100,000 bbls per day. Mean cumulative lease recovery in Texas is 73,110 bbls; this contrasts with the Al-Ghawar field in Saudi Arabia, which has produced over 55 billion
Andrew Balthrop
[email protected] 8317 Freret Street, New Orleans, LA 70118, USA
barrels, frequently accounting for over 60 % of Saudi yearly production and 6–8 % of world production (Simmons 2005). Such figures provide anecdotal evidence that the distribution of oil production is highly dispersed. More to the point, the most productive areas are far more productive than the average plot of land. In fact, 75 % of oil ever discovered lies within less than 1 % of fields (Sorrell et al. 2012). This paper extends earlier work by mathematicians, geologists and geophysicists (Drew et al. 1982; Mandelbrot 1995; Barton and Scholz 1995; La Pointe 1995; Turcotte 2002), and leverages recently developed techniques to demonstrate that the distribution of lease productivity is well fitted by a power-law distribution.
“Fat tails” have policy and managerial significance, as well as significance for empirical research. For example, the Macondo well associated with the Deepwater Horizon oil spill was comparatively productive, pouring 62,000 bbls of oil into the Gulf of Mexico each day, more than the median well recovers over the course of its entire life. Statistical analysis based on thin-tailed distributions (such as the normal) might dismiss the spill as an unlucky fluke, in that a blowout happened to occur on such a productive well. However, when the distribution of production is appropriately characterized, one finds a surprising degree of production occurring in the right tail. If blowouts are random across leases, then the possibility of “black swan” spills of the magnitude of Deepwater Horizon cannot be discounted. At the same time, power-law distributions have the property of being simultaneously robust and fragile (Newman 2010). By focusing oversight and resources on only the most productive wells, the largest spills may be prevented.
The history of recovery from Al-Ghawar field demonstrates that what is important for understanding aggregate oil production is not what the mean field produces, but what the extreme right tail of the distribution produces. The distribution of cumulative recovery is not described by a thin-tailed distribution like the normal, where most of the mass is clustered around the mean. Rather, a power-law distribution is a much better fit to the data. Power-law distributions are of the form $p(x) = c x^{-\alpha}$, where p(x) is the probability density of the random variable x and α is the power-law density exponent, also known as the scaling parameter. Power laws are often identified by their linearity when graphed on double-log paper. The frequency of an observation is inversely proportional to its size, or, said differently, the frequency of an observation varies proportionally with its rank in the distribution. Famous examples are the distribution of wealth, where the probability that an individual has wealth greater than x follows a power-law distribution (Pareto 1896), and the frequency of word usage, where the frequency of a word varies inversely with its rank (Zipf 1949). In each case the distribution has fat tails: a few individuals are surprisingly wealthy, and a few words occur with great frequency. In this paper I examine the distribution of oil and natural gas recovery in Texas and find that recovery is power-law-distributed, which is to say that the probability that a lease recovers more than x barrels of oil is roughly proportional to 1/x. This study is not the first to characterize oil and gas in this manner: geologists have used power laws to model oil and gas reserves since Drew et al. (1982). Nevertheless, this property of minerals has not appeared in economic studies, even though it is of great consequence to the analysis of production.
Moreover, recent research has made available new tools that allow more rigorous and precise characterization of the distribution of cumulative
production, including new regression-based methods (Gabaix and Ibragimov 2011), rigorous estimation of cutoff threshold and goodness of fit (Clauset et al. 2009) and tests of composite and nested distributions (Malevergne et al. 2005, 2011). Using a Texas data set composed of over 600,000 observations, I test the goodness of fit of the power-law distribution among several competing distributions. I find that the probability of recovering more than x barrels of oil is well approximated in the tail of the data by a power law given by $P(\text{Recovery} > x) = k/x^{\gamma}$, with γ ≈ 1.1 for cumulative oil recovery, and γ ≈ 1.6 for cumulative natural gas recovery.¹ These estimates are consistent across regression-based and maximum likelihood estimators.
The paper proceeds as follows. Section 2 contains a literature review of important power laws in economics as well as other fields. The data set is summarized in Sect. 3. Section 4 gives a brief description of power laws. Statistical methods and results are discussed in Sect. 5. Section 6 concludes.
2 Literature review
Power laws are ubiquitous in economics and elsewhere. One reason for this is that power laws can be generated by a variety of processes, including models of highly optimized tolerance (Carlson and Doyle 1999), models of preferential attachment (Willis and Yule 1922; Simon 1955), combinations of exponential distributions (Jones 2015), models of networks and phase transition (Newman 2010), models of self-organized criticality (Bak et al. 1987; Sornette 2006) and others.² Another explanation for the ubiquity of power laws is that they account for one of three classes of limiting distributions of extreme values (Reiss and Thomas 1997; de Haan and Ferreira 2006). In economics, power laws have been used to describe the distribution of firm size in terms of employees and receipts (Axtell 2001), the distribution of firm income (Okuyama et al. 1999), the distribution of city sizes (Gabaix 1999; Gabaix and Ioannides 2004) and perhaps most famously, the distribution of income and wealth (Atkinson and Piketty 2007; Pareto 1896). In the realm of financial economics, there is evidence to suggest that stock returns and trading volume are power-law-distributed (Lux 1996; Gabaix et al. 2003; Gabaix 2009; Mandelbrot 1997; Plerou et al. 2000; Gopikrishnan et al. 1999, 2000). Power laws are prevalent in consumer markets as well. Kohli and Sah (2003) find that market share for brands of food and sporting goods is well described by the power-law distribution. Cox et al. (1995) find the number of performers producing n gold records to be power-law-distributed. Fox and Kochanowski (2004) corroborate this result and provide a theoretical explanation. Adamic and Huberman (1999) show that the distribution of webpage views is power-law-distributed. Finally, even research output in the economics literature is power-law-distributed (Cox and Chung 1991).
¹ The literature reports both the exponent for the probability distribution function, α, and the exponent for the counter-cumulative distribution function, γ. For clarity, if the counter-cumulative distribution is $P(\text{Recovery} > x) = k x^{-\gamma}$, then the associated PDF is $p(x) = k\gamma x^{-(\gamma+1)}$. With γ + 1 ≡ α, the PDF can be written more succinctly as $p(x) = c x^{-\alpha}$.
² Gabaix (2009) provides a primer on power laws.
While power laws have been fitted in a variety of economic subdisciplines, economists have not applied the power-law distribution to explain the production of natural resources. Yet an understanding of the distribution of natural resources is critical to any statistical analysis of production and has managerial and even macroeconomic consequences. Geologists have been using power laws to model hydrocarbon reserves since the work of Drew et al. (1982); Benoit Mandelbrot modeled reserves and cumulative production using power laws even earlier in a then unpublished IBM research note [which has since been published in the collection of papers edited by Barton and Scholz (1995)]. The US Geological Survey (USGS) and Minerals Management Service (MMS) even incorporated power laws into their estimation of US nationwide reserves in 1989. Turcotte (2002) and La Pointe (1995) enumerate the geophysical reasons for why reserves accumulation conforms to power laws. This in turn explains why cumulative production is power-law-distributed, because power laws tend to be conserved through a wide range of mathematical transformations (Gabaix 2009). Yet there is a competing tendency in the geophysical literature to model mineral reserves as lognormal distributed (Krige 1960; McCrossan 1969; Rendu 1988). Kaufman (1993) summarizes research on field assessment and undiscovered reserves and argues that the lognormal distribution provides an excellent fit to field size for all but the very largest and very smallest fields. The choice of distribution is not merely a point of academic dispute—it has economic consequences. Attanasi and Charpentier (2002) examine the consequences of using a lognormal versus a power-law distribution in the estimation of undiscovered oil and gas reserves. Using a lognormal rather than a power law reduces the expected value of total undiscovered oil by 16 % and total gas by 15 %. 
The USGS and MMS revised field assessment methodologies (partially based on power laws) resulted in assessments that are 33 % lower than previous estimates for oil and 40 % lower for gas (Kaufman 1993). Malevergne et al. (2011) explain why the power law and lognormal can be difficult to distinguish, and develop a uniformly most powerful test for the purpose (this test is implemented in Sect. 5.3). Barton and Scholz (1995) and La Pointe (1995) explain the difference between the distributions as a matter of economic truncation. The power law better fits the right tail of the data early in development because the largest fields are generally the first to be discovered and developed. As more small fields are discovered and brought into production, more mass is accumulated in the lower part of the distribution, so that the left-hand side can be increasingly well approximated by a lognormal distribution. This paper differs from earlier research in that its focus is an economic variable, production, as opposed to trap size, field size or undiscovered reserves. Production, while clearly related to the size of the trap, is not solely a geological variable. As economic research has had renewed focus on hydrocarbon production, it is important that economists understand the distribution of hydrocarbon recovery. This study highlights the fact that recovery is driven by the tails of the distribution, which is not widely known in the economics literature, and is worthy of emphasis. There have also been advances in econometric techniques in identifying and characterizing power laws. The earlier geophysical literature is primarily regression-based and graphical. More rigorous methods of characterization have since been developed by Clauset et al. (2009), Gabaix and Ibragimov (2011), Malevergne et al. (2005) and Malevergne et al.
(2011). These techniques are implemented in Sect. 5. Malevergne et al. (2011) is particularly useful in addressing the use of lognormal versus the power-law distribution in modeling recovery.
3 Data
Identifying power laws is data intensive. Power-law behavior is most apparent in the tails of a distribution, which is also where there are the fewest observations. It therefore takes very large data sets to be able to distinguish between different distributions. The data set provided by DrillingInfo (formerly hydrocarbon production incorporated), which compiles time series for oil and natural gas production for 31 states as well as the federal offshore areas in the Gulf and the Pacific, is uniquely well suited for the purpose. Here, I limit my focus to the oil leases and gas wells in Texas (oil is reported at the lease level, gas at the well level). Time series for these data go back to as early as 1934. I focus on cumulative oil production and cumulative gas production, yielding 591,764 observations.
The data set is truncated in the sense that no production is observed after October 2011. This data truncation causes the variables of interest (cumulative oil and gas recovery) to be censored. There are observations still in production for which cumulative production represents only some fraction of potential cumulative recovery. For the right 5 % tail of cumulative recovery, completed production spells are observed for 68 % of oil leases and 62 % of gas wells. The question is whether, given enough time, the right tail of the distribution would change either because new productive areas are discovered, or because existing areas continue to be produced. Two features of oil and gas production ensure that the distribution is likely to be stable into the future. First, the largest fields contribute most to the right tail of cumulative production, and these fields are developed first (Kaufman 1993; Arps and Roberts 1958). As an example, Drew (1990) follows the discoveries in the Frio Strand Plain (onshore) in Texas. Over time, the discovered new field sizes decline.
After 1960, there are no new discoveries of fields in excess of 100 million barrels of oil equivalent, the largest fields in the formation being greater than 1000 million barrels of oil equivalent; from 1975 through 1985, the majority of fields discovered are less than 1 million barrels.³ Texas as a whole reflects this pattern as well; it is in a mature state of development, with comparatively few large onshore discoveries since World War II. It is unlikely that new giant fields will be discovered and produced that would significantly alter the tail distribution of cumulative recovery.
³ The issue of censored production is more salient in the estimation of the lognormal distribution. As more time passes, the mode of the distribution would progressively decrease as the less productive fields come into production (Barton and Scholz 1995).
The second feature of petroleum production that tends to keep the distribution of cumulative recovery stable over time is the production dynamics of the field. After a short ramp-up period, a field will plateau at maximum production before going into exponential decline. The International Energy Agency estimates that the giant fields in North America have entered into the decline phase of recovery, being 78 % depleted
Table 1 Moments of sample distribution

            Gas (MCF)         Oil (bbls)
Mean        5.336 × 10^5      7.311 × 10^4
Median      1.068 × 10^4      266
Maximum     2.492 × 10^9      1.189 × 10^9
Minimum     0                 0
Variance    4.115 × 10^13     7.892 × 10^12
Skewness    2.168 × 10^2      2.720 × 10^2
Kurtosis    6.659 × 10^4      9.469 × 10^4

Summary statistics from 591,764 observations of cumulative hydrocarbon production in Texas. Oil production is reported at the lease level in barrels (bbls), and natural gas is reported at the well level in thousand cubic feet (MCF). Data provided by DrillingInfo
(Agency 2008). Typical decline rates are between 4 and 6.5 % per year (Sorrell et al. 2012). Thus, continued production from the large censored fields would change the distribution of cumulative recovery only slowly.
Summary statistics for the sample distributions can be found in Table 1. Both the oil distribution and gas distribution are right-skewed: for both samples, the mean is significantly higher than the median, and the sample estimates for skewness are large and positive. Also important, and very suggestive of the power-law behavior of the distribution, is that the data span nine orders of magnitude. The minimum cumulative oil and gas produced are 0, while the maximum in both cases is measured in billions. The dispersion is not a result of the units chosen, and parameter estimates of power-law distributions are independent of the units of measure.⁴ For gas production, 50 % of the data lie within the first 4 orders of magnitude, and 50 % of observations lie within the first 3 orders of magnitude for oil production. Large sample estimates for kurtosis indicate substantial weight in the tails of the distribution.
⁴ Indeed, they are the only family of distributions where the parameters do not depend on the units of measurement; hence, they are also known as scale-free or scaling distributions (Farmer and Geanakoplos 2008).

4 Power laws
For cumulative oil and natural gas recovery to be power-law-distributed, the probability that recovery is greater than a threshold, x, must be given by

$$P[X > x] = \frac{k}{x^{\gamma}}, \tag{1}$$

where γ > 0 is a parameter to be estimated and k is a constant. Intuitively, the frequency of an observation varies inversely with the size of the observation. Kernel density plots of the data sets in Figs. 1 and 2 clearly demonstrate that a power law is not adequate to describe the whole distribution. A power law predicts a monotonic decline in the probability of recovery through the entire support. The figures, however, show ranges where the probability is increasing. More generally, for any power-law distribution, as x → 0 the probability of the event diverges. A cutoff parameter is needed to define where the power-law distribution takes hold. In this paper, I define samples for analysis based on two cutoff points. Power-law distributions are usually thought of as tail distributions, and so I initially limit observations to those in the right-hand 5 % tail as recommended by Gabaix (2009). The 5 % samples (one for oil, one for natural gas) each contain 29,588 observations. In the second set of samples, I estimate the cutoff using techniques described in the following paragraphs. The probability density function for a power law is then given by

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}, \tag{2}$$

where α is the parameter of interest. The parameter $x_{\min}$ is the threshold at which the power-law behavior begins. The associated m-th moment of the power-law distribution is given by

$$\langle x^{m} \rangle = \int_{x_{\min}}^{\infty} x^{m} p(x)\,dx = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m}, \tag{3}$$

where p(x) and $x_{\min}$ are as in Eq. 2. It is important to note that moments become infinite unless m < α − 1. This paper presents evidence that α < 3 for both oil and gas.

Fig. 1 Graph represents 591,764 observations. X-axis is the natural logarithm of oil production in barrels (bbls). The solid line is a Gaussian kernel-smoothed estimate with a bandwidth of .2998 log points
Fig. 2 Graph represents 591,764 observations. X-axis is the natural logarithm of gas production in thousand cubic feet (MCF). The solid line is a Gaussian kernel-smoothed estimate with a bandwidth of .3701 log points
This implies that there is no finite moment beyond the mean. Of course, in any finite sample it is possible to calculate higher-order moments such as variance, skewness and kurtosis. Nevertheless, as the sample size is increased, the sample estimates of these moments will keep increasing and never converge. There is a finite amount of oil and gas resources on the planet and a finite number of wells that can be drilled, so practically speaking, the moments cannot increase without bound. Yet the power law still captures important aspects of the distribution. To drive the point home, take an example from Newman (2005). The magnitude of flooding is thought to be power-law-distributed with α < 2. In this case there is not a well-defined mean for the distribution. It is possible to calculate the average flood from the historical data, but this is not particularly useful, because most of the data will be far from that average. The quantiles of the distribution are informative, however, which is why instead of talking about the average flood we rate flood severity relative to historic floods, such as the Great Mississippi Flood of 1927.
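The moment condition in Eq. 3 can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the values α = 2.1 and x_min = 1 are chosen for convenience rather than taken from the data, and samples are drawn by inverting the counter-cumulative distribution implied by Eq. 2.

```python
import math
import random


def power_law_moment(m, alpha, xmin):
    """m-th moment from Eq. 3; infinite unless m < alpha - 1."""
    if m >= alpha - 1:
        return math.inf
    return (alpha - 1) / (alpha - 1 - m) * xmin ** m


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values by inverting the CCDF (x / xmin) ** -(alpha - 1)."""
    # 1 - rng.random() lies in (0, 1], so every draw is >= xmin.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    alpha, xmin = 2.1, 1.0          # illustrative values, not estimates
    print("mean:", power_law_moment(1, alpha, xmin))
    print("second moment:", power_law_moment(2, alpha, xmin))
    # With alpha < 3 the variance has no finite limit, so these sample
    # values are not expected to settle as n grows.
    for n in (10**3, 10**4, 10**5):
        print(n, sample_variance(sample_power_law(n, alpha, xmin, rng)))
```

Running the script with different seeds typically shows the sample variance jumping by orders of magnitude whenever a new extreme draw arrives, which is the behavior the paragraph above describes.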
5 Methodology and results
The paper employs three methods for estimating the power-law exponent. First, I implement a set of maximum likelihood procedures to estimate the parameter of the distribution. It is possible to estimate statistically significant parameters using maximum likelihood methods even though the distribution is mis-specified. I therefore perform robustness tests under the hypothesis that the tail distribution for recovery is something other than power law. These tests indicate the power law as the best fit for the data. A straight line on a log–log plot made up of nearly 30,000 observations provides further graphical evidence for the power-law distribution of the data. Finally, in Sect. 5.3, I investigate the performance of power laws relative to more general distributions.

5.1 Maximum likelihood
The log likelihood function is written as

$$L = \sum_{i=1}^{n}\left[\ln(\alpha - 1) - \ln x_{\min} - \alpha \ln\frac{x_i}{x_{\min}}\right] = n\ln(\alpha - 1) - n\ln x_{\min} - \alpha\sum_{i=1}^{n}\ln\frac{x_i}{x_{\min}}. \tag{4}$$

By taking the derivative of the log likelihood function with respect to α, setting it equal to zero and solving for α, the maximum likelihood estimate is

$$\alpha_{\mathrm{MLE}} = 1 + n\left[\sum_{i=1}^{n}\ln\frac{x_i}{x_{\min}}\right]^{-1}; \tag{5}$$

the standard error of the estimate is

$$\sigma = \frac{\alpha_{\mathrm{MLE}} - 1}{\sqrt{n}}. \tag{6}$$
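As a minimal sketch of Eqs. 5 and 6, the estimator can be written in a few lines. The function names are hypothetical, and the synthetic input (evenly spaced quantiles of the density in Eq. 2) is an assumption used only to check that a known exponent is recovered; it is not the DrillingInfo data.

```python
import math


def fit_alpha_mle(data, xmin):
    """Return (alpha_hat, standard_error, n) for observations >= xmin (Eqs. 5 and 6)."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need observations strictly above xmin")
    alpha_hat = 1.0 + n / log_sum                  # Eq. 5
    std_err = (alpha_hat - 1.0) / math.sqrt(n)     # Eq. 6
    return alpha_hat, std_err, n


def pareto_quantiles(n, alpha, xmin):
    """Evenly spaced quantiles of the power law in Eq. 2 (synthetic data)."""
    return [xmin * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]


# Synthetic check: noise-free quantiles with alpha = 2.1 above an
# illustrative cutoff should return an estimate close to 2.1.
alpha_hat, std_err, n = fit_alpha_mle(pareto_quantiles(1000, 2.1, 175375.0), 175375.0)
print(round(alpha_hat, 3), round(std_err, 4), n)
```

Because the estimate depends on the data only through the ratios x_i / x_min, rescaling the units (for example barrels to thousand barrels) leaves it unchanged.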
One point of contention with MLE estimation is specifying the threshold parameter, $x_{\min}$. Researchers have traditionally “eyeballed” the data to determine where power-law behavior begins. Gabaix (2009) recommends limiting the analysis to data at the 95 % quantile and above, which I follow for one set of estimates. The choice of $x_{\min}$ can be more rigorously data driven. A common technique is to choose $x_{\min}$ in order to minimize the mean squared error (MSE) of the power-law parameter estimate (Lux 2000). In this paper, I estimate $x_{\min}$ following the procedure recommended by Clauset et al. (2009), which minimizes the Kolmogorov–Smirnov (KS) goodness of fit statistic. That is, $x_{\min}$ is chosen to minimize the distance between the empirical cumulative distribution, E(x), and the estimated power-law cumulative distribution function, $\hat{P}(x)$. The KS-statistic is given as

$$KS = \max_{x \ge x_{\min}} \left|E(x) - \hat{P}(x)\right|. \tag{7}$$
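One way to read the threshold search built on Eq. 7 is as a scan over candidate cutoffs: for each candidate, fit the exponent with Eq. 5 on the observations at or above it, compute the KS distance, and keep the candidate with the smallest distance. The sketch below follows that reading. The function names and the `min_tail` guard are hypothetical, and the brute-force loop is quadratic in the sample size, so it is meant for illustration rather than for a data set of this paper's size.

```python
import math


def ks_distance(tail, xmin, alpha):
    """Eq. 7 for an ascending-sorted tail: max |E(x) - P_hat(x)| over x >= xmin."""
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def select_xmin(data, min_tail=50):
    """Scan candidate cutoffs; return (ks, xmin, alpha_hat, n_tail) with the smallest KS."""
    xs = sorted(x for x in data if x > 0)
    best = None
    for start in range(len(xs) - min_tail + 1):
        if start > 0 and xs[start] == xs[start - 1]:
            continue  # repeated value: same candidate as the previous one
        xmin, tail = xs[start], xs[start:]
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum <= 0.0:
            continue  # every remaining value equals xmin
        alpha_hat = 1.0 + len(tail) / log_sum      # Eq. 5
        ks = ks_distance(tail, xmin, alpha_hat)
        if best is None or ks < best[0]:
            best = (ks, xmin, alpha_hat, len(tail))
    return best
```

The `min_tail` guard keeps the scan from choosing a cutoff so far into the tail that only a handful of observations remain; its value here is arbitrary.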
Table 2 Power-law estimates, 5 % tail

                Gas          Oil
γ Hill          1.207        1.044
                (0.0070)     (0.0061)
γ OLS           1.368        1.079
                (0.0113)     (0.0089)
Observations    29,588 (each sample)

Estimates for power-law exponent based on 5 % tail of cumulative recovery for Texas gas wells and oil leases. Standard errors are in parentheses

The search for the appropriate $x_{\min}$ is a two-step process, where a candidate $x_{\min}$ is chosen and the power law is estimated; then, the KS-statistic is calculated for each $x_{\min}$. This procedure gives estimates at least as good as those minimizing the MSE, while being more straightforward to implement (Clauset et al. 2009). The thresholds that minimize the KS-statistic are reported in Table 3. Asymptotically equivalent to $\alpha_{\mathrm{MLE}} - 1$ is Hill’s estimator for the counter-cumulative parameter, which is given by

$$\gamma_{\mathrm{Hill}} = \frac{n - 2}{\sum_{i=1}^{n-1}\left(\ln x_i - \ln x_{\min}\right)}. \tag{8}$$

The standard error for the Hill estimator is given by $\gamma_{\mathrm{Hill}}\,(n - 3)^{-1/2}$. Finally, the power-law exponent can be estimated via the following OLS specification:

$$\ln(i) = \beta_0 - \gamma_{\mathrm{OLS}} \ln x_i + \epsilon_i, \tag{9}$$
where i represents the observation’s rank in the distribution, $\beta_0$ and $\gamma_{\mathrm{OLS}}$ are the parameters to be estimated, and $\epsilon_i$ is the error term. The asymptotic standard error is given by $\gamma_{\mathrm{OLS}}\,(n/2)^{-1/2}$.
Results for the sample of the 5 % tail of the distribution are presented in Table 2. The threshold for the 95 % cutoff for oil is 175,375 bbls, and the threshold cutoff for gas is 2,051,885 MCF. For both cumulative oil and gas recovery, OLS estimates exceed the Hill estimate. Cumulative oil recovery is very close to following Zipf’s law (a power-law relationship where γ = 1). The Hill estimate for oil recovery implies that 82.5 % of oil is recovered on 1 % of the leases. For natural gas recovery, the exponent is slightly larger. The parameter implies that 1 % of wells account for 45.5 % of cumulative gas recovery. For both oil and natural gas, estimates of the scaling parameters imply the distributions have non-convergent variance.
Results for estimates where the sample is chosen to minimize the Kolmogorov–Smirnov statistic are presented in Table 3. The procedure chooses a threshold cutoff, $x_{\min}$, farther in the right tail for both oil and natural gas; therefore, the sample size diminishes. Compared to the 5 % threshold, the estimated exponent for natural gas increases substantially. Although less pronounced, the exponent increases for the sample of oil leases as well. Again, parameter estimates indicate that neither distribution has a finite variance. Exponents imply that 1 % of gas wells recover 16.9 % of gas, while 1 % of oil leases are responsible for 70 % of oil recovered.
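The two estimators in Eqs. 8 and 9, together with the standard errors stated in the text, can be sketched as follows. The function names are hypothetical, and the code assumes the input already contains only the tail observations, so that the smallest value plays the role of x_min.

```python
import math


def hill_estimator(tail):
    """Eq. 8: (n - 2) / sum_{i=1}^{n-1} (ln x_i - ln x_min), with its standard error."""
    xs = sorted(tail, reverse=True)          # x_1 >= x_2 >= ... >= x_n = x_min
    n = len(xs)
    log_xmin = math.log(xs[-1])
    gamma = (n - 2) / sum(math.log(x) - log_xmin for x in xs[:-1])
    return gamma, gamma / math.sqrt(n - 3)


def ols_rank_estimator(tail):
    """Eq. 9: regress ln(rank) on ln(size); the slope is -gamma."""
    xs = sorted(tail, reverse=True)          # rank 1 is the largest observation
    n = len(xs)
    log_rank = [math.log(i) for i in range(1, n + 1)]
    log_size = [math.log(x) for x in xs]
    mean_r = sum(log_rank) / n
    mean_s = sum(log_size) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(log_size, log_rank))
    var = sum((s - mean_s) ** 2 for s in log_size)
    gamma = -cov / var
    return gamma, gamma * math.sqrt(2.0 / n)  # asymptotic SE: gamma * (n/2)^(-1/2)
```

Applied to the same tail sample, the two functions return estimates that can be compared side by side, as in Tables 2 and 3.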
Table 3 Power-law estimates, estimated threshold

                Gas                  Oil
γ Hill          1.629                1.086
                (0.0280)             (0.0093)
γ OLS           1.613                1.098
                (0.0392)             (0.0132)
xmin            1.160 × 10^7 MCF     3.725 × 10^5 bbls
KS              0.0105               0.0051
Observations    3379                 13780

Estimates for PL exponent are based on samples determined by a cutoff estimated to minimize the KS-statistic (xmin). Standard errors are in parentheses
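The concentration figures quoted alongside Tables 2 and 3 can be checked with a short calculation. The sketch below assumes the standard top-share result for a Pareto tail (see, e.g., Newman 2005), which the text does not spell out: the top fraction P of observations accounts for a share P^((γ − 1)/γ) of the total, provided γ > 1. The function name is hypothetical; the exponents are the Hill estimates from the two tables.

```python
def top_share(fraction, gamma):
    """Share of the total held by the top `fraction` of a Pareto tail with CCDF exponent gamma."""
    if gamma <= 1.0:
        raise ValueError("the mean, and hence the share, is undefined for gamma <= 1")
    return fraction ** ((gamma - 1.0) / gamma)


hill_estimates = {
    "oil, 5 % tail": 1.044,
    "gas, 5 % tail": 1.207,
    "oil, estimated threshold": 1.086,
    "gas, estimated threshold": 1.629,
}
for label, gamma in hill_estimates.items():
    print(f"{label}: top 1 % share = {100 * top_share(0.01, gamma):.1f} %")
```

Under this assumption the four shares come out within a fraction of a percentage point of the 82.5, 45.5, 70 and 16.9 % reported in the text; small gaps are expected because the tabulated exponents are rounded.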
Table 4 Gabaix–Ibragimov test of power law

                                Gas                Oil
95 % Sample
  Test statistic                1.790 × 10^−10     4.110 × 10^−10
  Threshold                     0.0080 (both samples)
Estimated threshold sample
  Test statistic                1.569 × 10^−8      2.185 × 10^−8
  Threshold                     0.0237             0.0117

For Gabaix–Ibragimov test statistics, reject PL distribution if test statistic > threshold
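The thresholds in Table 4 and the regression test behind them (Gabaix and Ibragimov 2011), described in Sect. 5.2, can be sketched as follows. This is an illustration rather than the paper's implementation: the function names are hypothetical, the regression is solved through the normal equations with a small hand-written solver to avoid external dependencies, and the input is assumed to be the tail sample only.

```python
import math


def gi_threshold(n):
    """Rejection bound 1.95 * (2n)^(-1/2) used in Sect. 5.2."""
    return 1.95 / math.sqrt(2.0 * n)


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def gi_test(tail):
    """Return (q_hat / zeta_hat^2, threshold) for the regression in Eq. 11."""
    xs = sorted(tail, reverse=True)                   # rank 1 = largest observation
    n = len(xs)
    lx = [math.log(x) for x in xs]
    mean = sum(lx) / n
    var = sum((v - mean) ** 2 for v in lx) / n
    mean_sq = sum(v * v for v in lx) / n
    cov = sum((v * v - mean_sq) * (v - mean) for v in lx) / n
    x_star = cov / (2.0 * var)                        # Eq. 10
    y = [math.log(i - 0.5) for i in range(1, n + 1)]  # ln(rank - 1/2)
    cols = [[1.0] * n, lx, [(v - x_star) ** 2 for v in lx]]
    # Normal equations (X'X) beta = X'y for intercept, zeta and q.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    _, zeta, q = solve_linear(xtx, xty)
    return q / zeta ** 2, gi_threshold(n)
```

With the sample sizes in Tables 2 and 3, `gi_threshold` gives values that round to the thresholds listed in Table 4. The power law is rejected when the returned statistic exceeds the bound; comparing its absolute value instead would be an assumption beyond the rule as stated in the text.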
5.2 Robustness tests
Parameter estimation alone is not sufficient to assert that oil and natural gas recovery are power-law-distributed. Although the data appear linear on a log–log graph, over short enough intervals, other distributions, such as the lognormal, can also appear linear. To test the distributional assumption, I implement a test based on the linear regression proposed by Gabaix and Ibragimov (2011). Define x* as

$$x^{*} = \frac{\mathrm{Cov}\left((\ln x_j)^2, \ln x_j\right)}{2\,\mathrm{Var}(\ln x_j)}, \tag{10}$$

then regress the following equation,

$$\ln\left(i - \tfrac{1}{2}\right) = \alpha + \zeta \ln x_i + q\left(\ln x_i - x^{*}\right)^2 + \epsilon_i. \tag{11}$$

The test statistic is $\hat{q}/\hat{\zeta}^2$. The null hypothesis that cumulative oil and natural gas recovery are power-law-distributed is rejected if $\hat{q}/\hat{\zeta}^2 > 1.95\,(2n)^{-1/2}$. Results of the tests are printed in Table 4. The null hypothesis cannot be rejected in either the 95 % or the estimated threshold samples. Log–log plots of the data provide further evidence of the appropriateness of the distribution. Graphs of the data can be found in Figs. 3 and 4. By taking the logarithm of both sides of Eq. 1, it is apparent that a power-law distribution implies a
Fig. 3 Log cumulative production versus log empirical probability. The data and their empirical probabilities are plotted with circles. The likelihood maximizing power-law distribution is the solid line, the exponential is the dashed line, and the lognormal is the dotted line. The data are the 5 % tail
linear relationship between the logarithm of counter-cumulative probability and the logarithm of recovery. The slope of the line is the negative of the power-law parameter, −γ. Circles in Figs. 3 and 4 represent empirical probabilities. The best fitting distributions from the class of power-law, exponential and lognormal distributions are overlaid. The best fit exponential and lognormal distributions do not fit the data in the extreme tails of the distribution. Figures 5 and 6 show the same best fitting distributions for the samples with selected cutoffs; note, however, that this competition is rigged because the data sets were cut to achieve the best possible fit to a power law. The qualitative evidence provided by the graphs can be made quantitative by implementing a likelihood-ratio-type test, as recommended by Clauset et al. (2009). The test compares the predicted likelihoods of two competing distributions, favoring the distribution that is more likely. In particular, the test is computed as

$$R = \sum_{i=1}^{n}\left[\ln \hat{p}_1(x_i) - \ln \hat{p}_2(x_i)\right], \tag{12}$$

where $\hat{p}_1(x)$ and $\hat{p}_2(x)$ are the probabilities predicted by two distributions. The predicted values are obtained after estimating parameters for the competing distributions via maximum likelihood. Clauset et al. (2009) demonstrate that R is asymptotically normally distributed and give formulas for calculating p-values. I compare the likelihoods computed
Fig. 4 Log cumulative production versus log empirical probability. The data and their empirical probabilities are plotted with circles. The likelihood maximizing power-law distribution is the solid line, the exponential is the dashed line, and the lognormal is the dotted line. The data are the 5 % tail
under the assumption of power law to those under the assumption of exponential and lognormal. Results are presented in Table 5. Results indicate that the power-law distribution has a much better fit than the competing distributions across both samples for both oil and natural gas.

5.3 Nested and composite distributions
Two further robustness tests of power-law findings are possible. First, any power law can be approximated to an arbitrary degree of accuracy by a stretched exponential distribution (Malevergne et al. 2005). Because the power law is nested within the family of stretched exponentials, it is possible to use Wilks’s test to test the null hypothesis that a power law is sufficient to describe the data (Malevergne et al. 2005). The density of the stretched exponential distribution is given by

$$P(x \mid b, c) = b\,u^{-c}\,x^{c-1} \exp\left[-\frac{b}{c}\left(\left(\frac{x}{u}\right)^{c} - 1\right)\right], \tag{13}$$

where b and c are model parameters and u is the lower threshold. From this it is possible to perform the Wilks test to see whether the more parsimonious power-law distribution characterizes the data as well as the stretched exponential distribution. The test is:
Fig. 5 Log cumulative production versus log empirical probability. The data and their empirical probabilities are plotted with circles. The likelihood maximizing power-law distribution is the solid line, the exponential is the dashed line, and the lognormal is the dotted line. The data are the endogenous threshold sample that minimizes the KS-statistic
Table 5 Likelihood ratio tests of competing distributions

                                PL exponential    PL lognormal
Gas
  95 % sample                   18560.26          15585.86
                                (0.000)           (0.000)
  Estimated threshold sample    2819.377          1441.832
                                (0.000)           (0.000)
Oil
  95 % sample                   30499.41          32302.33
                                (0.000)           (0.000)
  Estimated threshold sample    13827.24          14751.59
                                (0.000)           (0.000)

Likelihood ratios are computed as the power-law log likelihood minus the competing distribution's log likelihood. Positive numbers indicate that the power-law distribution is the better fit. P-values for significant differences in likelihoods are in parentheses.
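The comparison reported in Table 5 can be sketched in a few lines. The following is a minimal illustration, not the paper's code: it fits a Pareto tail and a shifted exponential to observations above a threshold `u` by maximum likelihood, forms the log-likelihood difference as defined in the table note, and attaches a two-sided p-value using the normalized (Vuong-type) statistic described by Clauset et al. (2009). The function names, the synthetic data, and the choice of p-value calculation are assumptions made for this sketch; the lognormal comparison would follow the same pattern with a truncated lognormal likelihood.

```python
import math
import random


def pareto_loglik(xs, u):
    """Pointwise log-likelihoods of a Pareto tail above u, at the ML exponent."""
    alpha = len(xs) / sum(math.log(x / u) for x in xs)  # Hill / ML estimate
    logliks = [math.log(alpha) + alpha * math.log(u) - (alpha + 1) * math.log(x)
               for x in xs]
    return logliks, alpha


def exponential_loglik(xs, u):
    """Pointwise log-likelihoods of an exponential shifted to start at u."""
    rate = len(xs) / sum(x - u for x in xs)  # ML estimate of the rate
    logliks = [math.log(rate) - rate * (x - u) for x in xs]
    return logliks, rate


def likelihood_ratio_test(xs, u):
    """Return (R, p): R is the power-law minus exponential log likelihood,
    p is the two-sided p-value of the normalized (Vuong-type) statistic."""
    pl, _ = pareto_loglik(xs, u)
    ex, _ = exponential_loglik(xs, u)
    diffs = [a - b for a, b in zip(pl, ex)]
    n = len(diffs)
    ratio = sum(diffs)
    mean = ratio / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p_value = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sd))
    return ratio, p_value


# Synthetic tail sample (hypothetical data): Pareto draws with exponent 1.1.
random.seed(0)
u = 1.0
sample = [u * (1.0 - random.random()) ** (-1.0 / 1.1) for _ in range(5000)]
ratio, p_value = likelihood_ratio_test(sample, u)
```

A positive `ratio` favors the power law, matching the sign convention of Table 5; the p-value indicates whether the difference is distinguishable from zero.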
Fig. 6 Log cumulative production versus log empirical probability. The data and their empirical probabilities are plotted with circles. The likelihood maximizing power-law distribution is the solid line, the exponential is the dashed line, and the lognormal is the dotted line. The data are the endogenous threshold sample that minimizes the KS-statistic
$$W = 2\left(\sup \ln L_1 - \sup \ln L_0\right) \qquad (14)$$

where L_1 is the likelihood function of the stretched exponential distribution, and L_0 is the likelihood function of the power-law distribution. For the power law, maximum likelihood estimates of parameters are the same as in Tables 2 and 3. The procedure for estimating maximum likelihood parameters for the stretched exponential distribution and implementing the Wilks test can be found in Appendix D of Malevergne et al. (2005). The test statistic, W, is distributed χ² with one degree of freedom in the limit as N → ∞.

Results of the test are reported in Table 6. The stretched exponential distribution provides a better fit to the data for both the oil and gas samples constructed from the right 5 % tails. Nevertheless, both oil and gas samples possess a threshold at which the power-law distribution provides an adequate description of the data. Indeed, the stretched exponential distribution does not perform any better statistically for the samples which were cut to achieve the best power-law fit in Table 3.

I also fit the data to a composite distribution composed of a lower-truncated lognormal distribution and a power-law distribution. Malevergne et al. (2011) propose a uniformly most powerful unbiased test that the upper part of a distribution follows a power law against the alternative of lognormal as follows:
Table 6 Wilks’s test: power law versus stretched exponential

                          Oil 5 %         Oil thresh.     Gas 5 %         Gas thresh.
Stretched exponential
  b                       1.004           1.071           0.990           1.635
  c                       0.041           0.015           0.249           −0.005
  u                       1.754 × 10^5    3.725 × 10^5    2.052 × 10^6    1.160 × 10^7
W                         36.632          2.405           484.582         0.028
P-value                   0.000           0.121           0.000           0.867

The table contains results from Wilks’s test described in Eq. 14. The column Oil 5 % reports estimates on the 5 % sample of oil observations, Oil thresh. on the sample where the threshold has been estimated, and similarly for gas. Parameters for power-law estimates are given in Table 2 for the 5 % tail sample and Table 3 for the estimated threshold sample. W is distributed χ²(1), so the null hypothesis that a power law is sufficient to explain the data is rejected at the 5 % level when W > 3.84.

Table 7 Composite power law lognormal

                  Oil             Gas
Threshold (u)     2.929 × 10^5    3.244 × 10^6
γ0                1.2496          1.4579

Results from Eq. 15. Threshold (u) represents the threshold at which the power law takes hold, while γ0 is the resulting power-law exponent.
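The Wilks statistic of Eq. 14 and the decision rule in the note to Table 6 can be illustrated with a short sketch. It is not the procedure of Appendix D in Malevergne et al. (2005): as a simplifying assumption, the stretched exponential parameter b is concentrated out analytically for each c, c is searched over a coarse positive grid, and the limit c → 0 is treated as the power law, so the statistic is never negative (Table 6 reports a slightly negative c for one sample, which this sketch would not reproduce). All function names and the synthetic data are hypothetical.

```python
import math
import random


def chi2_1_pvalue(w):
    """Upper-tail probability of a chi-square(1) variable, P(X > w)."""
    return math.erfc(math.sqrt(w / 2.0))


def wilks_statistic(xs, u, c_grid=None):
    """W = 2 (sup log L1 - sup log L0) for observations xs >= u.

    L0: Pareto density  b u^b x^(-1-b).
    L1: stretched exponential density
        b u^(-c) x^(c-1) exp(-(b/c) ((x/u)^c - 1)).
    For fixed c the ML estimate of b is n c / sum((x/u)^c - 1), so only c
    needs a numerical search (a coarse positive grid in this sketch).
    Returns (W, b, c) at the best grid point.
    """
    n = len(xs)
    s = sum(math.log(x / u) for x in xs)
    b0 = n / s                       # power-law ML exponent (c -> 0 limit)
    if c_grid is None:
        c_grid = [0.01 * k for k in range(1, 201)]
    best = (0.0, b0, 0.0)            # start from the power-law limit
    for c in c_grid:
        b = n * c / sum((x / u) ** c - 1.0 for x in xs)
        # Difference of profile log likelihoods, times two.
        w = 2.0 * (n * math.log(b / b0) + c * s)
        if w > best[0]:
            best = (w, b, c)
    return best


# Hypothetical thin-tailed data (shifted exponential): the power law should fail.
random.seed(1)
u = 1.0
thin = [u + random.expovariate(1.0) for _ in range(2000)]
w_thin, b_thin, c_thin = wilks_statistic(thin, u)
reject_thin = w_thin > 3.84          # 5 % critical value of chi-square(1)
```

Applying `chi2_1_pvalue` to the W values printed in Table 6 reproduces the tabulated p-values to three decimals, for example 2.405 maps to about 0.121 and 0.028 to about 0.867.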
$$f(x \mid \gamma_0, u, \gamma, \xi) = \begin{cases} f_1(x \mid u, \gamma, \xi) = \dfrac{\xi\, g(\gamma/\xi)}{x\, G(\gamma/\xi)} \exp\left(-\gamma \ln(x/u) - \dfrac{\xi^2}{2}\left[\ln(x/u)\right]^2\right) & \text{if } x \le u \\[2ex] f_2(x \mid \gamma_0, u) = \gamma_0 \dfrac{u^{\gamma_0}}{x^{1+\gamma_0}} & \text{if } x > u \end{cases} \qquad (15)$$

g() and G() represent the standard normal density and cumulative distribution function, respectively. Parameters are chosen to maximize the composite likelihood, $L = p^{n_1} L_1 (1-p)^{n_2} L_2$, where L_1 is the lognormal likelihood below the threshold u, and L_2 the power-law likelihood above the threshold. Estimation is implemented according to the procedure in Cooray and Ananda (2005), with results listed in Table 7.

The composite likelihood test finds the onset of the power law to be higher than the 95 % quantile, but lower than the threshold of the estimated threshold sample. For the 95 % quantile sample, 61 % of natural gas observations lie above the threshold, and 59 % of oil observations lie above the threshold. This corroborates the finding in Table 5 that the power law fits the data better than the lognormal distribution (the majority of the sample is better fit by the power-law distribution).
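The two pieces of the composite density in Eq. 15 can be written out directly, which also gives a way to confirm numerically that each piece is a proper density on its own range. The sketch below is an illustration under stated assumptions, not the Cooray and Ananda (2005) estimation procedure: the lognormal parameters `gamma_demo` and `xi_demo` are made up, p is read as the weight of the lognormal part and set to its ML value n1/n, and all function names are hypothetical. Only the oil threshold and exponent are taken from Table 7.

```python
import math


def norm_pdf(z):
    """Standard normal density g()."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function G()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f1(x, u, gamma, xi):
    """Lognormal piece of Eq. 15, defined for 0 < x <= u."""
    t = math.log(x / u)
    scale = xi * norm_pdf(gamma / xi) / (x * norm_cdf(gamma / xi))
    return scale * math.exp(-gamma * t - 0.5 * xi ** 2 * t ** 2)


def f2(x, u, gamma0):
    """Power-law (Pareto) piece of Eq. 15, defined for x > u."""
    return gamma0 * u ** gamma0 / x ** (1.0 + gamma0)


def composite_loglik(xs, u, gamma, xi, gamma0):
    """log L = n1 log p + log L1 + n2 log(1 - p) + log L2, with p = n1 / n
    (interpreting p as the lognormal weight is an assumption of this sketch)."""
    low = [x for x in xs if x <= u]
    high = [x for x in xs if x > u]
    n1, n2 = len(low), len(high)
    p = n1 / (n1 + n2)
    total = sum(math.log(f1(x, u, gamma, xi)) for x in low)
    total += sum(math.log(f2(x, u, gamma0)) for x in high)
    if n1:
        total += n1 * math.log(p)
    if n2:
        total += n2 * math.log(1.0 - p)
    return total


def integrate_in_logs(density, u, lo, hi, steps=20000):
    """Midpoint rule for the integral of density(x) dx, with x = u * exp(t)
    and t running from lo to hi."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = u * math.exp(lo + (k + 0.5) * h)
        total += density(x) * x * h
    return total


u_oil, gamma0_oil = 2.929e5, 1.2496   # Table 7, oil column
gamma_demo, xi_demo = 1.0, 0.8        # hypothetical lognormal parameters
mass_low = integrate_in_logs(lambda x: f1(x, u_oil, gamma_demo, xi_demo),
                             u_oil, -40.0, 0.0)
mass_high = integrate_in_logs(lambda x: f2(x, u_oil, gamma0_oil),
                              u_oil, 0.0, 40.0)
```

Both `mass_low` and `mass_high` should come out numerically close to one, which is the check that fixes the factor ξ g(γ/ξ)/G(γ/ξ) in the lognormal piece.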
5.4 Results summary

Power-law parameters are estimated for the right 5 % tail of cumulative oil and gas production and for a sample cut from the full distribution in order to produce the best power-law fits (Table 3). Regression-based tests and tests of competing distributions (lognormal and exponential) indicate that the power law provides an excellent fit to the data, and that power-law behavior is well established in the right 5 % tail of the distribution. Hypothesis tests on nested and composite distributions, however, indicate that power-law behavior begins at higher thresholds than the 95 % quantile. These tests do not reject the power law for the estimated threshold sample of Table 3: the power law performs as well as the more general stretched exponential distribution, and the estimated threshold lies well to the right of the estimated point at which lognormal behavior ends and power-law behavior begins. The estimates in Table 3 should therefore be regarded as the paper’s best estimates.
6 Conclusion

This paper presents strong evidence for a power-law tail distribution for cumulative oil and natural gas production. Lease productivities span many orders of magnitude, log–log graphs of cumulative production demonstrate a striking linear relationship, and quantitative robustness tests indicate the power-law distribution to be a good approximation. Of course, given the infinite number of distributions to select from, it is possible to find one that better fits the data. Yet the power-law distribution illustrates key features in the data (its heavy tails) and does so parsimoniously.

The power-law result is significant for both management and regulation, particularly in the case of oil production. By overseeing just 1 % of oil leases, regulators can monitor nearly 70 % of cumulative production. Similarly for production companies, their profitability is determined not by the vast majority of the leases operated, but by their most productive 1 %. How this distribution should affect managerial decisions and markets is an avenue for further research.
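The concentration figure can be cross-checked against the estimated exponents with a back-of-the-envelope calculation. For an exact Pareto distribution with tail exponent α > 1, the top fraction q of observations accounts for a share q^(1 − 1/α) of the total. The sketch below evaluates this for the exponents quoted in the abstract (about 1.1 for oil and 1.6 for gas). It assumes the whole distribution is Pareto, which the paper does not claim (the power law is fitted to the upper tail only), so the numbers are illustrative rather than a reproduction of the 70 % figure; `top_share` is a hypothetical helper name.

```python
def top_share(q, alpha):
    """Share of the total held by the top fraction q of an exact Pareto
    population with tail exponent alpha (requires alpha > 1 so that the
    mean is finite)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return q ** (1.0 - 1.0 / alpha)


oil_share = top_share(0.01, 1.1)   # top 1 % of leases, oil exponent
gas_share = top_share(0.01, 1.6)   # top 1 % of leases, gas exponent
```

Under these assumptions the top 1 % holds roughly 66 % of the total for the oil exponent and roughly 18 % for the gas exponent, which is in line with the statement that the concentration result matters most for oil.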
References

Adamic LA, Huberman BA (1999) The nature of markets and the World Wide Web. Working paper
Agency IE (2008) World energy outlook 2008. International Energy Agency, Paris
Arps J, Roberts T (1958) Economics of drilling for cretaceous oil and gas on the east flank of Denver-Julesburg basin. Am Assoc Pet Geol Bull 42(11):2549–2566
Atkinson A, Piketty T (2007) Top incomes over the twentieth century. Oxford University Press, Oxford
Attanasi ED, Charpentier RR (2002) Comparisons of two probability distributions used to model sizes of undiscovered oil and gas accumulations: Does the tail wag the assessment? Math Geol 34(6):767–777
Axtell R (2001) Zipf distribution of US firm sizes. Science 293:1818–1820
Bak P, Tang C, Wiesenfeld K (1987) Self-organized criticality: an explanation of 1/f noise. Phys Rev Lett 59(4):381–384
Barton C, Scholz CH (1995) The fractal size and spatial dimension of hydrocarbon accumulations. In: Barton C, La Pointe P (eds) Fractals in petroleum geology and earth processes. Plenum Press, New York, pp 13–35 (Chapter 2)
Carlson J, Doyle J (1999) Highly optimized tolerance: a mechanism for power laws in designed systems. Phys Rev E 60(2):1412–1427
Clauset A, Shalizi CR, Newman M (2009) Power law distributions in empirical data. SIAM Rev 51:661–703
Cooray K, Ananda M (2005) Modeling actuarial data with a composite lognormal-pareto model. Scand Actuar J 5:321–334
Cox RA, Felton JM, Chung KH (1995) The concentration of commercial success in popular music: an analysis of the distribution of gold records. J Cult Econ 19:333–340
Cox RA, Chung KH (1991) Patterns of research output and author concentration in the economics literature. Rev Econ Stat 73(4):740–747
de Haan L, Ferreira A (2006) Extreme value theory: an introduction. Springer series in operations research and financial engineering. Springer, New York
Drew L (1990) Oil and gas forecasting. Oxford University Press, Oxford
Drew L, Schuenemyer J, Bawiec W (1982) Estimation of the future rates of oil and gas discoveries in the western Gulf of Mexico. US geological survey professional paper 1252
Farmer JD, Geanakoplos J (2008) Power laws in economics and elsewhere. Working paper
Fox MA, Kochanowski P (2004) Models of superstardom: an application of the Lotka and Yule distributions. Popul Music Soc 27(4):507–522
Gabaix X (1999) Zipf’s law for cities: an explanation. Q J Econ 114(3):739–767
Gabaix X (2009) Power laws in economics and finance. Annu Rev Econ 1:225–293
Gabaix X, Gopikrishnan P, Plerou V, Stanley HE (2003) A theory of power-law distributions in financial market fluctuations. Nature 423:267–270
Gabaix X, Ibragimov R (2011) Rank-1/2: a simple way to improve the OLS estimation of tail exponents. J Bus Econ Stat 29(1):24–39
Gabaix X, Ioannides Y (2004) The evolution of the city size distributions. In: Henderson VTJ (ed) Handbook of regional and urban economics, vol 4. Elsevier, Oxford, pp 2341–2378
Gopikrishnan P, Plerou V, Amaral LAN, Meyer M, Stanley HE (1999) Scaling of the distribution of fluctuations of financial market indices. Phys Rev E 60(5):5305–5316
Gopikrishnan P, Plerou V, Gabaix X, Stanley HE (2000) Statistical properties of share volume traded in financial markets. Phys Rev E 62(4):R4493–R4496
Jones CI (2015) Pareto and Piketty: the macroeconomics of top income and wealth inequality. J Econ Perspect 29(1):29–46
Kaufman GM (1993) Statistical issues in the assessment of undiscovered oil and gas resources. Energy J 14(1):183–215
Kellogg R (2010) The effect of uncertainty on investment: evidence from Texas oil drilling. NBER working paper series, No. 16541
Kohli R, Sah R (2003) Market shares: some power law results and observations. Harris School working paper series 04.1
Krige D (1960) On the departure of ore value distributions from the lognormal model in South African gold mines. J South Afr Inst Min Metall 61(4):231–244
La Pointe P (1995) Estimation of undiscovered hydrocarbon potential through fractal geometry. In: Barton C, La Pointe P (eds) Fractals in petroleum geology and earth processes. Plenum Press, New York
Lux T (1996) The stable paretian hypothesis and the frequency of large returns: an examination of major German stocks. Appl Financ Econ 6:463–475
Lux T (2000) On moment condition failure in German stock returns: an application of recent advances in extreme value statistics. Empir Econ 25(4):641–652
Malevergne Y, Pisarenko V, Sornette D (2005) Empirical distributions of log-returns: between the stretched exponential and the power law. Quant Finance 5(4):379–401
Malevergne Y, Pisarenko V, Sornette D (2011) Testing the pareto against the lognormal distributions with the uniformly most powerful unbiased test applied to the distribution of cities. Phys Rev E 83(3):036111
Mandelbrot B (1995) The statistics of natural resources and the law of pareto. In: Barton C, La Pointe P (eds) Fractals in petroleum geology and earth processes. Plenum Press, New York, pp 1–12 (Chapter 1)
Mandelbrot B (1997) Fractals and scaling in finance: discontinuity, concentration, risk. Springer, New York
McCrossan R (1969) An analysis of the size frequency distribution of oil and gas reserves of Western Canada. Can J Earth Sci 6(2):201–211
Newman M (2005) Power laws, Pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351
Newman M (2010) Networks: an introduction. Oxford University Press, New York
Okuyama K, Takayasu M, Takayasu H (1999) Zipf’s law in income distribution of companies. Phys A 269:125–131
Pareto V (1896) Cours D’Economie Politique. Droz, Geneva
Plerou V, Gopikrishnan P, Amaral LAN, Gabaix X, Stanley HE (2000) Economic fluctuations and anomalous diffusion. Phys Rev E 62(3):R3023–R3026
Reiss R-D, Thomas M (1997) Statistical analysis of extreme values from insurance, finance, hydrology and other fields. Birkhauser Verlag, Boston
Rendu J-MM (1988) Lognormal distributions: theory and applications, applications in geology. Taylor and Francis, New York (chapter 14)
Simmons MR (2005) Twilight in the desert: the coming Saudi oil shock and the world economy. John Wiley and Sons Ltd, Hoboken
Simon HA (1955) On a class of skew distribution functions. Biometrika 42(3/4):425–440
Sornette D (2006) Critical phenomena in natural sciences: chaos, fractals, self-organization and disorder: concepts and tools, 2nd edn. Springer, Berlin
Sorrell S, Speirs J, Bentley R, Miller R, Thompson E (2012) Shaping the global oil peak: a review of the evidence on field sizes, reserve growth, decline rates, and depletion rates. Energy 37(1):709–724
Turcotte D (2002) Fractals in petrology. Lithos 65(1):261–271
Willis J, Yule GU (1922) Some statistics of the evolution and geographical distribution of plants and animals, and their significance. Nature 109(2728):177–179
Zipf G (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge