SSJ
3.3.1
Stochastic Simulation in Java
This document provides an introduction to SSJ via a series of examples. We first give a brief overview of some basic facilities to generate random numbers, handle probability distributions, collect statistics, and manage event lists. An overview of how SSJ is organized in various inter-related packages can be found on the main page, in the Introduction and overview section. Our series of examples starts with elementary Monte Carlo experiments in which we show how to generate random variates to simulate a model, collect statistics, and make some plots from the results. We show how to generate vectors from multivariate distributions and stochastic processes, and how the random numbers can be replaced by randomized quasi-Monte Carlo points. Later, we provide more elaborate examples of discrete-event simulations. The Java code of all the examples is available with the SSJ source code in the src/main/java/tutorial/ directory.
While studying the examples, one can refer to the functional definitions (the APIs) of the SSJ classes. Before that, we recommend looking at the short overview of the different packages in the Introduction and overview on the main page, then the overview given below, and perhaps the overview of each package given at the beginning of its documentation, to have a general idea of what is available and where.
Random numbers feed simulation models and Monte Carlo experiments. To simulate random variables from a given probability distribution, one typically starts from independent uniform random numbers, uniformly distributed over the interval \([0,1)\), and transforms these numbers appropriately to obtain the desired distribution. The so-called uniform random numbers produced by SSJ and other simulation software are of course not truly random and independent. They are only imitations produced by deterministic mathematical algorithms, and are sometimes called pseudorandom. Nevertheless, they are very good imitations from a statistical viewpoint, and we follow the common usage in the simulation community of just calling them random numbers [119], [145], [159]. An important consequence of the deterministic nature of these algorithmic random number generators (RNGs) is that if you run the same simulation program several times, unless you reset the seed of the RNG, you will get exactly the same results each time. This is really not like rolling a die or flipping a coin, but this behavior is actually desirable. The ability to reproduce Monte Carlo experiments with exactly the same random numbers is a key advantage (and requirement) in modern simulation software [119], [157], [158], [159], [144].
The selection of an RNG is based on several criteria such as uniformity, performance, and portability [146], [157]. The package umontreal.ssj.rng provides basic tools to generate uniform random numbers. It defines an interface called umontreal.ssj.rng.RandomStream, implemented by any RNG supported by SSJ. This interface requires that each RNG provides multiple streams and substreams of random numbers. Such facilities are very useful when performing multiple simulation runs and comparing similar systems by simulation, for example [158], [142], [144]. The interface permits one to easily interchange RNGs without changing the code of the simulator, because they are accessed through the same set of methods specified by the interface. The selected type of RNG is specified when creating the RandomStream object.
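For example, a minimal sketch using the MRG32k3a implementation (class and variable names in this snippet are ours):

```java
import umontreal.ssj.rng.RandomStream;
import umontreal.ssj.rng.MRG32k3a;

public class StreamDemo {
   public static void main(String[] args) {
      RandomStream stream = new MRG32k3a(); // one stream with many substreams
      double u = stream.nextDouble();       // uniform over [0,1)
      stream.resetNextSubstream();          // jump to the next substream
      double v = stream.nextDouble();
      stream.resetStartStream();            // rewind to the beginning of the stream
      double w = stream.nextDouble();       // w == u: the same numbers are replayed
      System.out.println(u + " " + v + " " + w);
   }
}
```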
One can also easily replace the 'independent' random numbers with (possibly randomized) highly-uniform (quasi-Monte Carlo) point sets implemented in the package umontreal.ssj.hups, again without changing the simulation code. The point set objects all inherit from umontreal.ssj.hups.PointSet which provides a umontreal.ssj.hups.PointSetIterator that implements the same umontreal.ssj.rng.RandomStream interface as the RNGs. The replacement of random numbers by quasi-random ones can be easily done without modifying the model implementation, except for the setup code that creates the point sets.
The package umontreal.ssj.latnetbuilder provides an interface to the LatNet Builder software, implemented in C++, which offers tools to construct highly uniform point sets (HUPS) of various types such as lattice rules, polynomial lattice rules, and digital nets, in arbitrary dimension, for an arbitrary number of points, and under a rich variety of uniformity criteria. Note that LatNet Builder uses the NTL library and for this reason it does not run under Windows, but only under Linux-type systems.
To generate non-uniform random numbers, one can select a probability distribution from the umontreal.ssj.probdist package and a RandomStream object as a source of randomness. The probability distribution may have been estimated in a modeling phase based on data from the system of interest, or selected in any other way. The package umontreal.ssj.probdist contains several commonly-used discrete and continuous distributions, implemented as subclasses of the abstract classes umontreal.ssj.probdist.ContinuousDistribution and umontreal.ssj.probdist.DiscreteDistribution. The methods to access the density or mass function, cdf, complementary cdf, inverse cdf, mean, variance, etc., are common to all distributions, so much of the code may be written independently of the selected distribution. Static methods are also available for the case where one does not want to create a distribution object with fixed parameters, whose construction often involves significant overhead because it precomputes constants and tables. This precomputation setup is often worthwhile if the same distribution (with the same parameters) is used many times, but not if the parameters change all the time, in which case the setup would have to be redone each time.
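For example, one can create a distribution object once and query it repeatedly, or call a static method for a one-off evaluation. A small sketch (parameter values are arbitrary; the use of inverseFInt for an integer quantile reflects our reading of the discrete-distribution API):

```java
import umontreal.ssj.probdist.PoissonDist;

public class DistDemo {
   public static void main(String[] args) {
      PoissonDist dist = new PoissonDist(12.5); // setup precomputes tables
      double F = dist.cdf(10);                  // P[X <= 10]
      int q = dist.inverseFInt(0.95);           // 0.95 quantile
      double p = PoissonDist.prob(12.5, 10);    // one-off P[X = 10], no object
      System.out.println(F + " " + q + " " + p);
   }
}
```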
The default (most general) way of defining a non-uniform generator is to match a distribution with a RandomStream. The package umontreal.ssj.randvar provides facilities to do that. By default, the random variates are obtained by applying the inverse cdf (computed via the inverseF method) to the uniform random numbers obtained from the stream. However, inversion is sometimes too slow or inapplicable (the inverse cdf, or even a good approximation of it, may be too hard to compute). The package umontreal.ssj.randvar permits one to generate non-uniform random variates in many other ways than by inversion. In particular, it offers classes to generate random variates from specific discrete and continuous distributions by specialized methods, without relying on the probdist package. These specific generators inherit from the common classes umontreal.ssj.randvar.RandomVariateGen and umontreal.ssj.randvar.RandomVariateGenInt. Each specialized class also provides static methods, which can be preferable when it is not worthwhile to create a generator object and make the associated precomputations. One drawback of these static methods, however, is that their signatures are specific to the distribution, because they must transmit the distribution parameters, and therefore one may have to change the simulation code when changing the distribution. They may also have to perform setup operations at each variate generation.
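A sketch contrasting the two styles for a gamma distribution (the class names and the static-call signature follow the usual SSJ naming pattern, which we assume here):

```java
import umontreal.ssj.rng.MRG32k3a;
import umontreal.ssj.rng.RandomStream;
import umontreal.ssj.probdist.GammaDist;
import umontreal.ssj.randvar.GammaGen;
import umontreal.ssj.randvar.RandomVariateGen;

public class GenDemo {
   public static void main(String[] args) {
      RandomStream stream = new MRG32k3a();
      // Generator object: distribution fixed once, inversion by default.
      RandomVariateGen gen = new RandomVariateGen(stream, new GammaDist(2.5, 1.0));
      double x = gen.nextDouble();
      // Static method: no object; the parameters are passed at each call.
      double y = GammaGen.nextDouble(stream, 2.5, 1.0);
      System.out.println(x + " " + y);
   }
}
```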
The packages umontreal.ssj.probdistmulti and umontreal.ssj.randvarmulti are the counterparts of probdist and randvar for multivariate distributions. The package umontreal.ssj.stochprocess offers tools to generate discretely observed sample paths (skeletons) of various types of stochastic processes.
Output from simulations is collected in statistical probes that implement the abstract class umontreal.ssj.stat.StatProbe from the umontreal.ssj.stat package. There are two main types of probes: umontreal.ssj.stat.Tally probes collect series of observations of the form \(X_1,…,X_n\), whereas umontreal.ssj.simevents.Accumulate probes collect statistics for a process that evolves in continuous (simulated) time, with a piecewise-constant trajectory. During the simulation, one can add observations to such probes. After the simulation (or at any time), measures such as the sample average, the sample variance or standard deviation, confidence intervals, etc., can be obtained. Statistical reports can also be produced for the different probes, in the form of a character String that can be displayed or printed. The stat package also provides a way to detach statistical collection from the model implementation by using bound properties.
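A small sketch of the Tally mechanics (the report signature with a confidence level and a number of decimals is our reading of the API):

```java
import umontreal.ssj.rng.MRG32k3a;
import umontreal.ssj.rng.RandomStream;
import umontreal.ssj.stat.Tally;

public class TallyDemo {
   public static void main(String[] args) {
      RandomStream stream = new MRG32k3a();
      Tally statX = new Tally("Observations of X");
      for (int i = 0; i < 1000; i++)
         statX.add(stream.nextDouble());         // one observation at a time
      System.out.println(statX.average());       // current sample mean
      System.out.println(statX.report(0.95, 3)); // report with a 95% CI
   }
}
```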
The observations given to a statistical probe can be arrays of real numbers, whose dimension is the number of fields on which data is collected. After n observations are collected, the probe will contain (virtually) a table whose rows are the observations and whose columns are the fields. In addition to the empirical means, variances, etc., for the individual fields, one can also obtain empirical covariances and correlations between the fields, and other measures of dependence.
Subpackages of stat provide facilities to manage lists or arrays of statistical probes. The package umontreal.ssj.gof implements goodness-of-fit tests such as the Kolmogorov-Smirnov and Anderson-Darling tests.
The packages umontreal.ssj.simexp and umontreal.ssj.mcqmctools provide additional facilities for performing simulation experiments using independent replications, batch means, and (randomized) quasi-Monte Carlo methods.
SSJ supports discrete-event, continuous, and mixed simulation via the package umontreal.ssj.simevents. This package manages the simulation clock and the event list, two essential components of discrete-event simulations. The simulation clock tracks the simulation time, whereas the event list stores the scheduled events and executes them in the right order. Events are user-defined subclasses of umontreal.ssj.simevents.Event. When an event occurs, arbitrary actions can be taken. The package provides a class called umontreal.ssj.simevents.ListWithStat, which implements a linked list with integrated statistical probes to collect data on the sojourn times in the list and on the time-dependent size of the list. Continuous simulation can be performed using the class umontreal.ssj.simevents.Continuous, which uses the event framework to (approximately) simulate differential equations numerically with time discretization.
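A minimal sketch of a user-defined event and the simulation loop (this event merely stops the simulation when it occurs):

```java
import umontreal.ssj.simevents.Event;
import umontreal.ssj.simevents.Sim;

public class EventDemo {
   // User-defined event: its actions method is executed when it occurs.
   static class EndOfSim extends Event {
      public void actions() {
         System.out.println("Simulation ends at time " + Sim.time());
         Sim.stop();
      }
   }

   public static void main(String[] args) {
      Sim.init();                      // reset the clock and the event list
      new EndOfSim().schedule(1000.0); // schedule at simulation time 1000
      Sim.start();                     // execute events in chronological order
   }
}
```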
We start with elementary examples that illustrate how to generate uniform and nonuniform random numbers, construct probability distributions, collect elementary statistics, compute confidence intervals, compare similar systems, and use randomized quasi-Monte Carlo point sets, with SSJ. The models considered here are quite simple and some of the performance measures may be computed by numerical methods rather than by simulation. The purpose of these simple models is just to give a first idea of how to use SSJ.
We want to estimate the distribution of the number of collisions in a hashing system. There are \(k\) locations (or addresses) and \(m\) distinct items. Each item is assigned a random location, independently of the other items. A collision occurs each time an item is assigned a location already occupied. Let \(C\) be the number of collisions. We want to estimate the probability distribution of the random variable \(C\), as well as its expectation \(\mathbb E[C]\) and variance \(\mathrm{Var}[C]\), by simulation. A theorem states that when \(k\to\infty\) while \(\lambda= m^2/(2k)\) remains fixed, \(C\) converges in distribution to a Poisson random variable with mean \(\lambda\) [138]. Thus, this Poisson approximation should be good when \(k\) is very large and \(\lambda\) is not too large. We may want to use Monte Carlo simulation to assess the error made by the Poisson approximation for finite values of \(k\) and \(m\). To do this, we can generate \(n\) independent realizations of \(C\), say \(C_1,…,C_n\), compute their empirical distribution and empirical mean, and compare with the Poisson distribution with mean \(\lambda\).
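The core of one realization of \(C\) can be sketched as follows, assuming a boolean array of size \(k\) and a RandomStream named stream (the actual listing may differ in details):

```java
// One realization of C: assign m items to k random locations and
// count how many land on an already occupied location.
boolean[] used = new boolean[k];        // all entries false initially
int C = 0;
for (int i = 0; i < m; i++) {
   int loc = stream.nextInt(0, k - 1);  // uniform location in {0,...,k-1}
   if (used[loc]) C++;                  // collision
   else used[loc] = true;
}
```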
The Java program in Listing Collision simulates \(C_1,…,C_n\), and computes the empirical distribution of \(C\), its mean and variance, and a 95% confidence interval on \(\mathbb E[C]\). The results for \(k = 10000\), \(m = 500\), and \(n = 100000\) are in Listing Collision results.
Here we have \(\lambda=12.5\), whereas the reported confidence interval is \((12.268, 12.272)\). This indicates that the asymptotic result overestimates \(\mathbb E[C]\) by about 1.9% for these values of \(k\) and \(m\).
The Java program imports the SSJ packages rng, probdist, and stat. It uses only three types of objects from SSJ: (1) a RandomStream object from the package umontreal.ssj.rng, which generates a stream of independent random numbers from the uniform distribution; (2) a PoissonDist object from the package umontreal.ssj.probdist, which implements a Poisson distribution with mean \(\lambda\); and (3) a Tally object, from the package umontreal.ssj.stat, used to collect the \(n\) realizations of \(C\) and produce a statistical report. In SSJ, RandomStream is actually just an interface that specifies all the methods that must be provided by its different implementations, which correspond to different brands of random streams (i.e., different types of uniform random number generators). The class MRG32k3a, whose constructor is invoked in the main program, is one such implementation of RandomStream; it is the one we use here. The class Tally provides the simplest type of statistical collector. It receives observations one by one and, after each new observation, it updates the count, average, variance, minimum, and maximum of the observations. At any time, it can return these statistics or compute a confidence interval for the theoretical mean of the observations, assuming that they are independent and identically distributed with the normal distribution. Other types of collectors that memorize the observations are also available in stat.
Simulating the number of collisions in a hashing system [Collision]
The class Collision offers the facilities to simulate copies of \(C\). Its constructor specifies \(k\) and \(m\), computes \(\lambda\), and constructs a boolean array of size \(k\) to memorize the locations used so far, in order to detect the collisions. The method simulate initializes the boolean array to false, generates the \(m\) locations, and computes \(C\). The method simulateRuns first resets the statistical collector statC, then generates \(n\) independent copies of \(C\) and passes these \(n\) observations to the collector via the method add. The method statC.report computes a confidence interval from these \(n\) observations and returns a statistical report in the form of a character string. This report is printed, together with the value of \(\lambda\). See Listing Collision results.
In addition to the statC collector, the program maintains an array of counters to count how many times each value of \(C\) has been observed. The values greater than or equal to maxCounts are aggregated in the same counter. These counts for \(C=0\) to maxCounts are printed in the output, together with their expected values under the Poisson (approximation) model, for comparison. The poisson object is used to calculate these values.
Results of the program Collision [Collision results]
The program in Listing Nonuniform simulates the following artificial model. Define the random variable
\[ X = Y_1 + \cdots + Y_N + W_1 + \cdots + W_M, \]
where \(N\) is Poisson with mean \(\lambda\), \(M\) is geometric with parameter \(p\), the \(Y_j\)’s are gamma with parameters \((\alpha, \beta)\), the \(W_j\)’s are lognormal with parameters \((\mu,\sigma)\), and all these random variables are independent. We want to generate \(n\) copies of \(X\), say \(X_1,…,X_n\), estimate the distribution of \(X\) by a histogram, and get the 0.10, 0.50, 0.90, and 0.99 quantiles of its empirical distribution.
The method simulateRuns generates \(n\) copies of \(X\) and passes them to a statistical collector of class TallyStore, which stores the individual observations. These observations are sorted in increasing order by invoking quickSort, and the appropriate empirical quantiles are printed, together with a short report.
Simulating nonuniform variates and observing quantiles [Nonuniform]
Results of the program Nonuniform [Nonuniform results]
To simplify the program, all the parameters are fixed as constants at the beginning of the class. This is much simpler, but not recommended in general because it does not permit one to perform experiments with different parameter sets with the same program. Passing the parameters to the constructor as in Listing Collision would require more lines of code, but would provide more flexibility.
The class initialization constructs a RandomStream of type LFSR113 (this is a faster uniform generator than MRG32k3a) used to generate all the random numbers. For the generation of \(N\), we construct a Poisson distribution with mean \(\lambda\) (without giving it a name), and pass it together with the random stream to the constructor of class PoissonGen. The returned object genN is a random variate generator that generates Poisson random variables with mean \(\lambda\), via inversion. A similar procedure is used to construct genY and genW, which generate gamma and lognormal random variates, respectively. Note that a RandomVariateGenInt produces integer-valued random variates, while a RandomVariateGen produces real-valued random variates. For the gamma distribution, we use a special type of random number generator based on a rejection method, which is faster than inversion. These constructors precompute some (hidden) constants once and for all, to speed up the random variate generation. For the Poisson distribution with mean \(\lambda\), the constructor of PoissonDist actually precomputes the distribution function in a table, and uses this table to compute the inverse distribution function each time a Poisson random variate needs to be generated with this particular distribution. This is possible because all the Poisson random variates have the same parameter \(\lambda\). If a different \(\lambda\) were used for each variate, then we would use the static methods of PoissonDist instead of constructing a Poisson distribution object, because otherwise we would have to reconstruct the distribution each time. The static method reconstructs part of the table each time, with the given \(\lambda\), so it is slower if we want to generate several Poisson variates with the same \(\lambda\). As an illustration, we use the static method to generate the geometric random variates (in simulate), instead of constructing a geometric distribution and variate generator. For this particular distribution, the static method is almost as fast. To generate \(M\), we invoke the static method inverseF of the class GeometricDist, which evaluates the inverse geometric distribution function for a given parameter \(p\) and a given uniform random variate. One important drawback of using the static method inside the simulate method is that changing the geometric distribution for another one would require changing the code inside simulate. The three other distributions are also hardcoded at the beginning of the class. To make the program more general, one could pass them as parameters to the constructor, or read them from a file inside the constructor.
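For instance, the generation of \(M\) by this static method amounts to a single line (variable names as in the surrounding discussion):

```java
// Inversion of the geometric distribution via the static method:
int M = GeometricDist.inverseF(p, stream.nextDouble());
```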
The results of this program, with \(n = 100000\), are in Listing Nonuniform results. We see that \(X\) has a coefficient of variation larger than 1, and the quantiles indicate that the distribution is skewed, with a long tail to the right. We have \(X < 439\) about half the time, whereas the average is 685.6, and values over several thousands are not uncommon. This probably happens when \(N\) or \(M\) takes a large value. There are also many cases where \(N=M=0\), in which case \(X=0\). Looking at the histogram confirms this assessment and provides a clearer idea of the distribution.
Consider a simple inventory system where the demands for a given product on successive days are independent Poisson random variables with mean \(\lambda\). If \(X_j\) is the stock level at the beginning of day \(j\) and \(D_j\) is the demand on that day, then there are \(\min(D_j, X_j)\) sales, \(\max(0, D_j - X_j)\) lost sales, and the stock at the end of the day is \(Y_j = \max(0, X_j - D_j)\). There is a revenue \(c\) for each sale and a cost \(h\) for each unsold item at the end of the day. The inventory is controlled using an \((s,S)\) policy: if \(Y_j < s\), order \(S - Y_j\) items; otherwise do not order. When an order is made in the evening, with probability \(p\) it arrives during the night and can be used for the next day, and with probability \(1-p\) it never arrives (in which case a new order will have to be made the next evening). When the order arrives, there is a fixed cost \(K\) plus a marginal cost of \(k\) per item. The stock at the beginning of the first day is \(X_0 = S\).
We want to simulate this system for \(m\) days, for a given set of parameters and a given control policy \((s,S)\), and replicate this simulation \(n\) times independently to estimate the expected profit per day over a time horizon of \(m\) days. Eventually, we might want to optimize the values of the decision parameters \((s,S)\) via simulation. (In practice, this is usually done for more complicated models.)
A simulation program for the simple inventory system [Inventory]
Listing Inventory gives a Java program that performs a simulation experiment for \(n=500\), \(m=2000\), \(s=80\), \(S=200\), \(\lambda=100\), \(c=2\), \(h=0.1\), \(K=10\), \(k=1\), and \(p=0.95\).
The import statements at the beginning of the program retrieve the SSJ packages/classes that are needed. The Inventory class has a constructor that initializes the model parameters (saving their values in class variables) and constructs the required random number generators and the statistical collector. To generate the demands \(D_j\) on successive days, we create (in the last line of the constructor) a random number stream and a Poisson distribution with mean \(\lambda\), and then a Poisson random variate generator genDemand that uses this stream and this distribution. This mechanism will (automatically) precompute tables to ensure that the Poisson variate generation is efficient. This can be done because the value of \(\lambda\) does not change during the simulation. The random number stream streamOrder, used to decide which orders are received, and the statistical collector statProfit are also created when the Inventory constructor is invoked. The code that invokes their constructors is outside the Inventory constructor, but it could have been inside as well. On the other hand, genDemand must be constructed inside the Inventory constructor, because the value of \(\lambda\) is not yet defined outside. The random number streams can be viewed as virtual random number generators that generate random numbers in the interval \([0,1)\) according to the uniform probability distribution.
The method simulate simulates the system for \(m\) days, with a given policy, and returns the average profit per day. For each day, we generate the demand \(D_j\), compute the stock \(Y_j\) at the end of the day, and add the sales revenues minus the leftover inventory costs to the profit. If \(Y_j < s\), we generate a uniform random variable \(U\) over the interval \((0,1)\), and an order of size \(S - Y_j\) is received the next morning if \(U < p\) (that is, with probability \(p\)). In case of a successful order, we pay for it and the stock level is reset to \(S\).
The method simulateRuns performs \(n\) independent simulation runs of this system and returns a report that contains a 90% confidence interval for the expected profit. The main program constructs an Inventory object with the desired parameters, asks for \(n\) simulation runs, and prints the report. It also creates a timer that computes the total CPU time to execute the program, and prints it. The results are in Listing Inventory results. The average profit per day is approximately 85. It took 0.17 seconds (on a 2.4 GHz (???) computer running Windows 10 and Eclipse) to simulate the system 500 times for 2000 days each, compute the statistics, and print the results.
Results of the program Inventory [Inventory results]
Comparing two inventory policies with common random numbers [InventoryCRN]
In Listing InventoryCRN, we extend the Inventory class to a class InventoryCRN that compares two sets of parameters \((s,S)\) for the inventory control policy. The method simulateDiff simulates the system with policies \((s_1, S_1)\) and \((s_2, S_2)\) independently, computes the difference in profits, and repeats this \(n\) times. These \(n\) differences are tallied in the statistical collector statDiff, to estimate the expected difference in average daily profits between the two policies.
The method simulateDiffCRN does the same, but uses common random numbers across each pair of simulation runs. After running the simulation with policy \((s_1, S_1)\), the two random number streams are reset to the start of their current substream, so that they produce exactly the same sequence of random numbers when the simulation is run with policy \((s_2, S_2)\). Then the difference in profits is given to the statistical collector statDiff as before, and the two streams are reset to a new substream for the next pair of simulations.
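The synchronization can be sketched as follows (streamOrder is named in the text; streamDemand is our assumed name for the demand stream):

```java
// One pair of runs with common random numbers.
double value1 = simulate(m, s1, S1);
streamDemand.resetStartSubstream();  // replay the same random numbers
streamOrder.resetStartSubstream();
double value2 = simulate(m, s2, S2);
statDiff.add(value2 - value1);
streamDemand.resetNextSubstream();   // fresh substream for the next pair
streamOrder.resetNextSubstream();
```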
Why not use the same stream for both the demands and the orders? In this example, we need one random number to generate the demand each day, and also one random number to decide if the order arrives, but only on the days when we make an order. The days on which we make an order are not necessarily the same under the two policies. So if we use a single stream for both the demands and the orders, the random numbers will not necessarily be used for the same purpose across the two policies: a random number used to decide if the order arrives in one case may end up being used to generate a demand in the other case. This can greatly diminish the power of the common random numbers technique. Using two different streams as in Listing InventoryCRN ensures at least that the random numbers are used for the same purpose for the two policies. For more explanations and examples about common random numbers, see [118], [132], [130].
The main program estimates the expected difference in average daily profits for policies \((s_1, S_1) = (80, 198)\) and \((s_2, S_2) = (80, 200)\), first with independent random numbers, then with common random numbers. The other parameters are the same as before. The results are in Listing InventoryCRN results. We see that the use of common random numbers reduces the variance by a factor of about 20 in this case. This means that with CRNs, one needs about 20 times fewer simulation runs than with independent random numbers to estimate the difference with the same accuracy.
Results of the program InventoryCRN [InventoryCRN results]
We consider here a single-server queue, where customers arrive randomly and are served one by one in their order of arrival, i.e., first in, first out (FIFO). We suppose that the times between successive arrivals are exponential random variables with mean \(1/\lambda\), that the service times are exponential random variables with mean \(1/\mu\), and that all these random variables are mutually independent. The customers arriving while the server is busy must join the queue. The system initially starts empty. We want to simulate the first \(m\) customers in the system and compute the mean waiting time per customer.
This simple model is well known in queueing theory, where it is called an \(M/M/1\) queue. Simple formulas are available for this model to compute the average waiting time per customer, the average queue length, the probability that a customer waits more than \(x\) seconds, etc., over an infinite time horizon [109]. For a finite number of customers or a finite time horizon, these expectations can also be computed by numerical methods, but here we just want to show how the model can be simulated.
In a single-server queue, if \(W_i\) and \(S_i\) are the waiting time and service time of the \(i\)th customer, and \(A_i\) is the time between the arrivals of the \(i\)th and \((i+1)\)th customers, we have \(W_1=0\) and the \(W_i\)’s follow the recurrence
\[ W_{i+1} = \max(0,\; W_i + S_i - A_i), \tag{lindley} \]
known as Lindley’s equation [109].
A simulation based on Lindley’s recurrence [QueueLindley]
The program of Listing QueueLindley exploits (lindley) to compute the average waiting time of the first \(m\) customers in the queue, repeats this \(n\) times independently, and prints a summary of the results. Here, for a change, we pass the model parameters to the methods instead of to the constructor, and the random variates are generated by static methods instead of via a RandomVariateGen object as in the Inventory class (previous example). This illustrates various ways of doing the same thing. The instruction “Wi += …” could also be replaced by code that directly implements inversion of the exponential distribution, as in the sketch below. Hardcoding the exponential distributions in the simulate method as we do here makes the program simpler, but it has the drawback that one cannot reuse the same class for distributions other than the exponential. To make the program more general, the distributions could be created outside the class and passed to the constructor.
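A direct-inversion version could look like the following sketch, where streamArr and streamServ are assumed names for the two streams, and where we use the fact that \(-\ln(1-U)/\mathrm{rate}\) is exponential with the given rate when \(U\) is uniform over \([0,1)\):

```java
// Lindley's recurrence with explicit inversion of the exponentials:
double S = -Math.log(1.0 - streamServ.nextDouble()) / mu;     // service time
double A = -Math.log(1.0 - streamArr.nextDouble()) / lambda;  // interarrival time
Wi = Math.max(0.0, Wi + S - A);
```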
Listing QueueObs adds a few ingredients to the program QueueLindley, in order to illustrate the observer design pattern implemented in package stat. This mechanism permits one to separate data generation from data processing. It can be very helpful in large simulation programs or libraries, where different objects may need to process the same data in different ways. These objects may have the task of storing observations, displaying statistics in different formats, or writing them to files for treatment by other software such as R, for example, and they are not necessarily fixed in advance.
The observer pattern, supported by the umontreal.ssj.stat.ObservationListener interface in SSJ, offers the appropriate flexibility for that kind of situation. A statistical probe maintains a list of registered umontreal.ssj.stat.ObservationListener objects, and broadcasts information to all its registered observers whenever appropriate. Any object that implements the interface umontreal.ssj.stat.ObservationListener can register as an observer.
A StatProbe object from package stat, or an instance of one of its subclasses Tally and Accumulate, contains a list of ObservationListeners. Whenever it receives a new statistical observation, e.g., via Tally.add or Accumulate.update, it sends the new value to all registered observers. To register as an observer, an object must implement the interface umontreal.ssj.stat.ObservationListener. This implies that it must provide an implementation of the method newObservation, whose purpose is to recover the information that the object has registered for, usually to do something with it.
In the example, the statistical collector waitingTimes transmits to all its registered listeners each new statistical observation that it receives via its add method. More specifically, each call to waitingTimes.add(x) generates in the background a call to o.newObservation(waitingTimes, x) for each registered observer o.
The method notifyObs is used to turn the tally into such a notifying agency. In fact, the collector is both a tally and a distribution agency, but its tally functionality can be disabled using the stopCollectStat method. This can be useful when the registered observers already perform the statistical collection.
In our example, two observers register to receive observations from waitingTimes. They are anonymous objects of classes ObservationTrace and LargeWaitsCollector, respectively. Each one is informed of any new observation \(W_i\) via its newObservation method. The task of the ObservationTrace observer is to print the waiting times \(W_5\), \(W_{10}\), \(W_{15}\), …, whereas the LargeWaitsCollector observer stores in an array all waiting times that exceed 2. The statistical collector waitingTimes itself also stores appropriate information to be able to provide a statistical report when required.
The ObservationListener interface specifies that newObservation must have two formal parameters, of classes StatProbe and double, respectively. The second parameter is the value of the observation. In the case where the observer is registered with several statistical probes, the first parameter of newObservation tells it which probe is sending the information, so it can adopt the correct behavior for this sender.
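As a sketch (assuming the methods notifyObs and addObservationListener named above; the listener class here is ours, similar in spirit to LargeWaitsCollector):

```java
import java.util.ArrayList;
import umontreal.ssj.stat.ObservationListener;
import umontreal.ssj.stat.StatProbe;
import umontreal.ssj.stat.Tally;

public class ObsDemo {
   // A listener that memorizes all observations larger than 2.
   static class LargeWaits implements ObservationListener {
      ArrayList<Double> large = new ArrayList<Double>();
      public void newObservation(StatProbe probe, double x) {
         if (x > 2.0) large.add(x);  // keep only the large observations
      }
   }

   public static void main(String[] args) {
      Tally waitingTimes = new Tally("Waiting times");
      waitingTimes.notifyObs();      // turn the tally into a notifying agency
      waitingTimes.addObservationListener(new LargeWaits());
      waitingTimes.add(3.7);         // forwarded to the listener as well
   }
}
```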
A simulation of Lindley’s recurrence using observers [QueueObs]
A geometric Brownian motion (GBM) \(\{S(\zeta), \zeta\ge0\}\) satisfies
\[ S(\zeta) = S(0) \exp\left[(r - \sigma^2/2)\zeta+ \sigma B(\zeta)\right] \]
where \(r\) is the risk-free appreciation rate, \(\sigma\) is the volatility parameter, and \(B\) is a standard Brownian motion, i.e., a stochastic process whose increments over disjoint intervals are independent normal random variables, with mean 0 and variance \(\delta\) over an interval of length \(\delta\) (see, e.g., [69]). The GBM process is a popular model for the evolution in time of the market price of financial assets. A discretely-monitored Asian option on the arithmetic average of a given asset has discounted payoff
\[ \tag{payasian} X = e^{-rT} \max[\bar{S} - K, 0] \]
where \(K\) is a constant called the strike price and
\[ \tag{arithmetic-average} \bar{S} = \frac{1}{t} \sum_{j=1}^t S(\zeta_j), \]
for some fixed observation times \(0 < \zeta_1 < \cdots< \zeta_t = T\). The value (or fair price) of the Asian option is \(v = E[X]\) where the expectation is taken under the so-called risk-neutral measure (which means that the parameters \(r\) and \(\sigma\) have to be selected in a particular way; see [69]).
This value \(v\) can be estimated by simulation as follows. Generate \(t\) independent and identically distributed (i.i.d.) \(N(0,1)\) random variables \(Z_1,…,Z_t\) and put \(B(\zeta_j) = B(\zeta_{j-1}) + \sqrt{\zeta_j - \zeta_{j-1}} Z_j\), for \(j=1,…,t\), where \(B(\zeta_0) = \zeta_0 = 0\). Then,
\[ S(\zeta_j) = S(0) e^{(r-\sigma^2/2)\zeta_j + \sigma B(\zeta_j)} \]
for \(j = 1,…,t\) and the payoff can be computed via (payasian). This can be replicated \(n\) times, independently, and the option value is estimated by the average discounted payoff. The Java program of Listing AsianGBM implements this procedure.
Note that generating the sample path and computing the payoff are done in two different methods. This way, other methods could later be added to compute payoffs that are defined differently (e.g., based on the geometric average, or with barriers, etc.) over the same generated sample path.
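A compact, self-contained sketch of this two-step structure is given below; the class, field, and method names are ours and need not match the actual listing:

```java
import umontreal.ssj.probdist.NormalDist;
import umontreal.ssj.rng.MRG32k3a;
import umontreal.ssj.rng.RandomStream;

public class AsianSketch {
   double r = 0.05, sigma = 0.5, strike = 100.0, s0 = 100.0;
   int t = 12;
   double[] zeta = new double[t + 1], path = new double[t + 1];

   public AsianSketch() {
      for (int j = 0; j <= t; j++) zeta[j] = (double) j / t; // observation times
   }

   // Step 1: generate the GBM path at the observation times.
   void generatePath(RandomStream stream) {
      path[0] = s0;
      for (int j = 1; j <= t; j++) {
         double d = zeta[j] - zeta[j - 1];
         double z = NormalDist.inverseF01(stream.nextDouble()); // N(0,1) by inversion
         path[j] = path[j - 1]
                 * Math.exp((r - sigma * sigma / 2.0) * d + sigma * Math.sqrt(d) * z);
      }
   }

   // Step 2: compute the discounted payoff from the stored path.
   double getPayoff() {
      double sum = 0.0;
      for (int j = 1; j <= t; j++) sum += path[j];
      return Math.exp(-r * zeta[t]) * Math.max(sum / t - strike, 0.0);
   }

   public static void main(String[] args) {
      AsianSketch a = new AsianSketch();
      a.generatePath(new MRG32k3a());
      System.out.println("payoff = " + a.getPayoff());
   }
}
```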
Pricing an Asian option on a GBM process [AsianGBM]
The method simulateRuns performs \(n\) independent simulation runs using the given random number stream and puts the \(n\) observations of the net payoff in the statistical collector statValue. In the main program, we first specify the \(d=12\) observation times \(\zeta_j = j/12\) for \(j=1,…,12\), and put them in the array zeta (of size 13) together with \(\zeta_0=0\). We then construct an AsianGBM object with parameters \(r=0.05\), \(\sigma=0.5\), \(K = 100\), \(S(0)=100\), \(d=12\), and the observation times contained in the array zeta. We then create the statistical collector statValue, perform \(10^6\) simulation runs, and print the results. The discount factor \(e^{-rT}\) and the constants \(\sigma\sqrt{\zeta_j - \zeta_{j-1}}\) and \((r-\sigma^2/2)(\zeta_j - \zeta_{j-1})\) are precomputed in the constructor of AsianGBM, to speed up the simulation.
The program in Listing AsianGBMQMC extends the class AsianGBM to AsianGBMQMC, whose method simulateRunsRQMC estimates the option value via randomized quasi-Monte Carlo (RQMC). This method takes as input an RQMC point set and makes \(m\) independent randomizations of it. For each randomization, it computes the average payoff over the \(n\) points of the point set. The \(m\) independent averages are given to the collector statRQMC, which is returned by the method.
The method simulateRunsRQMC creates an iterator stream which is used to enumerate the points. These point set iterators, available for each type of point set in package hups, implement the RandomStream interface and permit one to easily replace the uniform random numbers by QMC or RQMC points or sequences, without changing the code of the model itself. The method resetStartStream, invoked immediately after each randomization of prqmc, resets the iterator to the first coordinate of the first point. The number \(n\) of simulation runs is equal to the number of points, and the points correspond to substreams in the RandomStream interface. The method resetNextSubstream, invoked after each simulation run in simulateRuns, resets the iterator to the first coordinate of the next point. Each generation of a uniform random number (directly or indirectly) with this stream during the simulation moves the iterator to the next coordinate of the current point.
Pricing an Asian option on a GBM process with randomized quasi-Monte Carlo [AsianGBMQMC]
The main program constructs an AsianGBMQMC object and first makes a Monte Carlo (MC) experiment. Then it constructs the point set and its randomization for the RQMC experiment. The point set p used in this example is a Sobol’ net with \(n = 2^{16}\) points in \(t\) dimensions. The randomization rand is a left matrix scramble followed by a random digital shift. See umontreal.ssj.hups for more details on what these classes do. By putting together p and rand, we obtain the RQMCPointSet prqmc.
The program invokes simulateRunsRQMC to make the RQMC experiment. It then computes the empirical variance and the CPU time per simulation run for both MC and RQMC. It prints the ratio of variances, which can be interpreted as the estimated variance reduction factor obtained when using RQMC instead of MC in this example, and the ratio of efficiencies, which can be interpreted as the estimated efficiency improvement factor. (The efficiency of an estimator is defined as 1/(variance \(\times\) CPU time per run).) The results are in Listing AsianGBMQMC results: RQMC reduces the variance by a factor of around 250 and improves the efficiency by a factor of about 646. RQMC not only reduces the variance, it also runs faster than MC. The main reason for this is the call to resetNextSubstream in simulateRuns, which is a bit costly for a random number stream of class MRG32k3a (with the current implementation) but takes negligible time for an iterator over a digital net in base 2. In fact, in the case of MC, the call to resetNextSubstream is not really needed; removing it in that case would reduce the CPU time.
Results of the program AsianGBMQMC [AsianGBMQMC results]
Examples of discrete-event simulation programs, based on the event view supported by the package simevents, are given in this section.
We return to the single-server queue considered in Section A single-server queue with Lindley’s recurrence. This time, instead of simulating a fixed number of customers, we simulate the system for a fixed time horizon of 1000.
Event-oriented simulation of an \(M/M/1\) queue [QueueEv]
Listing QueueEv gives an event-oriented simulation program, where a subclass of the class umontreal.ssj.simevents.Event is defined for each type of event that can occur in the simulation: arrival of a customer (Arrival), departure of a customer (Departure), and end of the simulation (EndOfSim). Each event instance is inserted into the event list upon its creation, with a scheduled time of occurrence, and is executed when the simulation clock reaches this time. Executing an event means invoking its actions method; each event subclass must implement this method. The simulation clock and the event list (i.e., the list of events scheduled to occur in the future) are maintained behind the scenes by the class Sim of package simevents.
When QueueEv is instantiated by the main method, the program creates two streams of random numbers, two random variate generators, two lists, and two statistical probes (or collectors). The random number streams are attached to the random variate generators genArr and genServ, which are used to generate the times between successive arrivals and the service times, respectively. We can use such attached generators because the means (parameters) do not change during the simulation. The lists waitList and servList contain the customers waiting in the queue and the customer in service (if any), respectively. Maintaining a list for the customer in service may seem exaggerated, because this list never contains more than one object, but the current design has the advantage of working with very little change if the queueing model has more than one server, and in other more general situations. Note that we could have used the class LinkedListStat from package simevents instead of java.util.LinkedList. However, with our current implementation, the automatic statistical collection in that LinkedListStat class would not count the customers whose waiting time is zero, because they are never placed in the list.
The statistical probe custWaits collects statistics on the customers’ waiting times. It is of the class Tally, which is appropriate when the statistical data of interest is a sequence of observations \(X_1, X_2, …\) of which we might want to compute the sample mean, variance, and so on. A new observation is given to this probe by the add method each time a customer starts its service. Every add to a Tally probe brings a new observation \(X_i\), which corresponds here to a customer’s waiting time in the queue. The other statistical probe, totWait, is of the class Accumulate, which means that it computes the integral (and, eventually, the time-average) of a continuous-time stochastic process with piecewise-constant trajectory. Here, the stochastic process of interest is the length of the queue as a function of time. One must call totWait.update whenever there is a change in the queue size, to update the (hidden) accumulator that keeps the current value of the integral of the queue length. This integral is equal, after each update, to the total waiting time in the queue, for all customers, since the beginning of the simulation.
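A minimal sketch of the Accumulate mechanics (names are ours):

```java
import umontreal.ssj.simevents.Accumulate;
import umontreal.ssj.simevents.Sim;

public class AccumulateDemo {
   public static void main(String[] args) {
      Sim.init();  // the integral is computed against the simulation clock
      Accumulate queueSize = new Accumulate("Size of queue");
      queueSize.update(0);  // queue empty at time 0
      // In a real simulation, the clock advances as events execute, and
      // update(x) must be called at every change of the queue size.
      System.out.println(queueSize.report());
   }
}
```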
Each customer is an object with two fields: arrivTime memorizes this customer’s arrival time to the system, and servTime memorizes its service time. This object is created, and its fields are initialized, when the customer arrives.
The method simulateOneRun simulates this system for a fixed time horizon. It first invokes Sim.init, which initializes the clock and the event list. The method Sim.start actually starts the simulation by advancing the clock to the time of the first event in the event list, removing this event from the list, and executing it. This is repeated until either Sim.stop is called or the event list becomes empty. Sim.time returns the current time on the simulation clock. Here, two events are scheduled before starting the simulation: the end of the simulation at the time horizon, and the arrival of the first customer at a random time that has the exponential distribution with rate \(\lambda\) (i.e., mean \(1/\lambda\)), generated by genArr using inversion and its attached random stream. The method genArr.nextDouble returns this exponential random variate.
The method actions of the class Arrival describes what happens when an arrival occurs. Arrivals are scheduled by a domino effect: the first action of each arrival event schedules the next arrival in a random number of time units, generated from the exponential distribution with rate \(\lambda\). Then, the newly arrived customer is created, its arrival time is set to the current simulation time, and its service time is generated from the exponential distribution with mean \(1/\mu\), using the random variate generator genServ. If the server is busy, this customer is inserted at the end of the queue (the list waitList) and the statistical probe totWait, which keeps track of the size of the queue, is updated. Otherwise, the customer is inserted in the server’s list servList, its departure is scheduled to happen in a number of time units equal to its service time, and a new observation of 0.0 is given to the statistical probe custWaits, which collects the waiting times.
When a Departure event occurs, the customer in service is removed from the list (and disappears). If the queue is not empty, the first customer is removed from the queue (waitList) and inserted in the server’s list, and its departure is scheduled. The waiting time of that customer (the current time minus its arrival time) is given as a new observation to the probe custWaits, and the probe totWait is also updated with the new (reduced) size of the queue.
The event EndOfSim stops the simulation. Then the main routine regains control and prints statistical reports for the two probes. The results are shown in Listing QueueEv results. When calling report on an Accumulate object, an implicit update is done using the current simulation time and the last value given to update. In this example, this ensures that the totWait accumulator integrates the total wait up to the time horizon, because the simulation clock is still at that time when the report is printed. Without such an automatic update, the accumulator would integrate only up to the last update time before the time horizon.
Results of the program QueueEv [QueueEv results]
We consider a classical prey-predator system, where the prey are food for the predators (see, e.g., [118], page 87). Let \(x(t)\) and \(z(t)\) be the numbers of prey and predators at time \(t\), respectively. These numbers are integers, but as an approximation, we shall assume that they are real-valued variables evolving according to the differential equations
\begin{align*} x'(t) & = r x(t) - c x(t) z(t) \\ z'(t) & = -s z(t) + d x(t) z(t) \end{align*}
with initial values \(x(0)=x_0>0\) and \(z(0)=z_0>0\). This is a Lotka-Volterra system of differential equations, which has a known analytical solution. Here, in the program of Listing PreyPred, we just show how to simulate its evolution, to illustrate the continuous simulation facilities of SSJ. Instead of using the default simulator as in the other examples of this section, this program explicitly creates a discrete-event umontreal.ssj.simevents.Simulator object sim to manage the execution of the simulation. This is only to illustrate how this can be done. One could actually create several such Simulator objects running in parallel, each one having its own event list.
Simulation of the prey-predator system [PreyPred]
The program prints the triples \((t, x(t), z(t))\) at values of \(t\) that are multiples of h, one triple per line. This is done by an event of class PrintPoint, which is rescheduled every h units of time. This output can be redirected to a file for later use, for example to plot a graph of the trajectory. The continuous variables x and z are instances of the classes Preys and Preds, whose method derivative gives their derivatives \(x'(t)\) and \(z'(t)\), respectively. The differential equations are integrated by a Runge-Kutta method of order 4.
This is the old Example 1.4.1 of [25], page 14. A bank has a random number of tellers every morning. On any given day, the bank has \(t\) tellers with probability \(q_t\), where \(q_3 = 0.80\), \(q_2 = 0.15\), and \(q_1 = 0.05\). All the tellers are assumed to be identical from the modeling viewpoint.
Event-oriented simulation of the bank model [BankEv]
The bank opens at 10:00 and closes at 15:00 (i.e., 3 p.m.). The customers arrive randomly according to a Poisson process with piecewise constant rate \(\lambda(t)\), \(t\ge0\). The arrival rate \(\lambda(t)\) is 0.5 customers per minute from 9:45 until 11:00 and from 14:00 until 15:00, and one customer per minute from 11:00 until 14:00. The customers who arrive between 9:45 and 10:00 join a FIFO queue and wait for the bank to open. At 15:00, the door is closed, but all the customers already in will be served. Service starts at 10:00.
Customers form a FIFO queue for the tellers, with balking. An arriving customer will balk (walk out) with probability \(p_k\) if there are \(k\) customers ahead of him in the queue (not counting the people receiving service), where
\[ p_k = \begin{cases} 0 & \text{if $k\le5$;} \\ (k-5)/5 & \text{if $5 < k < 10$;} \\ 1 & \text{if $k\ge10$.} \end{cases} \]
The customer service times are independent Erlang random variables: Each service time is the sum of two independent exponential random variables with mean one.
We want to estimate the expected number of customers served in a day, and the expected average wait for the customers served on a day.
Listing BankEv gives an event-oriented simulation program for this bank model. There are events at the fixed times 9:45, 10:00, 11:00, 14:00, and 15:00. At 9:45, the counters are initialized and the arrival process is started. The time until the first arrival, or the time between one arrival and the next one, is (tentatively) exponential with a mean of 2 minutes. However, as soon as an arrival turns out to be past 11:00, its time must be readjusted to take into account the increase of the arrival rate at 11:00. The event at 11:00 takes care of this readjustment, and the event at 14:00 makes a similar readjustment when the arrival rate decreases. We give the specific name nextArriv to the next planned arrival event in order to be able to reschedule that particular event to a different time. Note that a single arrival event is created at the beginning and this same event is scheduled over and over again. This can be done because there is never more than one arrival event in the event list. (We could have done the same for the \(M/M/1\) queue in Listing QueueEv.)
At the bank opening at 10:00, an event generates the number of tellers and starts the service for the corresponding customers. The event at 15:00 cancels the next arrival.
Upon arrival, a customer checks if a teller is free. If so, one teller becomes busy and the customer generates its service time and schedules its departure; otherwise, the customer either balks or joins the queue. The balking decision is computed by the method balk, using the random number stream streamBalk. The arrival event also schedules the next arrival. Upon departure, the customer frees the teller, and the first customer in the queue, if any, can start its service. The generator genServ is an ErlangConvolutionGen generator, so the Erlang variates are generated by adding two exponentials instead of by inversion.
The method simulateDays simulates the bank for numDays days and prints a statistical report. If \(X_i\) is the number of customers served on day \(i\) and \(Q_i\) the total waiting time on day \(i\), the program estimates \(E[X_i]\) and \(E[Q_i]\) by their sample averages \(\bar{X}_n\) and \(\bar{Q}_n\) with \(n =\) numDays. For each simulation run (each day), simulOneDay initializes the clock, event list, and statistical probe for the waiting times, schedules the deterministic events, and runs the simulation. After 15:00, no more arrivals occur and the event list becomes empty when the last customer departs. At that point, the program returns to right after the Sim.start() statement and updates the statistical counters for the number of customers served during the day and their total waiting time.
The results are given in Listing Bank results.
Results of the BankEv program [Bank results]
We consider here a simplified model of a telephone contact center (or call center) where agents answer incoming calls. Each day, the center operates for \(m\) hours. The number of agents answering calls and the arrival rate of calls vary during the day; we shall assume that they are constant within each hour of operation but depend on the hour. Let \(n_j\) be the number of agents in the center during hour \(j\), for \(j=0,…,m-1\). For example, if the center operates from 8 a.m. to 9 p.m., then \(m=13\) and hour \(j\) starts at \(j+8\) o’clock. All agents are assumed to be identical. When the number of occupied agents at the end of hour \(j\) is larger than \(n_{j+1}\), ongoing calls are all completed, but new calls are answered only when there are fewer than \(n_{j+1}\) busy agents. After the center closes, ongoing calls are completed and the calls already in the queue are answered, but no additional incoming call is taken.
The calls arrive according to a Poisson process with piecewise constant rate, equal to \(R_j = B \lambda_j\) during hour \(j\), where the \(\lambda_j\) are constants and \(B\) is a random variable having the gamma distribution with parameters \((\alpha_0,\alpha_0)\). Thus, \(B\) has mean 1 and variance \(1/\alpha_0\), and it represents the busyness of the day; the day is busier than usual when \(B > 1\) and less busy when \(B < 1\). The Poisson process assumption means that, conditional on \(B\), the number of incoming calls during any subinterval \((t_1, t_2]\) of hour \(j\) is a Poisson random variable with mean \((t_2 - t_1) B \lambda_j\), and that the arrival counts in disjoint time intervals are independent random variables. This arrival process model is motivated and studied in [240] and [12]. More refined and realistic arrival process models can be found in [93], [94], [190].
Incoming calls form a FIFO queue for the agents. A call is lost (abandons the queue) when its waiting time exceeds its patience time. The patience times of calls are assumed to be i.i.d. random variables with the following distribution: with probability \(p\) the patience time is 0 (so the person hangs up unless an agent is available immediately), and with probability \(1-p\) it is exponential with mean \(1/\nu\). The service times are i.i.d. gamma random variables with parameters \((\alpha,\beta)\).
We want to estimate the following quantities in the long run (i.e., over an infinite number of days): (a) \(w\), the average waiting time per call, (b) \(g(s)\), the fraction of calls whose waiting time is less than \(s\) seconds for a given threshold \(s\), and (c) \(\ell\), the fraction of calls lost due to abandonment.
Suppose we simulate the model for \(n\) days. For each day \(i\), let \(A_i\) be the number of arrivals, \(W_i\) the total waiting time of all calls, \(G_i(s)\) the number of calls that waited less than \(s\) seconds, and \(L_i\) the number of abandonments. For this model, the expected number of incoming calls in a day is \(a = E[A_i] = \sum_{j=0}^{m-1} \lambda_j\). Then, \(W_i/a\), \(G_i(s)/a\), and \(L_i/a\), for \(i=1,…,n\), are i.i.d. unbiased estimators of \(w\), \(g(s)\), and \(\ell\), respectively, and can be used to compute confidence intervals for these quantities in a standard way when \(n\) is large.
Simulation of a simplified call center [CallCenter]
Listing CallCenter gives an event-oriented simulation program for this call center model. When the CallCenter class is instantiated by the main method, the random streams, the list, and the statistical probes are created, and the model parameters are read from a file by the method readData. The line Locale.setDefault(Locale.US) is added because real numbers in the data file are written in the anglo-saxon form 8.3 instead of the form 8,3 used in many other countries. The main program then simulates \(n = 100000\) operating days and prints the value of \(a\), as well as 90% confidence intervals on \(a\), \(w\), \(g(s)\), and \(\ell\), based on their estimators \(\bar{A}_n\), \(\bar{W}_n/a\), \(\bar{G}_n(s)/a\), and \(\bar{L}_n/a\), assuming that these estimators have approximately the Student distribution. This is justified by the fact that \(W_i\), \(G_i(s)\), and \(L_i\) are themselves “averages” over several observations, so we may expect their distributions to be not far from normal.
To generate the service times, we use a gamma random variate generator called genServ, created in the constructor after the parameters \((\alpha,\beta)\) of the service time distribution have been read from the data file. For the other random variables in the model, we simply create random streams of i.i.d. uniforms (in the preamble) and apply inversion explicitly to generate the random variates. The latter approach is more convenient, e.g., for the patience times because their distribution is not standard, and for the inter-arrival times because their mean changes every period. For the gamma service time distribution, on the other hand, the parameters always remain the same and inversion is rather slow, so we decided to create a generator that uses a faster special method.
The method simulateOneDay simulates one day of operation. It initializes the simulation clock, event list, and counters, schedules the center’s opening and the first arrival, and starts the simulation. When the day is over, it updates the statistical collectors. Note that there are two versions of this method: one that generates the random variate \(B\), and one that takes its value as an input parameter. The latter is convenient in case one wishes to simulate the center with a fixed value of \(B\).
An event NextPeriod(j) marks the beginning of each period \(j\). The first of these events (for \(j=0\)) is scheduled by simulateOneDay; then the following ones schedule each other successively, until the end of the day. This type of event updates the number of agents in the center and the arrival rate for the next period. If the number of agents has just increased and the queue is not empty, some calls in the queue can now be answered. The method checkQueue verifies this and starts service for the appropriate number of calls. The time until the next planned arrival is readjusted to take into account the change of arrival rate, as follows. The inter-arrival times are i.i.d. exponential with mean \(1/R_{j-1}\) when the arrival rate is fixed at \(R_{j-1}\). But when the arrival rate changes from \(R_{j-1}\) to \(R_j\), the residual time until the next arrival should be modified from an exponential with mean \(1/R_{j-1}\) (already generated) to an exponential with mean \(1/R_j\). Since the residual time is still exponential by the memoryless property, multiplying it by \(\lambda_{j-1}/\lambda_j = R_{j-1}/R_j\) is an easy way to achieve this. We give the specific name nextArrival to the next arrival event in order to be able to reschedule it to a different time. Note that there is a single arrival event which is scheduled over and over again during the entire simulation. This is more efficient than creating a new arrival event for each call, and can be done here because there is never more than one arrival event at a time in the event list. At the end of the day, simply canceling the next arrival ensures that no more calls will arrive.
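For illustration, the readjustment can be written as a one-line sketch (assuming that time() returns the event’s scheduled time, that reschedule takes a delay from the current time, and that the rates are stored in an array lambda):

```java
// Rescale the residual time until the next arrival when the arrival
// rate changes from lambda[j-1] to lambda[j] at the current time:
nextArrival.reschedule(
   (nextArrival.time() - Sim.time()) * lambda[j - 1] / lambda[j]);
```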
Each arrival event first schedules the next one. Then it increments the arrivals counter and creates the new call that just arrived. The call’s constructor generates its service time and decides where the incoming call should go. If an agent is available, the call is answered immediately (its waiting time is zero), and an event is scheduled for the completion of the call. Otherwise, the call must join the queue; its patience time is generated by generPatience and memorized, together with its arrival time, for future reference.
Upon completion of a call, the number of busy agents is decremented and one must verify whether a waiting call can now be answered. The method checkQueue verifies this and, if the answer is yes, removes the first call from the queue and activates its endWait method. This method first compares the call’s waiting time with its patience time, to see if the call is still waiting or has been lost (by abandonment). If the call was lost, we consider its waiting time as being equal to its patience time (i.e., the time that the caller really waited), for the statistics. If the call is still there, the number of busy agents is incremented and an event is scheduled for the call completion.
The results of this program, with the data in the file CallCenter.dat, are shown in Listing CallCenter results.
Results of the program CallCenter [CallCenter results]
This model is certainly an oversimplification of actual call centers. It can be embellished and made more realistic by considering different types of agents, different types of calls, agents taking breaks for lunch or coffee or to go to the restroom, agents making outbound calls to reach customers when the inbound traffic is low (e.g., for marketing purposes or for returning calls), and so on. One could also model the revenue generated by calls and the operating costs for running the center, and use the simulation model to compare alternative operating strategies in terms of the expected net revenue, for example.
A more elaborate simulation library for call centers, built over SSJ, can be found in [29], [30].