M.J.Espino: Input Data Analysis

Simulation Input Modeling

Discrete-event simulation models typically have stochastic components that mimic the probabilistic nature of the system under consideration. Successful input modeling requires a close match between the input model and the true underlying probabilistic mechanism associated with the system. The general question considered here is how to model an element (e.g., arrival process, service times) in a discrete-event simulation given a data set collected on the element of interest. For brevity, it is assumed that data is available on the aspect of the simulation of interest. It is also assumed that raw data is available, as opposed to censored data, grouped data, or summary statistics. Most simulation texts (e.g., Law and Kelton 1991) have a broader treatment of input modeling than presented here. Nelson et al. (1995) and Nelson and Yamnitsky (1998) survey advanced techniques.

1 COLLECTING DATA

There are two approaches that arise with respect to the collection of data.

In this tutorial we first review introductory techniques for simulation input modeling. We then identify situations in which the standard input models fail to adequately represent the available input data. In particular, we consider the cases where the input process may (i) have marginal characteristics that are not captured by standard distributions; (ii) exhibit dependence; and (iii) change over time. For case (i), we review flexible distribution systems, while we review two widely used multivariate input models for case (ii). Finally, we review nonhomogeneous Poisson processes for the last case. We focus our discussion on continuous random variables; however, where appropriate, references are provided for discrete random variables. Detailed examples will be illustrated in the tutorial presentation.

Input Data Collection

Data Collection Problems

We are having a data collection issue for some of our accounts and are actively working on a fix.
We have the problem mostly resolved and are fine tuning. Data collection may be bumpy as we tune the collection process.
We appear to be having a bump in the road and are investigating.
We’re hitting file handle limits that are taking everything down. We will switch back to a previous known working configuration.
We’ve reverted to a previous configuration but are still seeing problems.
The problem appears to have moved to our database tier. The UI is now affected for all customers.
We have reverted back to a previous configuration and things are stabilizing. The database issue has been solved. The UI should be back.
We have been stable now since 11:30am. We believe this incident is closed.

Over the past several weeks we’ve been moving our data collection tier over to a new system. The new system is designed to handle significantly more load than the existing one. The conversion had been going well until this incident. Testing revealed a design flaw, for which we put a fix in place. We will begin migrating to the new system again once we’re confident this problem won’t occur again. We apologize for the downtime.

Practical Suggestions

This section provides a number of practical suggestions to improve the capacity of organisations and individuals involved in Sport & Development.

Recognise potential risks

Experience shows that being aware of potential risks and taking suitable measures to anticipate them can help to avoid problems in the future. Attempts should be made to empathise with the constraints and challenges the local partner faces in its local situation. Experience also shows that regarding sport as an integral part of the programme, with a view of capacity building that incorporates sport and development elements, helps to ensure better quality programmes in the long run with increased sustainability.

Attempt capacity building at all levels

Capacity building can be divided into interventions at three levels: Human Resource Development (HRD), Organisational Development (OD) and Institutional Development (ID). Good and sustainable capacity building is conditional upon investing in all three levels.

It is difficult for sport organisations to be active at all three levels but it is a necessity if one aims to improve the sustainability of a programme. It is difficult to negotiate with ministries for the recognition of diplomas, for supporting legislation (for example, physical education), so that trained sport instructors can work in schools. It can be useful to try to accomplish these tasks through networking with organisations that might use other methodology to reach an overall development goal.

As a consequence, in some countries the sport projects of sport and development organisations are ‘filling the gap’ in the lack of physical education on the school curriculum. Investments will therefore also have to be made at OD and ID levels in order to ensure these sport activities are sustainable and are used to achieve the objectives set at the initial stages of the programme.

A joint consultation process should take place to decide what changes are needed internally and what possibilities there are for institutional change. A possible next step can be networking with other stakeholders in the field and/or signing agreements with the local/national governments.

There are many examples of different forms of cooperation between organisations. For instance, KNVB and UNICEF offer their core capacities within the MYSA project, in which KNVB concentrates on activities in relation to football and UNICEF is responsible for guiding the elements of the programme that relate to social change.

Local ownership is essential

Many of the projects currently being implemented in Sport & Development are not locally owned. A cooperative relationship must be developed with a local partner organisation before starting a project.

Assessment of partners and potential of cooperation

It is important to assess capacities and resources of each organisation entering into a partnership before joining forces. Questions such as: “Do the organisations match in terms of mission and vision? What resources does each partner bring to the programme? Does the local partner organisation have the capacity to absorb the programme and what can other partners do to enhance this? What measures are in place to encourage learning and to share experiences between the partners involved?” should be asked before entering into a formal partnership.

A tool has been developed by Commonwealth Games Canada for the selection of a suitable partner: the Partnership Filter. Potential partners are screened on the basis of several criteria.

Go to the partnerships section to read more about this instrument: it provides examples of how the instrument can be used and advice on selecting partners and developing partnerships successfully.

Qualifications of trainers posted abroad

People who had been trained as sport leaders used to be sent abroad to implement Sport & Development programmes. Qualified staff need to be enlisted. Feedback from experts has shown that a shift is now taking place in the background of people who are posted overseas, with a decreasing focus on recruitment of unqualified staff.

Sustainability

Many projects train local people. This is capacity building at the level of HRD. Often the project ends after the training. But what happens next with the individual capacity that has been developed? Is it used by the partner organisation? Are more sport activities offered? To what extent does the target group take part? It is of great importance to monitor and evaluate the implementation that takes place after the training sessions and to plan measures in the design phase of projects and programmes that will lead to sustainability.

Effects of Period of Time

Many processes related to human activities
are not stable even within small time periods.
• For example, arrival rates at airports,
restaurants, and banks will be significantly affected
by time of day.
• Period of time may not be important if we are
interested in only a small portion of the time period
(e.g. a worst-case scenario for times having
peak demand).
• If period of time is significant:
– Collect data from a whole range of different
time periods,
– Examine the data collected, and
– Divide the data into intervals for different time
periods if required.
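
The last step, dividing the data into intervals for different time periods, might look roughly like the sketch below. It assumes arrival times are recorded as hours since midnight and pooled over a known number of observation days; the function name `hourly_arrival_rates` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def hourly_arrival_rates(arrival_hours, days_observed):
    """Estimate an arrival rate (arrivals per hour) for each hour of the day.

    arrival_hours: arrival times as hours since midnight (0 <= t < 24),
                   pooled over all observation days (assumed format).
    days_observed: number of days over which the data was collected.
    """
    # Count how many arrivals fall into each one-hour interval.
    counts = Counter(int(t) for t in arrival_hours)
    # Average the counts over the observation days.
    return {hour: counts.get(hour, 0) / days_observed for hour in range(24)}

# Hypothetical data: arrivals pooled from 2 days of observation.
arrivals = [8.1, 8.4, 8.9, 9.2, 12.0, 12.3, 12.5, 12.7, 12.9, 17.5]
rates = hourly_arrival_rates(arrivals, days_observed=2)
print(rates[8], rates[12], rates[3])  # prints: 1.5 2.5 0.0
```

If the estimated rates differ clearly between intervals, each interval can be given its own input model instead of one model for the whole day.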

Input Modeling Strategy

Histograms

• A graphical display of tabulated frequencies (a set
of data intervals & sample counts for them).
• Data samples are commonly represented as times
for occurrence of some events or completion of a
process.
• There is no definite rule for selecting the correct
histogram parameters.
• Iterate by:
– Adjusting the starting point and interval width,
– And setting the number of intervals to
cover all the data.
• Select an appropriate histogram for
representing the data samples.
• If interval widths are too large,
– Chart will be too coarse, and
– Details of the shape of the data will be lost.
• If interval widths are too small,
– Chart will be too noisy, and
– Overview of the shape of the data will be lost.
• There is no best histogram.
• As a suggestion, try to cover at least 3 to 5
samples in each interval.
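
The iteration over interval widths can be sketched as below. This is only one possible heuristic, assuming equal-width intervals and a minimum of 3 samples per interval; the function names `histogram` and `pick_width` and the sample data are hypothetical.

```python
def histogram(samples, start, width):
    """Count samples in equal-width intervals [start + k*width, start + (k+1)*width)."""
    if min(samples) < start:
        raise ValueError("start must not exceed the smallest sample")

    def index(x):
        # Interval number that the value x falls into.
        return int((x - start) // width)

    # Use just enough intervals to cover all the data.
    counts = [0] * (index(max(samples)) + 1)
    for x in samples:
        counts[index(x)] += 1
    return counts

def pick_width(samples, start, candidate_widths, min_count=3):
    """Return the smallest candidate width for which every interval
    holds at least min_count samples, or None if no candidate does."""
    for width in sorted(candidate_widths):
        counts = histogram(samples, start, width)
        if min(counts) >= min_count:
            return width, counts
    return None

# Hypothetical service times in minutes.
times = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]
result = pick_width(times, start=2, candidate_widths=[1, 2, 3, 5])
print(result)  # prints: (2, [3, 7, 5, 3, 3])
```

Since there is no best histogram, the width chosen this way is only a starting point; the resulting chart should still be inspected and the starting point adjusted if the shape looks misleading.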

Probability Distribution

• Describes the values and probabilities
associated with a random event
(probability distribution function,
probability density function).

Selection of Probability Distribution

• Use a set of criteria to rank goodness of fit of
the fitted distributions to the data.
• If any of the top-ranked models are clearly
inconsistent with the known limits on the values
(e.g. they allow negative service times),
rule them out.
• Use a reasonable set of criteria to determine
if the best of the fitted distributions is a
reasonable representation of the data.
• If the best one provides a reasonable
representation of the data,
– Use it in the simulation,
• Otherwise,
– Use an empirical distribution to represent
the data directly.
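
The selection procedure above can be sketched roughly as follows. This is a minimal illustration under several assumptions: only two candidate distributions (exponential and normal) are fitted by maximum likelihood, the log-likelihood is used as the single ranking criterion, and the "reasonable representation" check is left to a caller-supplied function. All names (`fit_exponential`, `fit_normal`, `empirical`, `choose_input_model`, `is_good_enough`) are hypothetical.

```python
import math
import random
import statistics

def fit_exponential(data):
    """Maximum-likelihood fit: rate = 1 / sample mean."""
    rate = 1.0 / statistics.fmean(data)
    loglik = sum(math.log(rate) - rate * x for x in data)
    return {"name": "exponential", "loglik": loglik, "lower": 0.0,
            "sample": lambda rng: rng.expovariate(rate)}

def fit_normal(data):
    """Maximum-likelihood fit: sample mean and population standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    loglik = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
                 - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)
    return {"name": "normal", "loglik": loglik, "lower": -math.inf,
            "sample": lambda rng: rng.gauss(mu, sigma)}

def empirical(data):
    """Empirical distribution: resample the observed values directly."""
    return {"name": "empirical", "sample": lambda rng: rng.choice(data)}

def choose_input_model(data, lower_limit, is_good_enough):
    fits = [fit_exponential(data), fit_normal(data)]
    # Rule out candidates that can produce values below the known lower limit.
    consistent = [f for f in fits if f["lower"] >= lower_limit]
    # Rank the remaining candidates (here by log-likelihood only).
    ranked = sorted(consistent, key=lambda f: f["loglik"], reverse=True)
    if ranked and is_good_enough(ranked[0]):
        return ranked[0]
    # Otherwise represent the data directly.
    return empirical(data)

# Hypothetical service times; they cannot be negative, so lower_limit is 0.
service_times = [0.4, 1.1, 0.2, 2.7, 0.9, 1.6, 0.3, 3.5, 0.7, 1.2]
model = choose_input_model(service_times, 0.0, lambda fit: True)
fallback = choose_input_model(service_times, 0.0, lambda fit: False)
print(model["name"], fallback["name"])  # prints: exponential empirical
```

In practice `is_good_enough` would combine several measures and a visual check rather than return a fixed answer, and more candidate distributions would be fitted.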

Evaluating Goodness Of Fit

• Consider a number of measures of goodness
of fit rather than a single one,
– Since each will be unreliable in some cases.
• Do not depend on goodness of fit measures
that rely on
– Overly clean data samples (e.g.
problematic samples were ignored) or
– User-supplied parameters (e.g.
histogram configurations),
– Since they can provide inconsistent results.

• In the context of simulation input modeling,
– Classical goodness of fit methods in
statistics are not completely appropriate for
final assessment of quality of fit.
– Statistical methods have definite
assumptions that are sometimes not true for
simulation modeling.
– So, graphical heuristic methods should also
be used to assess which is best and which
is good enough.

• For evaluation,
– Histograms, and
– The empirical cumulative probability
distribution function of the sample data can be
used.
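
As a rough sketch of the second option, the empirical cumulative distribution function can be computed and compared with the CDF of a fitted distribution. The example assumes an exponential fit with rate 1 / sample mean; the function names `ecdf` and `max_cdf_distance` and the data are hypothetical. The largest vertical gap computed here is the quantity used by the Kolmogorov-Smirnov test, but it is meant only as a supplement to plotting both curves and judging the fit by eye.

```python
import math

def ecdf(samples):
    """Return the sorted samples and the ECDF value i/n at each of them."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def max_cdf_distance(samples, cdf):
    """Largest vertical gap between the ECDF of samples and a fitted cdf."""
    xs = sorted(samples)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance

# Hypothetical service times and an exponential fit (an assumption).
service_times = [0.4, 1.1, 0.2, 2.7, 0.9, 1.6, 0.3, 3.5, 0.7, 1.2]
rate = len(service_times) / sum(service_times)

def exponential_cdf(x):
    return 1.0 - math.exp(-rate * x)

xs, probs = ecdf(service_times)
gap = max_cdf_distance(service_times, exponential_cdf)
print(xs[0], probs[-1], round(gap, 3))
```

Plotting `xs` against `probs` as a step function, with the fitted CDF drawn over it, shows where the fit is poor, which a single number cannot.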

M.J.Espino

Wednesday, November 30, 2011
