How to quickly simulate an industrial process using Python

There comes a point when your spreadsheet models of business processes just don’t cut it.

You observe some complexity that is just too difficult to explore. You have questions that remain unanswered because you can’t do the analysis.

One way of tackling this is to build a model of your business operations so that you can simulate them and gain some useful insight.

We are going to look at an everyday approach to creating a model of an industrial process or service. We shall consider how we can ask questions of the model and use that to improve our understanding.

With this understanding we shall then look at building a simple tool to simulate the operation of the process. This simulation will produce results that we can use to experiment with different scenarios so that when we go back to the business, we can take more informed actions.

First, we need to understand the system.

Understanding business operations

Manufacturing facilities vary in both size and complexity. One factory might have two or three areas where different processes take place. Another factory might contain hundreds of machines.

Each of the factories will have evolved to cope with the manufacture of different products, different mixes of order types, different customer demand profiles, varying quality of raw materials, unpredictable machine breakdowns and so on.

The list of possible interruptions to a neatly ordered continuous flow of efficient output is endless.

Shop floor supervisors manage these variations using their analytical skills and experience. At some point during the working week, they’ll be required to answer the following questions:

  1. When will order X be finished?
  2. How much stock is tied up in the factory?
  3. What is the utilisation of the work centre?

These questions might be asked by different stakeholders.

Question 1 probably comes from the customer, via the sales department, perhaps because the order is late.

Question 2 might come from purchasing who are concerned with re-order quantities for input materials. Or it might be the accountants who are assessing cashflow.

Question 3 is certainly asked by the accountants, so that they can put a measure on the production potential of a factory. But it is also posed by the planners who want to find additional production capacity for more customer orders.

In a smaller organisation these roles may be undertaken by the same person. In larger companies the functions will be separate departments. Whatever the size of the facility, the questions are the same. The answers are likely to be the same also.

When faced with such questions, there are often too many variables to consider for you to make a reasoned judgement. The honest answer usually starts with “it depends”.

The lateness of an order depends upon the jobs in front of the late order, the reliability of the process, whether the operator is working at peak performance, the quality of the tooling and raw materials, etc.

If the process in question is fed by, or feeds into, other processes, the opportunities for error are compounded. This leads to the use of estimates which might be generous and therefore may build inefficiencies into how we manage the overall operations.

What we need is a model of the facility. This model captures the essential characteristics of the business unit and lets us change some of those characteristics so that we can see what the effects of those changes might be.

Our supervisor might have had an idea to reduce the batch sizes of their orders, but not felt able to try it out as their machine utilisation measures might drop.

If something went wrong and an order was late, the change initiated by the supervisor might be cited as the cause of the reduction in output.

But if that change could be applied to a model, that has no physical connection to the real facility, perhaps we could learn more about how the system behaves. If we understand the system better, we stand to make better decisions in the future.

This practice is referred to as simulation and it has long been the preserve of industrial mathematicians, or scientists who study operational research. Such work creates a lot of value for organisations, by creating models and allowing production personnel to experiment with different strategies.

However, these mathematical approaches are often inaccessible and significant training is required to interpret the models.

We can often obtain much of the benefit of simulation without the need for advanced mathematical skills, and this is the approach that we shall take in this article.

Everything is a queue

Let’s assume that you visit the local supermarket to buy a few items. You select your items and make your way to the checkouts to pay for the shopping.

There are a number of checkouts in the supermarket but for some reason only one of the checkouts is operational. You are not the only customer in the store, and there are three other people already at the checkout, waiting in a queue to be served.

When they have been through the checkout, it will be your turn to be served. Fig. 1 illustrates the scenario.

Fig. 1. Supermarket queueing with one checkout.

We are going to assume that each person and their shopping in the queue represents one job.

Just for a moment, think about your answers to the following questions:

  1. When will your job (you and your shopping) be finished?
  2. How many jobs are there in the queue?
  3. What capacity of the checkout is utilised?

You may recognise these questions from earlier. What was your answer to Question 1?

Since we don’t know how long it takes to process any of the shopping, we would have to say “it depends”.

It depends on how much shopping each person in the queue has; this might range from a hand basket to an over-laden trolley.

Question 2 is a little simpler. We know that each person and their shopping is classed as a job, so we just count the number of jobs in the queue. If there are three jobs waiting in front of you, there must be a job in progress at the checkout, which makes four jobs.

And then there is you, bringing the total to five.
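The counting behind Questions 1 and 2 can be sketched in a few lines of Python. The checkout times below are invented purely for illustration (none are given here), and the function name is hypothetical.

```python
# Hypothetical remaining checkout times in minutes for the jobs ahead of you:
# the job currently at the till first, then the three jobs waiting.
minutes_ahead = [4.0, 2.5, 7.0, 1.5]
my_minutes = 3.0  # assumed time to process your own shopping


def finish_time(minutes_ahead, my_minutes):
    """Minutes from now until your job is finished (one checkout, first come first served)."""
    return sum(minutes_ahead) + my_minutes


jobs_in_system = len(minutes_ahead) + 1  # everyone ahead of you, plus you

print("Jobs in the system:", jobs_in_system)
print("Your job finishes in", finish_time(minutes_ahead, my_minutes), "minutes")
```

Change any of the assumed times and the answer to Question 1 changes with it, which is exactly why the honest answer is “it depends”.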

And what about Question 3?

When thinking about utilisation we need to consider potential interruptions such as:

  • the checkout operator being changed at the end of a shift;
  • a request from a customer services supervisor for a price because the barcode on an item is unreadable;
  • a power cut causing the till to stop working.

If there is a queue of customers, and there are no disruptions to the actual process, we can assume that the checkout is kept busy. Once the queue becomes zero (all the jobs have been processed), or there is an interruption, the checkout becomes idle and the utilisation drops to zero.

Now that we have a basic representation of our supermarket checkout in place, let’s see how we can alter the performance of the system.

The supermarket manager realises that if customers have to queue for too long they may become frustrated, or even leave the store without making a purchase. This is not good for business, so another checkout is opened up as in Fig. 2.

Fig. 2. Supermarket queueing with two checkouts.

Now, you approach the checkouts and find that there are two checkouts working. Each checkout is processing one job, with one more job waiting in its queue. You are free to join either queue.

Let’s assume that it takes the same amount of time to process each job. If that is the case, since both queues are shorter, you will have to queue for less time before your job is processed. The utilisation of the checkouts reduces however, unless there are more jobs arriving behind you.

We can thus deduce that there is some form of relationship between the number of available checkouts, the number of jobs to be processed, and the overall time taken to process an individual job.
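A rough way to see this relationship is to hand out identical jobs to whichever checkout becomes free first. This is only a sketch: it assumes that every job takes the same time, that all checkouts start empty, and that customers always move to the next free checkout. The function name and the 5-minute figure are made up for illustration.

```python
import heapq


def wait_for_last_job(jobs_ahead, checkouts, minutes_per_job):
    """Minutes the newest arrival waits before being served.

    Jobs ahead are served in order by whichever checkout frees up first.
    """
    # Each entry is the time at which a checkout next becomes free.
    free_at = [0.0] * checkouts
    heapq.heapify(free_at)
    for _ in range(jobs_ahead):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes_per_job)
    # The newest arrival starts when the earliest checkout is free.
    return free_at[0]


for checkouts in (1, 2, 4):
    wait = wait_for_last_job(jobs_ahead=4, checkouts=checkouts, minutes_per_job=5)
    print(checkouts, "checkout(s): wait", wait, "minutes")
```

With four jobs ahead, each extra checkout shortens the wait, but the gain shrinks once there are as many checkouts as jobs.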

If the supermarket manager had such a model, they could experiment with the optimum number of checkouts to service their customer demand patterns. This would help them allocate the correct number of checkout staff for busy periods, while reducing the instances of checkouts being idle during quieter times.

The model would permit them to plan for seasonal adjustments in shoppers’ behaviours.

But, if the model can be executed quickly, it could also be a tool to explore a scenario that is unfolding – such as a large influx of customers that were unexpected – and this is where modelling and simulation can become a powerful tool for the management of business operations.

Modelling an industrial process

We shall now consider an industrial scenario. A joinery company produces wooden window frames. Each of the frames is cut from lengths of timber that are shaped and cut to length by a machine.

The company receives orders of varying quantities of windows, which means that varying numbers of timber lengths are required from the first machine. The company only cuts timber lengths for orders and does not make products to put into stock.

Each order is considered to be a job. Just as was the case with the shopper and their variable amount of shopping, each job can vary in size.

Each job must then spend a certain amount of time waiting in a queue, before being processed by the machine. The total time that the timber is in the system is queueing time + processing time.

Both the queueing time and the processing time are dependent upon the size of the respective order.

We can see now that the model for creating lengths of timber window frame is essentially the same as our first supermarket model.

We have jobs, a queue, and a processing station, where the actual work gets done. This scenario is illustrated in Fig. 3.

Fig. 3. Queueing model for a single industrial machine.

With this model in place we can start to ask questions of it. For instance, what impact will a longer queue have on a) resource utilisation, and b) the overall time that a job spends in the system?

A longer queue suggests that there will be less interruption to flow, so the utilisation will be higher.

However, the longer the queue the more time that a particular job takes to be completed, so the delivery time is longer.
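These two effects can be checked by hand on a tiny example before building a full simulation. The sketch below pushes a fixed list of jobs through one machine in arrival order. The arrival and processing times are invented for illustration, and the function is a hypothetical helper rather than part of any library.

```python
def run_single_machine(arrivals, processing):
    """Serve jobs first come first served on one machine.

    Returns (time in system per job, machine utilisation).
    """
    machine_free_at = 0.0
    busy_time = 0.0
    times_in_system = []
    for arrived, duration in zip(arrivals, processing):
        start = max(arrived, machine_free_at)   # wait if the machine is busy
        machine_free_at = start + duration
        busy_time += duration
        # time in system = queueing time + processing time
        times_in_system.append(machine_free_at - arrived)
    utilisation = busy_time / machine_free_at   # share of elapsed time spent working
    return times_in_system, utilisation


processing = [15, 15, 15, 15]   # minutes per job (assumed)
spread_out = [0, 20, 40, 60]    # jobs arrive more slowly than they are processed
bunched_up = [0, 5, 10, 15]     # jobs arrive faster, so a queue builds

for label, arrivals in (("spread out", spread_out), ("bunched up", bunched_up)):
    times, utilisation = run_single_machine(arrivals, processing)
    print(label, "| time in system:", times, "| utilisation:", round(utilisation, 2))
```

When the jobs are bunched up the machine never waits for work, so utilisation rises, but each successive job spends longer in the system.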

The next stage is to build a simulation so that we can verify our thoughts.

Designing a process simulation

We now have an illustration of how we can model a single industrial process. That model is part of the initial specification of a simulation that we can execute. The simulation will execute a virtual production run, and that will give us an idea of how the model can perform.

The simulation allows us to change different parameters of the model, without incurring the cost or disruption of moving physical plant around.

So far, our model describes:

  • a process of material conversion, where lengths of timber are given a profile and then cut into shorter lengths that are suitable for window frames;
  • a single machine that performs the operations described above;
  • each job is processed one at a time. Multiple jobs cannot be processed simultaneously;
  • jobs arrive for processing and wait to be processed in a queue;
  • a job that has been processed is deemed to be complete and exits the system.

We now need some more information to allow us to build the simulation.

First, we should describe the rate at which jobs arrive for processing.

Second, we need to specify the time taken to process a job.

Third, we need to consider whether there is any variation in the size of a job. For this first example we shall assume that each job requires roughly the same amount of time to process. We shall explore variable job sizes later.

There are many different simulation tools that can be used to build queueing models. We shall be using “Ciw” (which is Welsh for “queue”).

Ciw is a simulation framework that uses Python and as such is free to acquire and use. Just type ‘ciw python’ into Google to find it, or install it from the command line with ‘pip install ciw’.

Within Ciw, there are three parameters that are of relevance to our industrial process model.

  1. arrival_distributions: this describes how often jobs arrive to be processed. We shall assume that the jobs arrive approximately every 10 minutes, or six times per hour;
  2. service_distributions: this is the time that each job spends being processed, or the time taken to do the shaping and cutting to length of the timber by the server (machine). We shall assume that each job takes 15 minutes;
  3. number_of_servers: this represents the number of machines at a workstation. In our example, we have one machine, or one server.

It is important to note at this point the difference between parameters that are static, and those that might vary.

For instance, for a given simulation we can assume that the number of machines (servers) doesn’t alter, so we give it the value of 1 as we want to investigate the scenario with one machine.

However, while we can say that jobs arrive at a rate of six times per hour, or every 10 minutes, that isn’t strictly realistic.

Sometimes there are interruptions to the deliveries. A forklift truck might drop the timber when loading it from the lorry, or there may be a physical blockage preventing the wood being placed next to the machine.

Similarly, the timber won’t always take exactly 15 minutes to process. The figure of 15 minutes is just the average.

Sometimes the timber might blunt the cutting blades of the machine and it will take longer to finish the operation.

Conversely, when the tooling is new or freshly sharpened the machining time will be less than 15 minutes.

We want our simulation to take account of these variances and we do this by specifying a distribution function. This tells the simulation to use a range of values, whose mean is the arrival rate that we are suggesting.

So, for jobs that arrive every 10 minutes on average, the simulation will generate a set of inter-arrival times that vary, with a mean of 10 minutes.

This allows the simulation to be more realistic as it will take account of naturally occurring variations in waiting and processing times.
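To see what such a distribution does, the sketch below draws the gaps between job arrivals from an exponential distribution using Python's standard `random` module rather than Ciw. The rate of 0.1 jobs per minute is the value used in the simulation code in the next section; the sample size and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is repeatable
rate_per_minute = 0.1   # mean gap between arrivals = 1 / 0.1 = 10 minutes

# Draw 10,000 gaps between consecutive job arrivals, in minutes.
gaps = [random.expovariate(rate_per_minute) for _ in range(10_000)]

print("Shortest gap:", round(min(gaps), 2))
print("Longest gap: ", round(max(gaps), 2))
print("Mean gap:    ", round(statistics.mean(gaps), 2))
```

Individual gaps vary widely, from almost nothing to well over half an hour, but their average settles close to 10 minutes.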

We are now ready to build the simulation.

Building the simulation in Ciw

Create a new text file called:

timber_conversion.py

We shall enter some snippets of code now to quickly create a simulation to produce some results. Try not to worry about some of the details just yet as they will be explained later.

What is important is to execute a simulation so that we can start to understand the timber conversion process better. First, we specify the arrival and service distributions, along with the quantity of servers:

import ciw

N = ciw.create_network(
    # jobs arrive every 10 minutes, or 6 times per hour
    arrival_distributions=[ciw.dists.Exponential(0.1)],
    # jobs take 15 minutes to process which is 4 jobs completed per hour
    service_distributions=[ciw.dists.Exponential(0.067)],
    # the number of machines available to do the processing
    number_of_servers=[1]
)

You might have noticed that the value contained in

[ciw.dists.Exponential(0.1)]

does not seem to relate to an arrival rate of 6 times per hour. The distribution function expects a rate per unit of simulated time, and our unit of time is the minute. We therefore take the arrival rate of 6 (arrivals per hour) and divide it by 60 (the number of minutes in an hour), which gives 0.1 arrivals per minute.

Similarly, for the service time, the rate of processing per hour is 4 and is represented as 4/60 = 0.067.
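The conversion is easy to get wrong, so a small helper can make the intent explicit. This is a sketch only; the function name is hypothetical and is not part of Ciw.

```python
def per_hour_to_per_minute(jobs_per_hour):
    """Convert a rate in jobs per hour to jobs per minute."""
    return jobs_per_hour / 60


arrival_rate = per_hour_to_per_minute(6)   # one job every 10 minutes on average
service_rate = per_hour_to_per_minute(4)   # one job every 15 minutes on average

print("Arrival rate per minute:", round(arrival_rate, 3))
print("Service rate per minute:", round(service_rate, 3))
print("Mean minutes between arrivals:", 1 / arrival_rate)
```

Taking the reciprocal of a rate gives the mean time between events, which is a handy check that the value entered matches the comment beside it.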

The next piece of code to add is:

ciw.seed(1)
Q = ciw.Simulation(N)
# run the simulation for one shift (8 hours = 480 minutes)
Q.simulate_until_max_time(480)

This is an instruction to tell the computer to create a simulation and to run it for a simulated time of one shift (8 hours/480 minutes).

That is all that is required to create the simulation. However, there are no instructions to tell the computer to report the results. The following program code does this:

# collect a record for every job completed during the simulation
recs = Q.get_all_records()
waitingtimes = [r.waiting_time for r in recs]
servicetimes = [r.service_time for r in recs]
avg_waiting_time = sum(waitingtimes) / len(waitingtimes)
print('Avg. wait time:', avg_waiting_time)
avg_service_time = sum(servicetimes) / len(servicetimes)
print('Avg. processing time:', avg_service_time)
# utilisation is reported as a fraction between 0 and 1
print('Avg. machine utilisation:',
      Q.transitive_nodes[0].server_utilisation)

There are three results that are reported (look for the ‘print’ keyword).

First, the average waiting time in minutes for each job.

Second, the average time taken to process each job in minutes.

Finally, the average utilisation of the machine (server), reported as a fraction between 0 and 1 (multiply by 100 for a percentage).

When you execute your simulation you should see the following results in the console:

Avg. wait time: 51.51392337104136
Avg. processing time: 12.643780078229085
Avg. machine utilisation: 0.9969939361643851

This tells us that on average, a job took nearly 13 minutes to process and had to wait approximately 52 minutes in the queue. The machine was operating for most of the time (99.7% utilisation).

This is excellent for a shopfloor supervisor who has to report the percentage of time that a machine spends idle.

Hardly any downtime for the machine in this situation.

However, let’s use the simulation to start investigating different scenarios.

We shall now explore the effect of increasing the number of machines from one to two.

Edit the following line to increase the number of servers (machines) to 2:

number_of_servers=[2]

If we execute the simulation again, we observe the following results:

Avg. wait time: 8.79660702065997
Avg. processing time: 14.249724856289776
Avg. machine utilisation: 0.6827993160305518

We can see that the addition of an extra machine has dramatically reduced the wait time from 52 minutes to around 9 minutes. The utilisation of the two resources has also fallen to 68%, meaning that machining resources are idle for approximately 32% of the shift.

While there is a reduction in waiting time, and therefore the overall lead time to delivery of a product, there is the additional capital cost of extra plant. Depending on how the machine is operated, there may also be extra labour required to run both machines at the same time.
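A quick calculation helps to explain these results. In queueing theory the traffic intensity, which is the arrival rate divided by the combined service rate of all servers, indicates how heavily loaded a system is. A value of 1 or more means that work arrives faster than it can be cleared, so the queue keeps growing. The sketch below applies this to the rates used above; the function name is a hypothetical helper.

```python
def traffic_intensity(arrival_rate, service_rate, servers):
    """Arrival rate divided by the combined service rate of all servers."""
    return arrival_rate / (service_rate * servers)


arrival_rate = 0.1     # jobs per minute, as in the simulation
service_rate = 0.067   # jobs per minute per machine

for servers in (1, 2):
    rho = traffic_intensity(arrival_rate, service_rate, servers)
    status = "overloaded" if rho >= 1 else "can keep up"
    print(servers, "machine(s): traffic intensity", round(rho, 2), "->", status)
```

With one machine the intensity is about 1.49, which is why the machine is almost never idle and the waits are long. With two machines it falls to about 0.75, broadly in line with the simulated utilisation of 68%.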

The shopfloor supervisor has a conversation with the company owner and it is clear that there is no cash with which to purchase another machine. The next course of action is to try and increase the output of the timber conversion process.

The service time is 15 minutes, which means that 4 jobs per hour are processed.

What difference would it make if we could process 5 jobs per hour?

Set number_of_servers back to [1], then edit the following line to reflect a service rate of 5 jobs per hour (5/60 ≈ 0.083, rounded down here to 0.08):

service_distributions=[ciw.dists.Exponential(0.08)]

Here are the results:

Avg. wait time: 26.20588722740488
Avg. processing time: 11.300597865271746
Avg. machine utilisation: 0.9485495709367999

Compared with the original single-machine results, the machine utilisation has fallen only slightly (from 99.7% to about 95%), but the average waiting time has roughly halved, from 52 minutes to about 26 minutes. This illustrates that there is a significant benefit to be had by making even small changes to the service time of a process.

Such thinking is central to “lean manufacturing” techniques, where potential opportunities for the removal of waste are identified.

There might be some different tooling that enables the timber to be cut at a faster rate, or there might be a better way of organising the material so that the cutting-to-length operation is optimised for the fewest cuts.

Confidence

Once we have built a simulation, it is important that we are confident that it represents the situation that we are modelling.

If we look at the results we have observed so far, what do we notice about the average processing time?

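We have obtained three different values: 12.6, 14.2 and 11.3 minutes. Even the first two, which share the same mean service time of 15 minutes, differ noticeably from each other and from 15. This is a significant range of values and it suggests that a single run of one shift does not take a sufficient number of jobs into account.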

For a given scenario, there is a time when the simulation queue is empty, and then partially filled, until a steady state of operation is achieved. Similarly, towards the end of a simulation there will be a number of jobs that remain unfinished.

When we report the statistics of how the process has performed, we are collecting the data for jobs that have been completed.

Depending on the time required to ‘wind-up’ and ‘wind-down’ a simulation, there could be a disproportionate effect on the performance that we observe. This would decrease our confidence in the ability of the simulation to be used as a tool for experimentation.

We deal with this in two ways. First, we run the simulation for a longer time and then report only the performance from the system once it is in a steady state of operation.

For our 8 hour shift, we could add an hour before the start and at the end for warm-up and cool-down periods.

Second, we can run the simulation many times, altering a number (called a ‘seed’) so that each run has some variation introduced into it.

Create a new file called

timber_conversion_2.py

and enter the following code:

import ciw

N = ciw.create_network(
    # jobs arrive every 10 minutes, or 6 times per hour
    arrival_distributions=[ciw.dists.Exponential(0.1)],
    # jobs take 15 minutes to process which is 4 jobs completed per hour
    service_distributions=[ciw.dists.Exponential(0.067)],
    # the number of machines available to do the processing
    number_of_servers=[1]
)

runs = 1000 # this is the number of simulation runs
average_waits = []
average_services = []
average_utilisations = []

for trial in range(runs):
    ciw.seed(trial) # change the seed for each run
    Q = ciw.Simulation(N)
    # run the simulation for one shift (8 hours = 480 minutes)
    # + 2 hours (120 minutes) for warm-up and cool-down
    Q.simulate_until_max_time(600, progress_bar=True)
    recs = Q.get_all_records()
    # keep only jobs that arrived after the warm-up and before the cool-down
    waits = [r.waiting_time for r in recs
             if r.arrival_date > 60 and r.arrival_date < 540]
    mean_wait = sum(waits) / len(waits)
    average_waits.append(mean_wait)
    services = [r.service_time for r in recs
                if r.arrival_date > 60 and r.arrival_date < 540]
    mean_services = sum(services) / len(services)
    average_services.append(mean_services)
    # utilisation of the machine over the whole run (warm-up and cool-down included)
    average_utilisations.append(Q.transitive_nodes[0].server_utilisation)

print('Number of simulation runs:', runs)
print('Avg. wait time:', sum(average_waits) / len(average_waits))
print('Avg. processing time:',
      sum(average_services) / len(average_services))
print('Avg. machine utilisation:',
      sum(average_utilisations) / len(average_utilisations))

Execute the code and you will observe the following results:

Number of simulation runs: 1000
Avg. wait time: 115.69878479543915
Avg. processing time: 14.87316389724181
Avg. machine utilisation: 0.8560348867271905

You can now edit the variable <runs=1000> to change the number of times that the simulation executes.

As the value of <runs> increases the statistics start to stabilise. This indicates that we can have confidence that the simulation is providing results that we trust. This is regarded as good practice for the modelling and simulation of systems.
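The stabilising effect can be illustrated without Ciw. The sketch below is a stand-in, not the timber simulation: each ‘run’ simply averages 40 exponentially distributed service times with a mean of 15 minutes, and we watch how the average across runs settles as more runs are included. The run length of 40 jobs and the helper name are assumptions made for illustration.

```python
import random
import statistics


def one_run(seed, jobs=40, mean_minutes=15):
    """Average service time from one short, seeded run."""
    rng = random.Random(seed)   # a separate generator per run, like changing the seed
    samples = [rng.expovariate(1 / mean_minutes) for _ in range(jobs)]
    return statistics.mean(samples)


run_means = [one_run(seed) for seed in range(1000)]

for runs in (1, 10, 100, 1000):
    overall = statistics.mean(run_means[:runs])
    print(runs, "run(s): average service time", round(overall, 2))
```

A single run can land some way from 15 minutes, while the average over many runs should sit much closer to it.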

Conclusion

We have looked at the application of queueing to the modelling of an industrial process. Our queueing model helps us understand the system better, and it also helps specify the various parameters that are important to include in our analysis.

This specification can then be used with a simulation tool. We have used Ciw to quickly construct a simulation that represents our queueing model.

As the simulation runs we collect summary statistics that can help us understand the inter-relationships between parameters such as job arrival rates, processing times and the number of resources available to do the work.

We can then explore different scenarios by changing these parameters and this helps us understand what the limits of the system might be. Exploring different situations via simulations is an inexpensive and quick way to find the limits of a system, or to identify new possibilities.

For example, you might want to find ways of increasing the output of a factory temporarily to complete a particular rush order for an important customer.

You know that you can increase capacity by adding another shift or by buying new plant. But you might want to know how many additional operators you need to bring in to complete the extra work. You’ll also want to see how this might impact the rest of the orders for other customers.

You might not be able to buy, install and commission new plant quickly enough, but a simulation can give you a good idea as to whether you should out-source some of the work or not.

An example of using simulation strategically is to consider the potential impact of the sales team’s forecast for the next quarter; you could use this forecast to investigate the demands that would be made on your business resources and see what resources you might need.

If you need to, you’ll be in a much stronger position to justify the acquisition of new plant or additional staffing.

Have a go for yourself

Using the program code from above, experiment with different values.

You can change the parameters for the number of simulation runs for instance, but you can also change the ‘shift length’; this refers to the amount of simulated time that the program executes.

Simulation code allows us to try out different values quickly, to see what the different effects might be. This is convenient when we have a specific question to answer.

However, we often need to perform deeper analysis of a simulation model, and in such cases it is useful to record the effects of our changes.

Try to adopt good practice by recording the values that you change, noting the effects of these changes in a table. This habit will help you when your models increase in complexity.
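One lightweight way to keep such a record is a list of rows that is printed as a table. The sketch below logs the three scenarios reported earlier in this article, with the values rounded; the column names and layout are just one possible choice.

```python
# Each row: (servers, service rate per minute, avg wait in minutes, utilisation)
results = [
    (1, 0.067, 51.51, 0.997),
    (2, 0.067, 8.80, 0.683),
    (1, 0.08, 26.21, 0.949),
]

header = f"{'Servers':>7} {'Service rate':>12} {'Avg wait':>9} {'Utilisation':>11}"
print(header)
for servers, rate, wait, utilisation in results:
    print(f"{servers:>7} {rate:>12.3f} {wait:>9.2f} {utilisation:>11.1%}")

# Pick out the scenario with the shortest average wait.
best = min(results, key=lambda row: row[2])
print("Shortest wait:", best)
```

Appending a new row each time a parameter is changed builds up a history that can be compared at a glance.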

Some good questions to ask of this model could be:

  1. What is the effect on machine utilisation as the arrival rate of jobs declines?
  2. How would you find an optimum set of values to ensure that the system is balanced?

When you start to build simulations, you quickly gain a deeper appreciation of the dynamics of systems. An important part of simulation is being able to discover, and then communicate the results of your simulation.

Using the program code above plus the details available in Ciw documentation, develop some additional information to report.

For example, it would be interesting to see what the average length of the queue is before the machine.

This will then tell us what the total inventory that is being processed amounts to (Work in Progress, or WIP).
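As a starting point, the time-averaged queue length can be estimated from the waiting times alone. Every minute that a job spends waiting contributes one job-minute to the queue, so the total waiting time divided by the observation period gives the average number of jobs queued. The sketch below uses made-up waiting times and a hypothetical function name; with Ciw you could substitute the ‘waitingtimes’ list built earlier. It assumes that all of the waiting falls inside the observation period.

```python
def average_queue_length(waiting_times, observation_minutes):
    """Time-averaged number of jobs waiting (excludes the job being processed)."""
    return sum(waiting_times) / observation_minutes


# Hypothetical waiting times in minutes for jobs completed in one 480-minute shift
waiting_times = [0.0, 12.0, 30.0, 45.0, 33.0]

print("Average queue length:", average_queue_length(waiting_times, 480))
```

For a single machine, adding the utilisation (the average number of jobs being processed) would then give a rough estimate of the total WIP.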

The code above currently reports the average (mean) of a set of values. Enhance the reporting to include additional summary statistics such as standard deviation.
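Python's standard `statistics` module covers the common cases. The sketch below summarises a short list of made-up waiting times; in practice you would pass in a list such as ‘waitingtimes’ or ‘average_waits’ from the code above.

```python
import statistics

# Hypothetical waiting times in minutes
waits = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

print("Mean:              ", statistics.mean(waits))
print("Median:            ", statistics.median(waits))
print("Standard deviation:", round(statistics.stdev(waits), 2))
print("Minimum / maximum: ", min(waits), "/", max(waits))
```

Note that `statistics.stdev` calculates the sample standard deviation; `statistics.pstdev` is the population version.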