A Very Geeky Analysis of 8 Months of Data Collection on Running Trains

I rejoined my model railroad club in 2016. Over those 3 years, I had a feeling I ran certain trains and locomotives more than others. Since I work as an analyst, I decided to do some data collection on this. This post is a geeky analysis of how I collected the data, the results, and the plan for this year.

The Collection Goals

My goals for this exercise were simple: how often did I run specific locomotives and specific trains1. For example, how often did I run my Erie Lackawanna SD45s, and how often did I run my coal train?

Collecting the Data:

In the beginning, I just kept a Numbers file with the following information:

Date Reporting Mark Engines Train Name
4/13/19 EL SD45s Freight
4/13/19 UP WideCabs Freight

It was basic information. The date the train ran, the road name of the engines pulling the train, the specific engines that I ran, and a generic name of the train. The challenge of modeling two eras is I do need to be specific on the locomotives and trains. I can’t run my Erie Lackawanna power on a modern train since the railroad ceased to exist in 1976.

By the end of the year, I realized the information was too basic. In the case of the Union Pacific train, generalizing the locomotives made sense since I only had two UP locomotives. By the end of the year, I had four engines that were era-appropriate. I had to get more specific, but not screw up my count. If I ran the coal train with 4 engines, I wanted to count the train running once, and each of the engines running once, but I didn’t want to count the coal train running four times. So, I couldn’t list each of the locomotives as running a coal train since that would inflate the count.

I also didn’t want to track how many laps around the layout each train ran. I thought about it, but decided there were too many variables. A good example is a train I am making adjustments too at the club. I may make 5 laps with it, but I am adjusting problematic cars. Meanwhile, the train that runs fine may only make one lap. I decided I didn’t care about that data. I also didn’t care often a specific locomotive ran a specific train.

In the end, I ended up with a Numbers spreadsheet with the following data:

Date Reporting Mark Engines Train Name Train Class
12/7/2019 UP SD70ACE UPFRT1 Manifest
12/7/2019 UP ES44AC DPU No count
12/7/2019 CSX SD40-2 DPU No count

This is a single entry for one train running: UPFTR1. I got away from generic names, and now most trains have detailed names. What this entry notes is on 12/7/2019 I ran my UP Freight. Pulling the train were 3 engines (two UP and one CSX). DPU is a railroading term for Distributed Power Unit — typically power that runs mid- or end-of-train. I adopted it to mean any additional power that runs on a train. This means that I can get a single count for running the SD70ACE, the ES44AC, and the SD40, but keeping the count for UPFRT1 as one. Train class is a higher-level designation to see how often I ran a manifest, passenger, etc., train2. A No Count entry just means don’t count the two DPU units as Manifest train. It’s just a value to filter out in the report. As with the train name, I only care about one train class per train name. Now, if I run two different manifest trains with the same power on each, then I want two counts. There is also a count for running locomotives during the club’s operating session, but I didn’t need much specificity on the usage.

There is also a worksheet that lists all of my locomotives and rolling stock. The rolling stock also shows what cars the train is currently assigned to.

Analyzing the Data

I was just using a variety of CountIF statements in Numbers. When the data I collected was basic this worked ok. As I wanted to do a deeper analysis on this the limits of CountIFs became a burden. Adjusting each of the CountIFs as I changed the naming was a pain. Numbers doesn’t really support pivot tables, and even then, I wanted a little more flexibility and ease of use.

Enter Tableau.

I use Tableau at work infrequently as a reporting package. I was rusty and wanted to try and keep sharp on using the tool. Tableau doesn’t support Numbers as a data source, so I moved it into Google Sheets.

Tableau made short work of filtering out the data. I could exclude the No Counts, filter out a few other things. Counting operating sessions became more of an aside.

While I cared more about locomotives and trains, the rolling stock sheet also lets me easily print out a grouped list for when I go to the shows. This way I can reduce the likelihood I buy a duplicate car.

Results of the data

Train Class Run Count
Passenger 25
Manifest 20
Unit 12
Operations 7
Local 5
Steam 2
Grand Total 71

I am not going to get into the specific trains run. Instead, I will just talk about the high-level view: the train class.

That passenger trains were around 30% of the total train types run does not surprise me. There are two reasons for this: they are my favorite type of train to run, and they are the easiest to set up and run. Each of my passenger trains can be stored and transported in one storage bin, locomotives included. If I am only going to be at the club for a short time, or just want something easy to set up, the passenger train is my usual choice.

There are only three trains that I classify as Unit trains (Coal, Grain, and Intermodal), so I was a little surprised to see them count as high as they did. Numbers-wise, it’s a wash on them. I ran the Intermodal 5 times, the coal train 4, and the grain 3. The grain train was the last train I put together, so it being low is unsurprising. After thinking about it, I like the look of passenger trains, and unit trains — where all the cars are the same type — have the same look.

Operations is just a catch-all that I ran units 6 times at the club operating sessions. From a data collection standpoint, it is hard to define. For instance, if I loan out a locomotive, how do I count it? For now, I am just noting a train class of Operations for the entire day, and counting which units I ran at each session.

Collecting Data in 2020.

The main item I am looking forward to in 2020 is getting an entire year’s data. I decided to gather the data in July 2019, and only had verifiable data from February onwards3.

I also decided I am going to collect data on the laps run. I don’t know what it will tell me due to a lot of variables that affect running trains: how long am I at the club; how the layout behaving; and how well my train is running4. I decided at least for 2020 I would collect the data to see if there are trains that make more laps than others. It’s easier to collect that from the start of the year.

Another goal that I am carrying over from 2019 is to not cook the books and run a train just to inflate a run count. I do track the last time I ran a train, so I will use that to run a train I haven’t run in a while. There is one train I haven’t run since July, so that is a candidate for running soon.

I am also tracking car usage. Mostly the date the car ran, the train it ran on, and its current assignment. Since the Google Sheets roster is kept up-to-date, it’s a quick cut and paste when I run a train. I can also capture if a car ran on multiple trains on a single visit.

Admittedly, this was a solution in need of a problem. It started with just ticking off a virtual sheet running a train and a locomotive and ended with a 5-tab Google Sheet fed into a Business Intelligence tool. Two goals were met: I have some data on running trains, and I got better at data collection and analysis.

If you are curious about the data, I posted the Viz here. There is a lot in the Viz I didn’t cover in the post, and the average laps data is obviously only going to be accurate for 2020.


  1. I didn’t really care about how often specific locomotives ran specific trains ↩︎
  2. I made a late-year change to make trains that ran cars of the same type (Coal, Grain, etc.) Unit Trains. ↩︎
  3. I’m not sure I am missing much. Grad School meant I didn’t get to the club much. ↩︎
  4. A poorly running train could require a lot of fixing, and thus more laps for testing. ↩︎