CasperJS and ArbiterSports

Sport event management seems to be nearly monopolized by ArbiterSports. My lacrosse officiating assignments are administered via Arbiter, and boy oh boy does the UI leave much to be desired.

Not only is the UI lacking, but they charge for additional "mobile" features, like remote/API calendar access, let alone synchronization. So, let's see what we can do to export this calendar information anyway.


Goal

Arbiter has an "Outlook Export" button that leads to a form where you pick the date range of the schedule to export. Submitting that form downloads a CSV:

Start Date,Subject,Start Time,End Date,End Time,All day event,Reminder on/off,Reminder Date,Reminder Time,Description,Location,Priority  
"3/12/2018","St. Mark's","5:00 PM","3/12/2018","6:30 PM",FALSE,"False","3/11/2018","5:00 PM","R: John Smith 555-555-5555 || U: Jane Doe 1(666)666-6666 || F: Joe Schmoe 777-777-7777","Saint Mark",Normal

This CSV can be imported into Google Calendar. So if we can periodically download this CSV and then synchronize it with Google, we'll have our "mobile" features.
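To get a feel for what a later sync script would have to work with, here is a minimal sketch that turns the header and one row of the export into an object keyed by column name. The function names (splitCsvLine, rowToObject) are made up for illustration, and the parser is deliberately simple: it handles quoted fields and doubled quotes, but not newlines inside a quoted field.

```javascript
// Split one CSV line into fields, honoring double-quoted values.
function splitCsvLine(line) {
  var fields = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"';  // a doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false;  // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;  // opening quote
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Pair each header name with the value in the same column.
function rowToObject(headerLine, rowLine) {
  var headers = splitCsvLine(headerLine.trim());
  var values = splitCsvLine(rowLine.trim());
  var row = {};
  headers.forEach(function (name, index) {
    row[name] = values[index];
  });
  return row;
}

var header = 'Start Date,Subject,Start Time,End Date,End Time,All day event,' +
  'Reminder on/off,Reminder Date,Reminder Time,Description,Location,Priority';
var line = '"3/12/2018","St. Mark\'s","5:00 PM","3/12/2018","6:30 PM",FALSE,' +
  '"False","3/11/2018","5:00 PM","R: John Smith 555-555-5555 || ' +
  'U: Jane Doe 1(666)666-6666 || F: Joe Schmoe 777-777-7777","Saint Mark",Normal';

var game = rowToObject(header, line);
console.log(game['Subject'] + ' on ' + game['Start Date'] + ' at ' + game['Start Time']);
```

A real sync script would probably lean on an existing CSV library instead; this only shows the shape of the data.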

For now, let's aim to just download the CSV.


CasperJS

Making raw HTTP calls and parsing the HTML responses can be done with libraries like Beautiful Soup for Python (paired with an HTTP client). But that approach falls short when JavaScript on the page generates required data or manipulates the Document Object Model (DOM) behind your back.

CasperJS to the rescue. It's a navigation scripting utility that drives a headless browser (PhantomJS by default), which loads the HTML and runs any scripts on the page. This removes the concern of having to manipulate raw HTML, and instead lets you focus on interacting with webpages to get what you want.

Without further ado, the script:

var casper = require('casper').create({  
  viewportSize: {
    width: 1920,
    height: 1080,
  },
  logLevel: 'info',
  verbose: false,
});

// Shortcuts to commonly used functions
var x = require('casper').selectXPath;  
var dump = require('utils').dump;

// CLI parameters: --username=foo --password=bar --organization=NCLRA
var username = casper.cli.options.username;  
var password = casper.cli.options.password;  
var organization = casper.cli.options.organization;

// Janky global state variable filled on 'resource.requested'
var requestedResource = null;

// Load the Arbiter Sports front page and click 'Log In'
casper.start('http://www.arbitersports.com/', function() {  
}).thenClick(x('//*[@id="menu-item-76"]/a'), function() {
  // Fill in username and password information, without submitting the form
  this.fill(
    x('//*[@id="aspnetForm"]'),
    {
      'ctl00$ContentHolder$pgeSignIn$conSignIn$txtEmail': username,
      'txtPassword': password,
    }
  );
  // Submit the form via a click
}).thenClick(x('//*[@id="ctl00_ContentHolder_pgeSignIn_conSignIn_btnSignIn"]'), function(
) {
  // Select the organization to target -- unfortunately this is case sensitive
}).thenClick(x('//tr[./td[text() = "'+organization+'"] and .//span[text() = "Official"]]'), function(
) {
  // Go to the 'Schedule' tab
}).thenClick(x('//*[@id="lnkNavTabSchedule"]/a'), function(
) {
  // Aim to 'Outlook Export' the schedule
}).thenClick(x('//*[@id="ctl00_ContentHolder_pgeGameScheduleEdit_cmnUtilities_tskExport"]'), function() {
  /* The 'resource.requested' event hasn't been registered until now because
   * there are actually many 'POST's made to a single endpoint within Arbiter.
   * It's hard to tell them apart without inspecting individual fields being
   * submitted.
   *
   * So instead just wait to attach a handler until now, expecting that the
   * immediately following 'resource.requested' is the 'POST' that represents
   * the schedule.
   */
  // Capture the first POST to the export endpoint so it can be replayed later
  var doOnlyOnce = true;
  casper.on('resource.requested', function(resource) {
    if (
      resource.method === 'POST' &&
      resource.url === 'https://www1.arbitersports.com/Official/GameScheduleExport.aspx' &&
      doOnlyOnce
    ) {
      doOnlyOnce = false;
      requestedResource = resource;
    }
  });

  // You can't 'fill' in a checkbox, so manually uncheck the 'Reminder' box
  this.click(x('//*[@id="ctl00_ContentHolder_pgeGameSchedulePrint_conGameSchedulePrint_isEnable"]'));
  // Fill in the to and from dates, clicking 'submit'
  this.fill(
    x('//*[@id="aspnetForm"]'),
    {
      'ctl00$ContentHolder$pgeGameSchedulePrint$conGameSchedulePrint$txtFromDate': '01/01/'+(new Date()).getFullYear(),
      'ctl00$ContentHolder$pgeGameSchedulePrint$conGameSchedulePrint$txtToDate': '12/31/'+(new Date()).getFullYear(),
    }
  );
}).thenClick(x('//*[@id="ctl00_ContentHolder_pgeGameSchedulePrint_navGameSchedulePrint_BtnExport"]'), function() {
  // We captured the requestedResource, replay it with an explicit call to download()
  this.download(
    requestedResource.url,
    'Export.csv',
    requestedResource.method,  // 'POST'
    requestedResource.postData
  );
  // Once downloaded, gracefully exit the script
  this.exit();
});

// Kick off the execution of the script
casper.run();  

Breakdown

General statements:

  • Event-oriented programming is weird.
  • Writing CasperJS is more akin to writing down the steps a human would take when interacting with a page, not a script of instructions.
  • Some CasperJS functions wait for the resource to arrive and be evaluated before giving you control: start(), open(), thenClick().
  • Other CasperJS functions hand control back as soon as they complete, potentially before your resource is ready: then(), click().
  • Global state and registering functions mid-execution is... OK... -ish.
  • Man JavaScript is weird.
  • XPath selection doesn't make any sense, until it does.

Let's get into interesting bits...


casper object
var casper = require('casper').create({  
  viewportSize: {
    width: 1920,
    height: 1080,
  },
  logLevel: 'info',
  verbose: false,
});
  • This creates the casper object from the casper module. The options object provided to .create() is also accessible via casper.options once created.

  • Setting the viewportSize is required for Arbiter; otherwise the default 400x300 viewport is small enough that Arbiter treats you as a mobile device.


Shortcuts
// Shortcuts to commonly used functions
var x = require('casper').selectXPath;  
var dump = require('utils').dump;  
  • It's pretty neat that you can pass functions around as objects. x() is used heavily within the script.

--Help, --help me Rhonda
// CLI parameters: --username=foo --password=bar --organization=NCLRA
var username = casper.cli.options.username;  
var password = casper.cli.options.password;  
var organization = casper.cli.options.organization;  
  • Ain't nobody got time for a --help message. Instead, I just expect --username, --password, and --organization to exist.
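
Short of a full --help, a cheap middle ground is to fail fast when an option is missing. A minimal sketch, assuming casper.cli.options behaves like a plain object (as it is used above); missingOptions and usageMessage are hypothetical helper names, and the example object stands in for the real CLI options:

```javascript
// Return the names of required options that are missing or empty.
// `options` is assumed to be a plain object such as casper.cli.options.
function missingOptions(options, required) {
  return required.filter(function (name) {
    var value = options[name];
    return value === undefined || value === null || String(value).length === 0;
  });
}

// Build a one-line complaint listing the missing flags.
function usageMessage(missing) {
  return 'Missing required option(s): ' +
    missing.map(function (name) { return '--' + name; }).join(', ');
}

// Stand-in for casper.cli.options when --password was forgotten
var exampleOptions = { username: 'foo', organization: 'NCLRA' };
var missing = missingOptions(exampleOptions, ['username', 'password', 'organization']);
if (missing.length > 0) {
  console.log(usageMessage(missing));
}
```

In the real script the message could go through casper.echo() followed by casper.exit(1) instead of console.log().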

Global-ler
// Janky global state variable filled on 'resource.requested'
var requestedResource = null;  
  • I'm not a fan of having to do this, but I can't see how else to pass this object around from function to function without a higher global state.
    • I suppose I could have performed the majority of this script within another function, and technically that wouldn't then be global... oh well.
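
The "wrap it in a function" idea can be sketched as a closure that keeps the captured request private and hands out a setter and getter. This is only an illustration; createResourceHolder and the example URL are made up, and the script above does not use it.

```javascript
// Keep the captured request in a closure instead of a global variable.
function createResourceHolder() {
  var resource = null;  // private to this holder
  return {
    set: function (value) { resource = value; },
    get: function () { return resource; },
    isSet: function () { return resource !== null; }
  };
}

var holder = createResourceHolder();
// In the script, the 'resource.requested' handler would call holder.set(resource)
holder.set({ method: 'POST', url: 'https://example.invalid/export', postData: 'a=1' });
console.log(holder.isSet(), holder.get().method);
```

The later download step would then read holder.get() rather than the global requestedResource.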

start()
casper.start('http://www.arbitersports.com/', function() {  
}).thenClick(x('//*[@id="menu-item-76"]/a'), function() {
  // Fill in username and password information, without submitting the form
  this.fill(
    x('//*[@id="aspnetForm"]'),
    {
      'ctl00$ContentHolder$pgeSignIn$conSignIn$txtEmail': username,
      'txtPassword': password,
    }
  );
  // Submit the form via a click
})
  • start() will HTTP GET (configurable) the URL provided, and execute the anonymous function when the page has finished loading.
    • In this case I've left the anonymous function blank.
  • Once the main page has loaded, click on the "Log In" button.
    • This is the first instance of an XPath being used. It targets the proper element via its id attribute, menu-item-76, which should be unique in the entire DOM.
    • Once this Log In page has loaded, fill in the form's username and password with the --username and --password parameters.
    • I choose not to "submit" the form because fill() doesn't wait for the next resource to become ready. Instead, I eventually call thenClick() on the submit button, which does wait for the next resource -- just personal preference.
  • I think I would like to try targeting elements by their text, and not their id. This would allow XPath targets like x('//*[text() = "Log In"]'), which would keep working even if the underlying id changed, as long as the text Log In remained consistent.
    • (I didn't use it because CasperJS wasn't liking the selector for some reason. 🤷‍♂️)

Core navigation
  .thenClick(x('//*[@id="ctl00_ContentHolder_pgeSignIn_conSignIn_btnSignIn"]'), function(
) {
  // Select the organization to target -- unfortunately this is case sensitive
}).thenClick(x('//tr[./td[text() = "'+organization+'"] and .//span[text() = "Official"]]'), function(
) {
  // Go to the 'Schedule' tab
}).thenClick(x('//*[@id="lnkNavTabSchedule"]/a'), function(
) {
})
  • Clicks the "submit" button described in the above section.
  • Clicks the proper organization row, which identifies your role as an "Official" within Arbiter.
    • You can be an Official for multiple organizations at once, so you must choose your specific role.
  • This XPath is funky, combining two conditions on the same tr:
    • //tr[./td[text() = "${ORGANIZATION}"]]: Select a tr element that has a td child whose text (matched via text()) equals "${ORGANIZATION}" (the CLI argument), and
    • .//span[text() = "Official"]: that also has a span descendant, at any depth, whose text is "Official".
  • Navigates to the "Schedule" tab.
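
Concatenating the organization straight into the XPath breaks if the name ever contains a double quote. XPath 1.0 has no escape character inside string literals, so the usual workaround is to switch quote styles or fall back to concat(). A minimal sketch, with xpathLiteral and organizationRowXPath as hypothetical helper names:

```javascript
// Build an XPath 1.0 string literal for arbitrary text.
function xpathLiteral(text) {
  if (text.indexOf('"') === -1) {
    return '"' + text + '"';  // no double quotes: wrap in double quotes
  }
  if (text.indexOf("'") === -1) {
    return "'" + text + "'";  // no single quotes: wrap in single quotes
  }
  // Both quote styles present: join double-quoted pieces with '"' via concat()
  var parts = text.split('"').map(function (part) {
    return '"' + part + '"';
  });
  return 'concat(' + parts.join(", '\"', ") + ')';
}

// Same expression as in the script, with the organization safely quoted.
function organizationRowXPath(organization) {
  return '//tr[./td[text() = ' + xpathLiteral(organization) + ']' +
    ' and .//span[text() = "Official"]]';
}

console.log(organizationRowXPath('NCLRA'));
```

For a plain name like NCLRA this produces exactly the expression used in the script; the result would still be wrapped in x() before being handed to thenClick().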

Export form
  // Aim to 'Outlook Export' the schedule
  .thenClick(x('//*[@id="ctl00_ContentHolder_pgeGameScheduleEdit_cmnUtilities_tskExport"]'), function() {
  /* The 'resource.requested' event hasn't been registered until now because
   * there are actually many 'POST's made to a single endpoint within Arbiter.
   * It's hard to tell them apart without inspecting individual fields being
   * submitted.
   *
   * So instead just wait to attach a handler until now, expecting that the
   * immediately following 'resource.requested' is the 'POST' that represents
   * the schedule.
   */
  // Capture the first POST to the export endpoint so it can be replayed later
  var doOnlyOnce = true;
  casper.on('resource.requested', function(resource) {
    if (
      resource.method === 'POST' &&
      resource.url === 'https://www1.arbitersports.com/Official/GameScheduleExport.aspx' &&
      doOnlyOnce
    ) {
      doOnlyOnce = false;
      requestedResource = resource;
    }
  });

  // You can't 'fill' in a checkbox, so manually uncheck the 'Reminder' box
  this.click(x('//*[@id="ctl00_ContentHolder_pgeGameSchedulePrint_conGameSchedulePrint_isEnable"]'));
  // Fill in the to and from dates, clicking 'submit'
  this.fill(
    x('//*[@id="aspnetForm"]'),
    {
      'ctl00$ContentHolder$pgeGameSchedulePrint$conGameSchedulePrint$txtFromDate': '01/01/'+(new Date()).getFullYear(),
      'ctl00$ContentHolder$pgeGameSchedulePrint$conGameSchedulePrint$txtToDate': '12/31/'+(new Date()).getFullYear(),
    }
  );
})
  • Once we load the "Export Schedule" page, there is a form with "To" and "From" dates; we'll eventually need to fill them in. There's also a "Reminder" checkbox which we'll want to uncheck.
  • Arguably more important, however, is the need to register a function that runs when the resource.requested event fires.
    • This is because reasons. Particularly because of ArbiterSports reasons.
    • We can't save the response from the resource.received event for some reason, so we're a bit janky...
  • We record when a resource is requested, and aim to specifically identify the POST with our "To", "From", and "Reminder" fields. Once recorded, we can save it back to global state for future use.
  • There are more resource.requested events coming, so we have a doOnlyOnce flag to help us not execute the callback function on future resources.
    • I tried unloading it -- this.on('resource.requested', function() {}); -- but that didn't work. ☹️
  • The year is dynamically specified via the Date class.
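
On the unloading attempt: with a Node-style event emitter, calling on() a second time registers an additional listener; it does not replace the first one. CasperJS's event system is modelled on Node's EventEmitter, so the usual detach pattern is sketched below using Node's own EventEmitter as a stand-in for the casper object. That casper exposes removeListener() the same way is an assumption here, and this has not been run against Arbiter.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for the casper object
var emitter = new EventEmitter();
var captured = [];

// A named function, so the same reference can be passed to removeListener()
function onRequested(resource) {
  if (resource.method === 'POST') {
    captured.push(resource);
    // Detach after the first match instead of tracking a doOnlyOnce flag
    emitter.removeListener('resource.requested', onRequested);
  }
}

emitter.on('resource.requested', onRequested);

// Simulated traffic: only the first POST should be captured
emitter.emit('resource.requested', { method: 'GET', url: '/page' });
emitter.emit('resource.requested', { method: 'POST', url: '/export' });
emitter.emit('resource.requested', { method: 'POST', url: '/other' });

console.log(captured.length, captured[0].url);
```

The key detail is that the handler has a name: an anonymous function passed to on() leaves nothing to hand to removeListener() later.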

The home stretch
  .thenClick(x('//*[@id="ctl00_ContentHolder_pgeGameSchedulePrint_navGameSchedulePrint_BtnExport"]'), function() {
  // We captured the requestedResource, replay it with an explicit call to download()
  this.download(
    requestedResource.url,
    'Export.csv',
    requestedResource.method,  // 'POST'
    requestedResource.postData
  );
  // Once downloaded, gracefully exit the script
  this.exit();
});

// Kick off the execution of the script
casper.run();  
  • Once we click the "Export" button, we wait for the resource.requested event to be captured (above section), and then aim to download() it (again).
  • Re-using much of the data in requestedResource, we save it to "Export.csv", and exit() gracefully.
  • To start the entire process, we must run() the script.
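
The final callback assumes requestedResource has already been filled in by the time it runs. A small guard makes a missed capture fail loudly instead of crashing on a null. This is a sketch under the assumption that the captured object carries the url, method and postData fields used in the script; downloadArguments and the example URL are hypothetical.

```javascript
// Build the argument list for download() from a captured request,
// throwing if nothing was captured.
function downloadArguments(resource, target) {
  if (!resource || !resource.url) {
    throw new Error('No export request was captured; nothing to download');
  }
  return [resource.url, target, resource.method || 'GET', resource.postData];
}

var args = downloadArguments(
  { url: 'https://example.invalid/export', method: 'POST', postData: 'from=01%2F01%2F2018' },
  'Export.csv'
);
console.log(args.join(' | '));
```

Inside the last step this could be applied as this.download.apply(this, downloadArguments(requestedResource, 'Export.csv')). If the capture turns out to lag behind the click, a waitFor() step that polls until requestedResource is set would be one way to bridge the gap.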

Closing thoughts

There are more things that could be done here:

  • Download the file only once
  • Store the file in memory to print, and then exit

I will need another script that persists this information to Google Calendar.