Data Acquisition

Courtesy of Taylor Friehl via Unsplash

Overview

In part one of this series on creating an NHL game prediction model, I discussed my personal motivation and how the model could be applied to a betting strategy. In this post, I will go into how I acquired the data, cleaned it, and organized it for modeling.

I wanted to start by building a top-down model. This means using team-based stats as my features rather than building the model up from the stats of the individual players on each team. The exception is the goalie. Goaltenders in…
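To make the top-down idea concrete, here is a minimal sketch of one way to turn team-level stats into per-game features. It attaches each team's stats to every game for the home and away sides, then adds home-minus-away differences. The helper build_game_features, the column names (home_team, away_team, shot_share, save_pct) and the difference encoding are all hypothetical illustrations, not necessarily how this project built its features.

```python
import pandas as pd


def build_game_features(games: pd.DataFrame, team_stats: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach team-level stats to each game.

    games: one row per game with 'home_team' and 'away_team' columns (assumed names).
    team_stats: one row per team, indexed by team name, numeric stat columns only.
    Returns one row per game with home_*, away_* and diff_* (home minus away) columns.
    """
    # Look up each side's stats by team name; unknown teams become NaN rows.
    home = (
        team_stats.add_prefix("home_")
        .reindex(index=games["home_team"])
        .reset_index(drop=True)
    )
    away = (
        team_stats.add_prefix("away_")
        .reindex(index=games["away_team"])
        .reset_index(drop=True)
    )
    # Home-minus-away differences: one common (assumed) way to encode a matchup.
    diff = pd.DataFrame(
        home.to_numpy() - away.to_numpy(),
        columns=["diff_" + c for c in team_stats.columns],
    )
    return pd.concat([games.reset_index(drop=True), home, away, diff], axis=1)


# Illustrative toy inputs, not real NHL data.
team_stats = pd.DataFrame(
    {"shot_share": [55.0, 48.0], "save_pct": [0.915, 0.905]},
    index=["Team A", "Team B"],
)
games = pd.DataFrame({"home_team": ["Team A"], "away_team": ["Team B"]})
features = build_game_features(games, team_stats)
```

Differences keep the feature count fixed no matter how many stats you add per team, though keeping the raw home and away columns as well lets a model pick up home-ice effects separately.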


Credit: Chris Liverani, Unsplash

For my data science capstone project at Flatiron School, I built an NHL game prediction model. Over my next couple of blog posts, I will walk through the steps I took to build the model and the lessons I learned along the way. Now, without further ado…

Motivation

I have always been a huge hockey fan, and more recently I have also become a big data nerd. But even before I jumped into the world of data and pursued it as a career path, I always enjoyed reading the journalists who delve deep into the analytics of the game. Over the past decade, I have seen…


What job-seeking data professionals can do about it

Image Courtesy of Jordan M. Lomibao via Unsplash

What is cloud computing?

Cloud computing is the delivery of computing services — including servers, storage, databases, networking, software, analytics, and intelligence — over the Internet (“the cloud”) to offer faster innovation, flexible resources, and economies of scale. — Microsoft Azure

Many companies today deal with petabytes of data (1 petabyte is one quadrillion bytes). Walmart goes through 2 petabytes of data per hour. Can you imagine how much data a company like Facebook or Netflix goes through?


Photo Credit: Kevin Wu via Unsplash

Over the course of my data science studies, I have had a ton of exposure to Scikit-learn, one of the most popular machine learning libraries for Python. In this post, I will discuss some important practices to keep in mind when using Scikit-learn to build effective models.

Keep Your Data Preprocessing Consistent

Scikit-learn provides numerous transformers and methods for transforming data. Any dataset transformations used when training a model must also be applied to test data and to data the model sees once deployed in production. For example, if you use its StandardScaler to scale X_train before fitting your model, you…
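As a minimal sketch of consistent preprocessing, the code below fits StandardScaler on the training split only and reuses that fitted scaler (the same means and standard deviations) on the test split. It then shows the same thing with make_pipeline, which stores the fitted scaler alongside the model so that new data automatically gets identical preprocessing. The synthetic data and the choice of LogisticRegression are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = (X[:, 0] + rng.normal(scale=5.0, size=200) > 50).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Manual approach: learn scaling parameters from the training data only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# ...then apply those same parameters to the test data (transform, NOT fit_transform).
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)
manual_acc = model.score(X_test_scaled, y_test)

# Pipeline approach: fit() fits the scaler and the model together on the
# training data; score()/predict() reuse the fitted scaler on new data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_acc = pipe.score(X_test, y_test)
```

Both approaches give the same result here; the pipeline simply makes it harder to forget a step, and you can save and deploy the single pipeline object rather than a separate scaler and model.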


Credit: Bruce Bennett / Staff

Former NFL coach Bill Parcells once wisely said: “You are what your record says you are.” Ultimately, this is true. By the end of the NHL season, I think the sample of games is large enough that every team truly deserves its place in the standings. But over short samples, a team’s record may overstate or understate how well it is actually playing. I want to take a look at some team-level statistics and see which ones correlate most strongly with a team’s record (given the weirdness of the NHL point system, where the loser point exists for OT losses, a team’s record =…
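A simple way to start this kind of analysis is to compute each stat’s correlation with the record measure and rank the stats by strength. The sketch below is an assumption about how that might look: rank_correlations is a hypothetical helper, points_pct stands in for whatever record measure you settle on, and stat_a through stat_c are made-up columns with toy values. It uses Pearson correlation (the pandas default) and sorts by absolute value so that strong negative relationships rank alongside strong positive ones.

```python
import pandas as pd


def rank_correlations(team_df: pd.DataFrame, target: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation of every numeric column with
    `target`, ordered from strongest to weakest by absolute value."""
    corr = team_df.corr(numeric_only=True)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy values for illustration only, not real team data.
teams = pd.DataFrame(
    {
        "points_pct": [0.4, 0.5, 0.6, 0.7],
        "stat_a": [1, 2, 3, 4],
        "stat_b": [4, 3, 3, 1],
        "stat_c": [1, 1, 2, 1],
    }
)
ranked = rank_correlations(teams, "points_pct")
```

With only one row per team, a single season gives a small sample, so high correlations are worth checking across several seasons before reading much into them.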


Credit: Bruce Bennett / Staff

In this post, I will demonstrate how to scrape data from tables on Natural Stat Trick, a website for getting standard and advanced NHL statistics. Natural Stat Trick is a reputable and frequently referenced data source for NHL analytics used by writers and fans alike.

First, using the Beautiful Soup library, I’ll go through the steps to scrape the HTML and convert the data to a pandas DataFrame. Then, I’ll show a convenient one-liner that works for tables on Natural Stat Trick and may work for tables on other web pages, depending on their complexity. …
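The Beautiful Soup approach can be sketched as follows, assuming a simple table with one header row of th cells followed by rows of td cells (no rowspan or colspan). In practice you would first fetch the page’s HTML (for example with the requests library); that step is left out here, and the HTML string below is a made-up stand-in rather than Natural Stat Trick’s actual markup. The helper html_table_to_df is hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup


def html_table_to_df(html: str, table_index: int = 0) -> pd.DataFrame:
    """Hypothetical helper: parse one simple HTML table into a DataFrame."""
    # "html.parser" is Python's built-in parser, so lxml is not required.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    # Column names come from the header cells.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


# Made-up stand-in HTML for illustration.
sample_html = """
<table>
  <thead><tr><th>Team</th><th>GP</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>10</td></tr>
    <tr><td>Team B</td><td>12</td></tr>
  </tbody>
</table>
"""
df = html_table_to_df(sample_html)
# Every cell comes back as a string; convert numeric columns explicitly.
df["GP"] = pd.to_numeric(df["GP"])
```

For comparison, pandas also provides pandas.read_html, which returns a list of DataFrames, one per table it finds in the page; it needs lxml, or Beautiful Soup plus html5lib, installed.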

Gary Schwaeber

Data Scientist
