What job-seeking data professionals can do about it

Image Courtesy of Jordan M. Lomibao via Unsplash

What is cloud computing?

Many companies today deal in petabytes of data (1 petabyte is one quadrillion bytes). Walmart goes through 2 petabytes of data per hour. Can you imagine how much data a company like Facebook or Netflix goes through?
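To put the Walmart figure in perspective, here is a quick back-of-the-envelope conversion in Python. It assumes the decimal definition given above (1 petabyte = 10^15 bytes) and divides by the 3,600 seconds in an hour. The function name is an illustrative choice.

```python
# Decimal (SI) petabyte: one quadrillion bytes, as defined in the text.
BYTES_PER_PETABYTE = 10**15
SECONDS_PER_HOUR = 3600


def throughput_bytes_per_second(petabytes_per_hour: float) -> float:
    """Convert a petabytes-per-hour rate into bytes per second."""
    return petabytes_per_hour * BYTES_PER_PETABYTE / SECONDS_PER_HOUR


# 2 PB/hour, the Walmart figure quoted in the text.
rate = throughput_bytes_per_second(2)
print(f"{rate / 1e9:,.0f} GB per second")  # prints "556 GB per second"
```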


Photo Credit: Kevin Wu via Unsplash

Over the course of my data science learning, I have had a ton of exposure to Scikit-learn, one of the most popular machine learning libraries for Python. In this post, I will discuss some important practices to keep in mind when using Scikit-learn to build effective models.

Keep Your Data Preprocessing Consistent

Scikit-learn provides numerous classes and methods for transforming data. Any dataset transformation used when training a model must also be applied to test data and to data the model sees once deployed in production. For example, if you use its StandardScaler to scale X_train to fit your model, you…
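Here is a minimal sketch of the idea. It assumes synthetic data from `make_classification` and a `LogisticRegression` model as stand-ins, and neither comes from the post. It shows two ways to keep preprocessing consistent:

- Fit `StandardScaler` on the training split only, then call just `transform` on the test split.
- Wrap the scaler and model in a `Pipeline` so the two cannot drift apart.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Option 1: learn the scaling (mean/std) from the training data only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# ...then reuse that same fitted scaler on the test data (transform, never refit).
X_test_scaled = scaler.transform(X_test)
manual_model = LogisticRegression().fit(X_train_scaled, y_train)
manual_preds = manual_model.predict(X_test_scaled)

# Option 2: bundle scaler and model in a Pipeline. fit() learns the scaling
# from X_train, and predict() applies that same scaling to any new data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_preds = pipe.predict(X_test)

# Both routes apply identical preprocessing, so their predictions agree.
print(np.array_equal(manual_preds, pipe_preds))  # True
```

For deployment, one option is to save the fitted pipeline as a single object (for example with `joblib.dump`). Production predictions then go through exactly the preprocessing learned at training time.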


Credit: Bruce Bennett / Staff

Former NFL coach Bill Parcells once wisely said: “You are what your record says you are.” Ultimately, this is true. By the end of the NHL season, I think the sample of games is large enough that every team deserves its place in the standings. Over short samples, though, a team’s record may overstate or understate how well it has actually played. I want to look at some team-level statistics and see which correlate most strongly with a team’s record (given the quirks of the NHL point system, where a loser point is awarded for OT losses, a team’s record =…
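The exact definition of a team’s record is not spelled out here, so the sketch below assumes points percentage: (2 × wins + overtime losses) / (2 × games played). This measure counts the loser point. The column names `W`, `OTL` and `GP` and both helper functions are hypothetical. The sketch ranks each stat column by the strength of its Pearson correlation with that measure.

```python
from typing import List

import pandas as pd


def points_percentage(df: pd.DataFrame) -> pd.Series:
    """Share of available standings points a team has earned.

    Assumes hypothetical columns 'W' (wins), 'OTL' (overtime/shootout losses)
    and 'GP' (games played). A win is worth 2 points, an OT loss earns the
    1-point "loser point", and every game offers 2 possible points.
    """
    return (2 * df["W"] + df["OTL"]) / (2 * df["GP"])


def correlations_with_record(df: pd.DataFrame, stat_cols: List[str]) -> pd.Series:
    """Pearson correlation of each team-level stat with points percentage,
    ordered from strongest to weakest by absolute value."""
    record = points_percentage(df)
    # corrwith correlates every selected column with the record Series.
    corrs = df[stat_cols].corrwith(record)
    return corrs.reindex(corrs.abs().sort_values(ascending=False).index)
```

With one row per team, a call such as `correlations_with_record(teams, ["CF%", "xGF%"])` would list the candidate stats from strongest to weakest relationship. The DataFrame and column names in that call are placeholders. Correlation alone does not show which stats drive winning.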


Credit: Bruce Bennett / Staff

In this post, I will demonstrate how to scrape data from tables on Natural Stat Trick, a website that provides standard and advanced NHL statistics. Natural Stat Trick is a reputable, frequently cited source of NHL analytics data for writers and fans alike.

First, using the Beautiful Soup library, I’ll go through the steps to scrape the HTML and convert the data to a pandas DataFrame. Then, I’ll show a convenient one-liner that works for tables on Natural Stat Trick and may also work for tables on other webpages, depending on their complexity. …
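As a rough sketch of the Beautiful Soup steps, the function below does four things:

- Parses an HTML string.
- Takes the first row’s cells as column names.
- Turns each remaining row into a record.
- Converts columns that look numeric.

The function name and the `table_id` parameter are illustrative, not taken from the post. The sketch assumes a simple table with a single header row and no nested tables. Fetching the page (for example with `requests`) is left out.

```python
from typing import Optional

import pandas as pd
from bs4 import BeautifulSoup


def html_table_to_dataframe(html: str, table_id: Optional[str] = None) -> pd.DataFrame:
    """Convert one simple HTML <table> into a DataFrame (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Select a table by its id attribute if given, otherwise the first <table>.
    table = soup.find("table", id=table_id) if table_id else soup.find("table")
    if table is None:
        raise ValueError("no matching <table> found")

    rows = table.find_all("tr")
    # Column names come from the cells of the first row (usually <th>).
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    # Every later row's <td> cells become one record.
    records = [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in rows[1:]
    ]
    df = pd.DataFrame(records, columns=headers)

    # Scraped values are strings; convert columns that parse fully as numbers.
    for col in df.columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().all():
            df[col] = converted
    return df
```

pandas also provides `pandas.read_html`, which returns a list of DataFrames, one per `<table>` it finds. It needs `lxml`, or `html5lib` plus Beautiful Soup, to be installed. It is one common one-liner for this job, though not necessarily the one described in the post.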

Gary Schwaeber

Data Scientist
