Final Project – Tableau Story

The following is a post that I have preserved in its original form. It was the final post that I created for a computer science class at Northwestern.

I created a Tableau Story to display the data that I gathered. The story should be visibly embedded below, but can also be found here. I chose to develop a “Factors” story so that we can see exactly what categories can affect a player’s market value, and which categories have a more significant or nuanced effect than others.

Assignment 4 – Tableau Visualization

The following is a post that I have preserved in its original form. It was the 5th of 6 posts that I created for a computer science class at Northwestern.

I loaded my data set into Tableau directly from Google BigQuery. I then generated two charts: a Bar Chart and a Pie Chart.

Generating the Bar Chart was a relatively easy process because my Google BigQuery categories were pre-loaded by Tableau. All I needed to do was drag one “Dimension” and one “Measure” into the Columns and Rows fields. After I dragged the categories over, Tableau auto-populated the graph for me. The bar chart can be seen below.

Top100MarketValueByClub

I chose to sort the Top 100 Market Values by club. The club name went on the x axis and the market value on the y axis. This shows which clubs hold the most value among the Top 100 valued players according to Transfermarkt. While I imagined that most of the value would be in the top 6 “Big Clubs”, I was surprised to see that another 6 clubs made up the rest of the Top 100 market values.
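As a rough sketch, the grouping behind this bar chart could be reproduced in Python as shown here. It assumes the measure was summed per club (Tableau's default aggregation for a measure). The function name `total_value_by_club` and the sample rows are hypothetical placeholders, not values from the dataset.

```python
from collections import defaultdict


def total_value_by_club(players):
    """Sum market value per club and return (club, total) pairs, largest first."""
    totals = defaultdict(float)
    for player in players:
        totals[player["club"]] += player["market_value"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows for illustration only (not real players or values).
sample = [
    {"club": "Club X", "market_value": 40.0},
    {"club": "Club Y", "market_value": 55.0},
    {"club": "Club X", "market_value": 30.0},
]
club_totals = total_value_by_club(sample)
```

Each pair in `club_totals` corresponds to one bar: the club is the x-axis label and the total is the bar height.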

The next chart was a little more difficult to generate. I knew that I wanted to experiment with a different variable, so I chose to sort Market Value by Position. To do this, I had to experiment for a good 45 minutes to an hour to figure out how to label the graph correctly. I ended up finding the magic formula below:

PieMarks

I used the Market Value as the dependent variable and the Position as the independent variable. The chart is sorted from greatest to least value by position in a counter-clockwise direction.

Top100MarketValueByPosition
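The slices of the pie chart can be approximated with a short Python sketch. It assumes each slice is a position's share of the summed market value, ordered from greatest to least. The function name `value_share_by_position` and the sample rows are hypothetical.

```python
def value_share_by_position(players):
    """Return (position, share of total value) pairs, largest share first."""
    totals = {}
    for player in players:
        position = player["position"]
        totals[position] = totals.get(position, 0.0) + player["market_value"]
    grand_total = sum(totals.values())
    shares = [(position, total / grand_total) for position, total in totals.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Placeholder rows for illustration only (not real values).
sample = [
    {"position": "GK", "market_value": 20.0},
    {"position": "CF", "market_value": 50.0},
    {"position": "CB", "market_value": 30.0},
]
shares = value_share_by_position(sample)
```

The shares always sum to 1, so each one maps directly to a fraction of the circle.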

The Pie Chart shows that, as explained in a previous post, Center Forwards carry the most value. Somewhat unsurprisingly, the next most valuable player group is the Center Back, presumably in order to prevent those valuable forwards from scoring.

After the Center Back, there are 5 midfield groups that cumulatively overwhelm the chart. While some classify Wingers as forwards, they tend to fit more snugly in the midfield than as forwards.

Finally, the least valuable parts of the chart show defenders and (unmarked) goalkeepers.

Assignment 3 – Google Big Query

The following is a post that I have preserved in its original form. It was the 4th of 6 posts that I created for a computer science class at Northwestern.

After toying with data from Squawka, I soon realized that it would be much more time-efficient to try and find a larger and more sortable dataset on Kaggle. I found this one, which I then uploaded to Google BigQuery. I liked this dataset because I could easily discover more high-level data about Premier League players and their transfer value. While this dataset did not provide in-game performance data, it did reflect their value according to Transfermarkt, which is thought to be a leading player value analysis website.

Screen Shot Big Query

I ran the query and got a limited set of data (I used “LIMIT 100” to limit the query to 100 rows). In my query I excluded Fantasy Premier League data, age category, new/foreign signing, and club ID number.

I removed the Fantasy Premier League data because it would require another level of explanation and analysis of how the Fantasy Premier League gets its numbers. Transfermarkt also generates its own numbers, and most transfer fees are only projections until they are actually paid out, so they are wildly speculative. A Fantasy Premier League value, however, only applies to that specific entertainment site.

I got rid of the age category, club ID numbers, and new/foreign signing data because they were unnecessary or overcomplicated ways to sort this data. I am looking to simplify.

The query sorts the dataset by market value in descending order. According to the original post on Kaggle (linked above), the data that I did select shows:

name: Name of the player

club: Club of the player

age : Age of the player

position : The usual position on the pitch

position_cat :

  • 1 for attackers
  • 2 for midfielders
  • 3 for defenders
  • 4 for goalkeepers

market_value : As on transfermarkt.com on July 20th, 2017

page_views : Average daily Wikipedia page views from September 1, 2016 to May 1, 2017

region:

  • 1 for England
  • 2 for EU
  • 3 for Americas
  • 4 for Rest of World

nationality

big_club: Whether one of the Top 6 clubs
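The query described above (keep only the listed columns, sort by market value in descending order, limit to 100 rows) can be sketched as follows. This is not the original BigQuery statement. It uses Python's built-in `sqlite3` as a stand-in, and the table name `players`, the helper `build_query`, the excluded column names `fpl_value` and `club_id`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Columns kept by the query, as listed above.
SELECTED_COLUMNS = [
    "name", "club", "age", "position", "position_cat",
    "market_value", "page_views", "region", "nationality", "big_club",
]


def build_query(table, limit=100):
    """Build a SELECT that keeps only the listed columns, highest value first."""
    columns = ", ".join(SELECTED_COLUMNS)
    return f"SELECT {columns} FROM {table} ORDER BY market_value DESC LIMIT {limit}"


# Hypothetical in-memory table standing in for the uploaded dataset.
# fpl_value and club_id stand in for the columns the query leaves out.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (name TEXT, club TEXT, age INTEGER, position TEXT, "
    "position_cat INTEGER, market_value REAL, page_views INTEGER, "
    "region INTEGER, nationality TEXT, big_club INTEGER, "
    "fpl_value REAL, club_id INTEGER)"
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Player A", "Club X", 25, "CF", 1, 40.0, 1000, 2, "Country P", 1, 9.5, 1),
        ("Player B", "Club Y", 28, "CB", 3, 55.0, 800, 1, "Country Q", 0, 6.0, 2),
        ("Player C", "Club X", 22, "GK", 4, 15.0, 300, 4, "Country R", 1, 5.0, 1),
    ],
)
rows = conn.execute(build_query("players", limit=2)).fetchall()
```

Naming the columns explicitly, rather than selecting everything, is what drops the excluded fields from the result.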

I exported my query to a Google Sheet that you can view here.

The results show that only 3 players in the Top 50 transfer values were not at a “Big Club”. In fact, this analysis led me to realize that the data set itself is out of date, due to Virgil Van Dijk’s record-breaking signing for “Big Club” Liverpool. The results can also be used in future assignments to discover which players are desired most, and what their positions might be.
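A count like this can be checked with a few lines of Python. The sketch assumes each row carries the `market_value` and `big_club` fields described above, with `big_club` equal to 1 for a Top 6 club. The function name `outside_big_clubs` and the sample rows are hypothetical.

```python
def outside_big_clubs(players, top_n=50):
    """Return the players in the top_n by market value who are not at a big club."""
    ranked = sorted(players, key=lambda p: p["market_value"], reverse=True)
    return [p for p in ranked[:top_n] if not p["big_club"]]


# Placeholder rows for illustration only.
sample = [
    {"name": "Player A", "market_value": 60.0, "big_club": 1},
    {"name": "Player B", "market_value": 45.0, "big_club": 0},
    {"name": "Player C", "market_value": 30.0, "big_club": 1},
    {"name": "Player D", "market_value": 10.0, "big_club": 0},
]
outsiders = outside_big_clubs(sample, top_n=3)
```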

Assignment 2 – Site Structure

The following is a post that I have preserved in its original form. It was the 3rd of 6 posts that I created for a computer science class at Northwestern.

In our 3rd and 4th classes we briefly discussed what it would be like to create a site structure for a professional sports team. While at first this was difficult, we soon learned that we could write out sentences that could help us divide up the website based on what we would want to know. For example, imagine an advertisement posted by the team:

“Manchester United’s next game will be against Brighton and Hove Albion on Friday, May 4th at 2:00PM CT at Falmer Stadium. United will be playing for 2nd place in the Premier League, so make sure to order your favorite player’s shirt now so that it arrives to you in time for kickoff!”

Given that this advertisement is directed at a user that the club considers to be their own, we can unpack these two sentences to see what relevant information it contains, and whether it could be broken into separate categories. For emphasis I have highlighted unique pieces of information below:

“Manchester United’s next game will be against Brighton and Hove Albion on Friday, May 4th at 2:00PM CT at Falmer Stadium. United will be playing for 2nd place in the Premier League, so make sure to order your favorite player’s shirt now so that it arrives to you in time for kickoff!”

Out of these two sentences, I can begin to list information that I should provide on the team’s website. Visitors to the website will want to see who the next opponent is. They will want to see what time they are playing, and whether the game is home or away. They will want to see up-to-date Premier League standings, and where the team falls in the standings. They will want to order a new jersey online, and they will probably want to see player profiles where they can learn more about individual members of the team.

Given the above, we could build out entire categories for Next Match, Standings, Shop and Player Profiles.
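One way to picture this structure is as a simple mapping from each category to the information it holds. The sketch below is only an illustration. The dictionary `SITE_SECTIONS`, the helper `section_for`, and the wording of each detail are my own assumptions based on the list above.

```python
# Each top-level category maps to the details a visitor would look for there.
SITE_SECTIONS = {
    "Next Match": ["opponent", "kickoff time", "home or away"],
    "Standings": ["league table", "team position"],
    "Shop": ["jerseys"],
    "Player Profiles": ["player pages"],
}


def section_for(detail):
    """Return the section that holds a given piece of information, or None."""
    for section, details in SITE_SECTIONS.items():
        if detail in details:
            return section
    return None
```

For example, `section_for("opponent")` points to the Next Match section, while a detail that was never listed returns `None` and signals a gap in the structure.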

Assignment 1.1 – Preview

The following is a post that I have preserved in its original form. It was the 2nd of 6 posts that I created for a computer science class at Northwestern.

Casual passersby trying to watch a soccer game often struggle to stay engaged. More often than not this happens because of how rarely it seems that any action actually occurs in a match. People want to see goals. Score one more goal than your opponent, and your team wins the game.

However, oftentimes teams will play for a full 90 minutes without scoring a goal at all. This is why teams spend so much money to bring in the right players to help score more goals. In order to figure out if they received a strong return on investment, teams need to be able to measure a player’s performance. Squawka’s 2016/17 Goals Scored table is a great example of how soccer data can be kept and organized.

Screen Shot 2018-04-29 at 11.46.59 PM

The site allows the user to sort statistics in a few different ways. We can dissect the LATCH method in order to understand the data. Squawka allows the user to sort by Category: Games Played, Minutes Played, Right Footed Goals, Left Footed Goals, Headed Goals, Other Goals, Goals Inside the Area, Goals Outside the Area and Total Goals. It also allows us to sort Alphabetically, by player name. While the examples are not in this data set, some benefit could be found in sorting soccer data by Location (stadium or national region), Time (chronology of goals in a game) and Hierarchy (professional league performance vs minor league performance).
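The two sorts that the table supports can be sketched in Python. This is a minimal illustration of the Category and Alphabet parts of LATCH, not Squawka's actual implementation. The field names `player` and `total_goals` and the sample rows are hypothetical.

```python
from operator import itemgetter

# Placeholder rows for illustration only (not real players or goal counts).
rows = [
    {"player": "Player C", "total_goals": 12},
    {"player": "Player A", "total_goals": 20},
    {"player": "Player B", "total_goals": 12},
]

# Category: order by one statistic, highest first.
by_category = sorted(rows, key=itemgetter("total_goals"), reverse=True)

# Alphabet: order by player name.
by_alphabet = sorted(rows, key=itemgetter("player"))
```

Python's sort is stable, so players tied on a statistic keep their original relative order.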

I hope to dive into those more specific methods in the coming weeks.

Assignment 1.0 – Teaser

The following is a post that I have preserved in its original form. It was the 1st of 6 posts that I created for a computer science class at Northwestern.

I was born in England. My family lived there for six years before we moved to Chicago in the summer of 2000. While I didn’t live in England long enough to hold onto an accent, much less an understanding of how to play cricket, I did hold onto a love for the game of soccer.

The English Premier League is probably the sport’s most famous league. It produced superstars like David Beckham and Cristiano Ronaldo, and is home to teams like Manchester United and Liverpool. Even for people who are relatively unfamiliar with the sport, those names are generally recognizable. The Premier League is the epicenter of soccer.

Screen Shot 2018-04-29 at 11.17.27 PM

For this class I will be studying and dissecting information that is publicly accessible on Squawka, a “web-app that delivers you real-time data on the football match you are watching on TV”. While there is no evident “master” dataset, the service provides multiple smaller ones. I plan to use each of these in class.

To get the ball rolling (pun not intended), in my next post I will discuss an easy-to-understand dataset that makes sense to those with only basic soccer knowledge.