Final Project – Tableau Story

The following is a post that I have preserved in its original form. It was the final post that I created for a computer science class at Northwestern.

I created a Tableau Story to display the data that I gathered. The story should be visibly embedded below, but can also be found here. I chose to develop a “Factors” story so that we can see exactly what categories can affect a player’s market value, and which categories have a more significant or nuanced effect than others.

Assignment 4 – Tableau Visualization

The following is a post that I have preserved in its original form. It was the 5th of 6 posts that I created for a computer science class at Northwestern.

I loaded my data set into Tableau directly from Google BigQuery. I then generated two charts: a Bar Chart and a Pie Chart.

Generating the Bar Chart was a relatively easy process because my Google BigQuery categories were pre-loaded by Tableau. All I needed to do was drag one “Dimension” and one “Measure” into the Columns and Rows fields. After I dragged the categories, Tableau auto-populated the graph for me. The bar chart can be seen below.

Top100MarketValueByClub

I chose to sort the Top 100 Market Values by club. The club name went on the x axis and the market value on the y axis. This shows which clubs hold the most value among the Top 100 valued players according to Transfermarkt. While I imagined that most of the value would be in the top 6 “Big Clubs”, I was surprised to see that another 6 clubs made up the rest of the Top 100 market values.
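For readers without Tableau, the aggregation behind the bar chart can be sketched in a few lines of Python. This is only an approximation: it assumes Tableau summed the market value per club (its default for a measure), and the rows below are made-up placeholders rather than figures from the dataset.

```python
from collections import defaultdict

# Placeholder (club, market_value) rows; NOT real figures from the dataset.
rows = [
    ("Club A", 50.0),
    ("Club B", 40.0),
    ("Club A", 30.0),
    ("Club C", 20.0),
]


def total_value_by_club(rows):
    """Sum market value per club, then sort clubs from highest to lowest total."""
    totals = defaultdict(float)
    for club, value in rows:
        totals[club] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


club_totals = total_value_by_club(rows)
for club, total in club_totals:
    print(f"{club}: {total}")
```

Each (club, total) pair corresponds to one bar, with the club on the x axis and the summed value on the y axis.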

The next chart was a little more difficult to generate. I wanted to experiment with a different variable, so I chose to sort Market Value by Position. It took a good 45 minutes to an hour of experimenting to figure out how to label the graph correctly. I ended up finding the magic formula below:

PieMarks

I used Market Value as the dependent variable and Position as the independent variable. The chart is sorted from greatest to least value by position, in a counter-clockwise direction.

Top100MarketValueByPosition

The Pie Chart shows that, as explained in a previous post, Center Forwards carry the most value. Somewhat unsurprisingly, the next most valuable player group is the Center Back, presumably in order to prevent those valuable forwards from scoring.

After the Center Back, there are 5 midfield groups that cumulatively overwhelm the chart. While some classify Wingers as forwards, they tend to fit more snugly in the midfield than as forwards.

Finally, the least valuable parts of the chart show defenders and (unmarked) goalkeepers.
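The slices of a pie chart like this one are each position's share of the total market value. As a rough sketch, assuming the values are summed per position and ordered from largest to smallest, the calculation could look like the following. The position labels and numbers are placeholders, not values from the dataset.

```python
from collections import Counter

# Placeholder (position, market_value) rows; NOT real figures from the dataset.
rows = [
    ("Position A", 60.0),
    ("Position B", 25.0),
    ("Position A", 15.0),
]


def value_share_by_position(rows):
    """Return (position, share of total value) pairs, largest share first."""
    totals = Counter()
    for position, value in rows:
        totals[position] += value
    grand_total = sum(totals.values())
    # most_common() orders positions from greatest to least total value.
    return [(pos, val / grand_total) for pos, val in totals.most_common()]


shares = value_share_by_position(rows)
for position, share in shares:
    print(f"{position}: {share:.0%}")
```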

Assignment 3 – Google Big Query

The following is a post that I have preserved in its original form. It was the 4th of 6 posts that I created for a computer science class at Northwestern.

After toying with data from Squawka, I soon realized that it would be much more time-efficient to try and find a larger and more sortable dataset on Kaggle. I found this one, which I then uploaded to Google BigQuery. I liked this dataset because I could easily discover more high-level data about Premier League players and their transfer value. While this dataset did not provide in-game performance data, it did reflect their value according to Transfermarkt, which is thought to be a leading player value analysis website.

Screen Shot Big Query

I ran the query and got a limited set of data (I used “LIMIT 100” to limit the query to 100 rows). In my query I excluded Fantasy Premier League data, age category, new/foreign signing, and club ID number.

I removed the Fantasy Premier League data because it would require another level of explanation and analysis of how the Fantasy Premier League gets its numbers. Transfermarkt also generates its own numbers, and they are wildly speculative, since transfer fees are only projections until they are actually paid out. A Fantasy Premier League value, however, only applies to that specific entertainment site.

I got rid of the age category, club ID numbers, and new/foreign signing data because they were unnecessary or overcomplicated ways to sort this data. I am looking to simplify.

The query sorts the dataset by market value in descending order. According to the original post on Kaggle (linked above), the data that I did select shows:

name: Name of the player

club: Club of the player

age : Age of the player

position : The usual position on the pitch

position_cat :

  • 1 for attackers
  • 2 for midfielders
  • 3 for defenders
  • 4 for goalkeepers

market_value : As on transfermarkt.com on July 20th, 2017

page_views : Average daily Wikipedia page views from September 1, 2016 to May 1, 2017

region:

  • 1 for England
  • 2 for EU
  • 3 for Americas
  • 4 for Rest of World

nationality

big_club: Whether one of the Top 6 clubs
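The query described above (keep only the listed columns, sort by market value in descending order, stop at 100 rows) can be sketched as follows. This is not the exact query from the screenshot. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery, the table name epl_players and the excluded column fpl_value are assumed names, and the three rows are placeholders rather than real players.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE epl_players (
        name TEXT, club TEXT, age INTEGER, position TEXT, position_cat INTEGER,
        market_value REAL, page_views INTEGER,
        fpl_value REAL,  -- assumed name for one excluded Fantasy Premier League column
        region INTEGER, nationality TEXT, big_club INTEGER
    )
    """
)

# Placeholder rows; NOT real players or values from the dataset.
placeholder_rows = [
    ("Player A", "Club A", 25, "CF", 1, 30.0, 1000, 9.0, 1, "England", 1),
    ("Player B", "Club B", 28, "CB", 3, 45.0, 800, 6.0, 2, "Spain", 1),
    ("Player C", "Club C", 22, "GK", 4, 12.0, 300, 5.0, 4, "Japan", 0),
]
conn.executemany(
    "INSERT INTO epl_players VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    placeholder_rows,
)

# Select only the kept columns, highest market value first, capped at 100 rows.
QUERY = """
SELECT name, club, age, position, position_cat, market_value,
       page_views, region, nationality, big_club
FROM epl_players
ORDER BY market_value DESC
LIMIT 100
"""
cursor = conn.execute(QUERY)
columns = [description[0] for description in cursor.description]
results = cursor.fetchall()

print(columns)
for row in results:
    print(row)
```

The SELECT statement itself is plain SQL, so something very similar should run in BigQuery once the table name is swapped for the real project, dataset, and table reference.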

I exported my query to a Google Sheet that you can view here.

The results show that only 3 players in the Top 50 transfer values were not at a “Big Club”. In fact, this analysis led me to realize that the data set itself is out of date, due to Virgil Van Dijk’s record-breaking signing for “Big Club” Liverpool. The results can also be used in future assignments to discover which players are desired most, and what their positions might be.
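A count like "players in the Top 50 who are not at a Big Club" can be reproduced from the exported rows with a short script. This is a sketch, assuming each row is a dictionary with the market_value and big_club fields described above. The sample players are placeholders, and the helper name count_outside_big_clubs is my own.

```python
# Placeholder players; NOT real names or values from the dataset.
players = [
    {"name": "Player A", "market_value": 50.0, "big_club": 1},
    {"name": "Player B", "market_value": 45.0, "big_club": 0},
    {"name": "Player C", "market_value": 40.0, "big_club": 1},
    {"name": "Player D", "market_value": 10.0, "big_club": 0},
]


def count_outside_big_clubs(players, top_n=50):
    """Count players among the top_n by market value whose big_club flag is 0."""
    ranked = sorted(players, key=lambda p: p["market_value"], reverse=True)
    return sum(1 for p in ranked[:top_n] if not p["big_club"])


outside_top_3 = count_outside_big_clubs(players, top_n=3)
print(outside_top_3)
```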