The following is a post that I have preserved in its original form. It was the 4th of 6 posts that I created for a computer science class at Northwestern.
After toying with data from Squawka, I soon realized that it would be much more time-efficient to find a larger and more sortable dataset on Kaggle. I found this one, which I then uploaded to Google BigQuery. I liked this dataset because I could easily explore high-level data about Premier League players and their transfer values. While this dataset does not provide in-game performance data, it does reflect each player's value according to Transfermarkt, which is widely regarded as a leading player valuation website.
I ran the query with “LIMIT 100” so that it returned only 100 rows. I also left several columns out of the query: the Fantasy Premier League data, age category, new/foreign signing, and club ID number.
I removed the Fantasy Premier League data because it would require another level of explanation and analysis of how the Fantasy Premier League gets its numbers. Transfermarkt also generates its own numbers, and they are wildly speculative, since most transfer fees are only projections until they are actually paid out. A Fantasy Premier League value, however, only applies to that specific entertainment site.
I got rid of the age category, club ID numbers, and new/foreign signing data because they were unnecessary or overcomplicated ways to sort this data. I am looking to simplify.
The query sorts the dataset by market value in descending order. According to the original post on Kaggle (linked above), the data that I did select shows:
name: Name of the player
club: Club of the player
age : Age of the player
position : The usual position on the pitch
position_cat :
- 1 for attackers
- 2 for midfielders
- 3 for defenders
- 4 for goalkeepers
market_value : As on transfermarkt.com on July 20th, 2017
page_views : Average daily Wikipedia page views from September 1, 2016 to May 1, 2017
region:
- 1 for England
- 2 for EU
- 3 for Americas
- 4 for Rest of World
nationality
big_club: Whether one of the Top 6 clubs
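To make the query described above concrete, here is a minimal sketch in Python that assembles a BigQuery-style SQL string from the columns listed above. The table path is a made-up placeholder (this post does not give the real project, dataset, or table names), and the exact query I ran may have been worded differently.

```python
# Columns kept in the query, taken from the field list above.
SELECTED_COLUMNS = [
    "name",
    "club",
    "age",
    "position",
    "position_cat",
    "market_value",
    "page_views",
    "region",
    "nationality",
    "big_club",
]

# Hypothetical table path: an assumption for illustration only.
TABLE = "my_project.premier_league.players"


def build_query(table=TABLE, limit=100):
    """Return SQL that selects the kept columns, sorted by market value."""
    columns = ",\n  ".join(SELECTED_COLUMNS)
    return (
        f"SELECT\n  {columns}\n"
        f"FROM `{table}`\n"
        "ORDER BY market_value DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_query())
```

Listing the columns explicitly, instead of using SELECT *, is what keeps the Fantasy Premier League, age category, new/foreign signing, and club ID columns out of the results.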
I exported my query to a Google Sheet that you can view here.
The results show that only 3 players in the Top 50 transfer values were not at a “Big Club”. In fact, this analysis led me to realize that the dataset itself is out of date, given Virgil van Dijk’s record-breaking signing for “Big Club” Liverpool. The results can also be used in future assignments to discover which players are desired most, and what their positions might be.
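As a rough sketch of how the “Big Club” count could be reproduced from the exported rows, the Python below ranks players by market value and counts those in the top N whose big_club flag is not set. The sample rows are invented placeholders, not values from the dataset, and the code assumes big_club is stored as 1 or 0.

```python
def count_outside_big_clubs(rows, top_n=50):
    """Count players in the top_n by market value who are not at a big club."""
    ranked = sorted(rows, key=lambda row: row["market_value"], reverse=True)
    return sum(1 for row in ranked[:top_n] if not row["big_club"])


# Placeholder rows for illustration only (not real players or values).
sample_rows = [
    {"name": "Player A", "market_value": 50.0, "big_club": 1},
    {"name": "Player B", "market_value": 40.0, "big_club": 0},
    {"name": "Player C", "market_value": 30.0, "big_club": 1},
    {"name": "Player D", "market_value": 20.0, "big_club": 0},
]

if __name__ == "__main__":
    print(count_outside_big_clubs(sample_rows, top_n=3))
```

Running the same function over the real exported rows with top_n=50 would be one way to check the count reported above.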