So I saw some of the CN dataset flying around and I wanted to see how much more we could uncover using data exploration techniques.
Character Duo Usage
Chi-squared Test Residuals Without Main Diagonal
The above graph shows how often characters are paired together relative to how popular each of them is. For example, the big red dot for Xingqiu+Ganyu means those two are paired far less often than you would expect given how popular both characters are. A value of 0 means the pairing is not distinctive: it shows up about as often as expected. (For more information: http://www.sthda.com/english/wiki/chi-square-test-of-independence-in-r). We removed the main diagonal to focus on the rest.
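For anyone who wants to see what a "residual" is in code, here is a rough Python sketch (not the original R code). It computes the standard Pearson residual, (observed - expected) / sqrt(expected), for every cell and then blanks out the diagonal. The counts are made up for illustration, and computing on the full table before hiding the diagonal is an assumption about the order of steps.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) for a square count table."""
    n = len(table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    residuals = []
    for i in range(n):
        row = []
        for j in range(n):
            # Count expected if pairing depended only on popularity (independence).
            expected = row_sums[i] * col_sums[j] / total
            row.append((table[i][j] - expected) / math.sqrt(expected))
        residuals.append(row)
    return residuals

def drop_diagonal(matrix):
    """Replace the main diagonal with None so it is ignored when plotting."""
    return [[None if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

# Made-up counts for three placeholder characters (NOT from the CN dataset).
names = ["A", "B", "C"]
counts = [
    [10, 8, 2],
    [8, 12, 4],
    [2, 4, 6],
]
res = drop_diagonal(pearson_residuals(counts))
for name, row in zip(names, res):
    print(name, ["  -  " if v is None else f"{v:+.2f}" for v in row])
```

A negative value means the pair appears less often than popularity alone would predict, a positive value means more often.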
Row Scores Dim 1 and Dim 2
There is a lot going on here, but all you need to know is that we used an algorithm (correspondence analysis) to plot the characters. Characters that are close together on the graph are probably often played together. (For more information: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/113-ca-correspondence-analysis-in-r-essentials/)
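If you are curious how the plotting coordinates come about, here is a rough sketch of textbook correspondence analysis in Python with NumPy (not the original R code). It takes the SVD of the standardized residuals and scales the result into row scores. The function name and the counts are made up for illustration.

```python
import numpy as np

def ca_row_scores(table):
    """Row principal coordinates and explained-variance shares from a count table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix (proportions)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: the chi-squared residuals rescaled by the grand total.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_scores = (U * sv) / np.sqrt(r)[:, None]
    explained = sv**2 / np.sum(sv**2)    # share of variance (inertia) per dimension
    return row_scores, explained

# Made-up counts for three placeholder characters (NOT from the CN dataset).
counts = [
    [10, 8, 2],
    [8, 12, 4],
    [2, 4, 6],
]
scores, explained = ca_row_scores(counts)
print(scores[:, :2])   # Dim 1 and Dim 2, the two axes of a plot like the one above
print(explained)
```

The first two columns of `scores` are what would go on the x and y axes; the later columns are the extra dimensions that a 2D plot cannot show.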
We then performed clustering (telling the machine to group characters), after removing Amber, Kaeya, Razor, and Keqing because they were outliers:
The height at which characters or groups join indicates how “close” they are: the lower the join, the closer they are. For example, Venti, Diona, Ganyu, and Mona are closer together than Bennett and Xingqiu.
Clustering was done with the centroid method on the scores of the first 14 dimensions of the correspondence analysis (CA), which together explain 80% of the variance. More information: https://www.datacamp.com/community/tutorials/hierarchical-clustering-R.
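To give a feel for what the centroid method does, here is a small pure-Python sketch (not the original R code, and far slower than a real library). At each step it merges the two clusters whose centroids are closest and records the distance, which is the "height" in the tree. The points are made-up 2D coordinates for placeholder characters; the real input would be the 14 CA dimensions. Note that R's `hclust` may use a different distance scale (for example squared distances), so heights from this sketch are not directly comparable.

```python
import math

def centroid_linkage(points, labels):
    """Agglomerative clustering that merges the two clusters with the closest centroids."""
    # Each cluster is stored as (member labels, centroid coordinates).
    clusters = {i: ([labels[i]], list(p)) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                d = math.dist(clusters[ids[x]][1], clusters[ids[y]][1])
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, a, b = best
        members_a, centroid_a = clusters.pop(a)
        members_b, centroid_b = clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        # The new centroid is the size-weighted mean of the two old centroids.
        centroid = [(na * u + nb * v) / (na + nb)
                    for u, v in zip(centroid_a, centroid_b)]
        clusters[next_id] = (members_a + members_b, centroid)
        merges.append((members_a, members_b, d))
        next_id += 1
    return merges

# Made-up 2D scores for five placeholder characters (NOT from the CN dataset).
labels = ["A", "B", "C", "D", "E"]
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [2.0, 2.0], [2.5, 2.0]]
merges = centroid_linkage(points, labels)
for left, right, height in merges:
    print(left, "+", right, f"at height {height:.3f}")
```

Groups that merge early (low height) are the ones sitting near the bottom of the tree.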
We first built a contingency table where cell (x,y) is the number of teams containing both character X and character Y. The main diagonal (x,x) is set to the number of times character X appears in the data (equivalently, the sum of the rest of the row divided by 3, since a character in a 4-character team has 3 teammates). If someone wants the R code, they can DM me or comment.
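In the meantime, the table-building step described above can be sketched in a few lines of Python (again, not the original R code). The team lists are made up with placeholder names, and the sketch assumes every team has 4 distinct characters.

```python
from itertools import combinations

def build_pair_table(teams):
    """Square table: cell (x, y) counts teams with both x and y; diagonal counts appearances."""
    names = sorted({c for team in teams for c in team})
    index = {name: i for i, name in enumerate(names)}
    table = [[0] * len(names) for _ in names]
    for team in teams:
        # Every unordered pair in the team adds one to both (x, y) and (y, x).
        for a, b in combinations(team, 2):
            table[index[a]][index[b]] += 1
            table[index[b]][index[a]] += 1
        # The diagonal holds how many teams the character appears in.
        for c in team:
            table[index[c]][index[c]] += 1
    return names, table

# Made-up teams with placeholder characters (NOT from the CN dataset).
teams = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["E", "F", "B", "D"],
]
names, table = build_pair_table(teams)
for name, row in zip(names, table):
    print(name, row)
```

With 4-character teams, each diagonal entry equals the sum of the rest of its row divided by 3, which is a handy sanity check on the table.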