Despite their prevalence, numerical scores in reviews are frequently disparaged as useless or misleading, especially in the gaming community. In many cases these scores are seen as actively harmful for their ability to influence consumers in ways that do not reflect the actual quality of the title being reviewed. There are plenty of aspects of review scores that contribute to their poor reputation, including arbitrary score assignments, a lack of any explanation for a score, and an apparent mismatch between the final score and the written review. It seems to me that all of these are symptoms of the same problem: a lack of a definitive set of criteria for the scoring system. If the origin for the score were adequately explained and consistent across an individual’s reviews (or even reviews from multiple writers, if the criteria were shared), the numerical designation could much more effectively complement the written review. In an attempt to make this appealing hypothetical a reality, I will propose a more thorough scoring system here.
The first step in creating a useful scoring system is diversifying the criteria for review. In most current systems, the only official criterion tends to be whether the game is “good” or not. These systems force the reviewer to condense complex opinions and multifaceted analyses directly into a single number, representing the game’s overall quality. For most people this is a herculean, and indeed impossible, task. Video games are an extraordinarily complex medium, with an arguably unparalleled number of artistic and technical aspects. To evaluate these aspects in a numerical manner properly, the obvious first course of action is to consider them separately. That will be the backbone of this new system: a set of criteria organized by the major aspects of a typical video game. The following ten categories are to be considered individually and assigned an appropriate score on a scale of 0 to 10. The significance of these numbers, as well as the best ways to incorporate the scoring system into reviews, will be discussed later in this article.
This is a more ambiguous category that is often overlooked in traditional scoring systems. “Ambition” is meant to represent how many risks a game takes with its core concept and design choices, relative to what is expected of it. Heavy deviation from the established standards of its genre or previous installments in its series should grant the highest score. These risks can be related to the story, gameplay, aesthetics, or any other artistic or design-oriented aspect of the game. Even if a game is so poorly executed that it is nearly unplayable, any attempt to create something new or special should be recognized and appropriately rewarded. A game should not need to break new ground with all of its design choices to earn an excellent score in this category. One or two outstandingly original ideas or admirable design efforts should warrant a perfect or near-perfect score for ambition. Some good examples of this principle are The Legend of Zelda: Breath of the Wild for attempting an absurdly expansive gameplay scale, or Nier: Automata for grappling with complex and difficult subject matter in its story. Neither of these games is ambitious in every one of its design choices, but both take enough risks to earn an excellent score in this category.
This is a fairly rote and uninteresting category to analyze, but it can be critically important nonetheless. “Performance” refers to how well the game runs, plain and simple. Are there numerous obvious or game-breaking glitches? Frequently noticeable texture pop-in? Frame rate drops? AI bugs? If the answer to any of those questions or any others like them is “yes,” then the game should lose points in this category. This aspect is unique in that it is the only category listed here that should earn a perfect score if it is never noticed. Performance issues are to be expected in every game though, so minor issues should have an appropriately small impact on this score. Additionally, it is important to keep in mind that this category will vary based on the platform on which the reviewer plays the game. It is common practice to indicate which platform was used to play a game in a review, so this category should already pair seamlessly with the way reviews are typically written.
This category is fairly straightforward: it simply represents the quality of the visual aspects of the game. This includes not only the visual design, but how well that design is brought to life in the game’s world. Interesting, expertly designed characters and environments will add to this score, while poorly crafted models or sprites will detract from it. A recent example of a game that highlights this difference is Monster Hunter Generations Ultimate on the Switch. The game is exquisitely designed and beautiful in its own right, but suffers from low-poly models and limited textures that make it a bit of an eyesore nonetheless. Subtleties like proper use of color and a consistent visual theme should obviously be considered here as well, along with anything else that adds to the visual appeal of the game.
Another straightforward category. This will refer to any auditory factors, including a game’s soundtrack, sound effects, and voice acting (when present). This category should shine a spotlight on some aspects of game design that are commonly overlooked, like sound effects in a menu or how well the characters’ voices are mixed with the background music.
This category is where this scoring system is going to start making some less conventional distinctions. “Gameplay” will represent the systems with which the player interacts with the playable characters. This includes basic controls, character progression, combat systems, dialogue trees, menus, etc. Anything that gives the player a different option for playing the game should be considered gameplay. This category should be considered separately (whenever possible) from the “level design” of the game, to be defined next. Points should be awarded in this category for well-designed gameplay systems that interact with each other in logical ways, and that create an intuitive and enjoyable experience. Points should be deducted for systems that obviously do not work the way they were intended to, are fundamentally uninteresting or tedious to engage with, or do not function properly with each other in a cohesive manner. It should be noted that originality in both gameplay and level design (as well as in all other categories) will likely be covered by the “Ambition” category, so scores for these categories should be largely based on the functionality and enjoyability of these design choices. With that said, if the gameplay systems are so derivative that their quality is lessened by their oversaturation or monotony in the industry, then points can be deducted as deemed appropriate by the reviewer.
A counterpart to the gameplay category, here “Level Design” will describe how the game world interacts with the playable characters. This includes environment design, hazards, enemy placement and AI, and anything else over which the player does not have direct control. Level design covers obstructions that the player must react to; gameplay covers the tools with which the player can react. Since level design is reactionary, part of its assessment should involve how well it is tailored to the gameplay systems. Questions like “Are the enemies fun and challenging to fight with the playable characters’ attacks,” “Does the environment design complement the movement systems,” and “Does the game world create a consistent sense of flow or pace when it is traversed,” can help with this analysis. Points for this category should be awarded for this kind of synergy between the level design and the gameplay, as well as more generally engaging and interesting design.
Telling a story in a video game can often be as complicated and involved as designing its gameplay systems, so this scoring system will contain two aspects related to the writing in a game to adequately represent that complexity. “Writing” will refer explicitly to the written portions of the game, and will not consider how those written portions are presented to the player. This category will be an assessment of the moment-to-moment quality of the writing, as opposed to its structure or implementation into the medium. Factors that can increase the score of this category could include likable, multi-dimensional characters with coherent, believable arcs; an engaging, unpredictable plotline; and clever, purposeful dialogue both during and outside of story sequences. Even if the game contains excessively long cutscenes, relies too heavily on random information drops like audio or text logs, or the writing suffers structurally due to the gameplay systems, points should not be deducted from this category unless the writing in those scenarios is of poor quality. Those factors should instead be considered in the next category.
This category marks a critical distinction for the evaluation of a video game’s quality. Unlike movies, TV shows, comics, books, or most other storytelling media, games almost universally cannot tell stories at their own pace. Player input is required for the game to progress, making the method of presentation of the story just as important as the story itself. This category will include all factors related to combining the written and gameplay elements of the game, as well as general structure and pacing for the story as a whole. Frequency and length of cutscenes, use of dialogue trees, accessibility of crucial story elements like audio logs or item descriptions, and implementation of the chosen plot structure are all elements to consider for this category. It should be noted that synergy between storytelling and gameplay/level design elements can partially contribute to the score for this category, but will also be considered in the next category, “Immersion.”
Some good examples for the distinction between writing and storytelling are The Legend of Zelda: Breath of the Wild (again) and Horizon: Zero Dawn. Both are open-world, action-adventure games with post-apocalyptic settings and survival and crafting elements, but they each take a radically different approach to both writing and storytelling. Horizon: Zero Dawn has a much more unique and complicated story, with an engaging, mysterious narrative and complex, fleshed out characters. Breath of the Wild tells a much simpler tale, with more obvious plot developments and less developed (but still very interesting) characters. Based on this analysis, Horizon has better writing than Breath of the Wild and should earn a higher score in that category. However, Horizon’s storytelling heavily relies on long, meandering conversations between characters (all with identical camera angles), heavy information dumps in the form of audio or text logs, and a plot structure that is restricted from progression outside of central story missions. Breath of the Wild, by comparison and on its own merit, is much more innovative in the presentation of its story. Plot developments are presented in a nonlinear fashion, cutscenes are short and treated as rewards for exploration rather than breaks from it, and story elements are strategically placed at large but manageable distances from each other to maintain a consistent pace. Since these storytelling elements better suit the open-world gameplay, Breath of the Wild would earn a better score in the storytelling category.
This category refers to the ability of a game to create a believable world into which the player can project themselves. Since this can only be accomplished through excellent synergy between all of a game’s elements, “Immersion” should be treated as an amalgamation of all of the preceding categories. Are the audiovisual elements of the game thematically appropriate in the context of the story? Does the level design allow for natural and balanced use of gameplay elements? Is the ambition of the game’s premise executed well in each of its aspects? All of these qualities contribute to the believability of the game world, and greatly enhance the overall experience. No one aspect of a game’s design can fully immerse a player, and even individually excellent design in every aspect can result in a disjointed mess. Points should be awarded for design choices that complement each other across the first eight scoring categories, and should be deducted when any aspect of the game feels inappropriate or unbalanced in the context of the entire game. Since this category is inherently complicated and difficult to analyze, special care should be taken to justify the assigned score. Score justification on both a categorical and broad level will be discussed below.
This final category is meant to be an intuitive evaluation of how “good” the reviewer considers the game to be. The “General Appeal” category is essentially what most current scoring systems are in their entirety: arbitrary, but still useful in its own right. There are no specific qualities, aspects, or examples that can be used to describe the scoring procedure for this category. This number should come naturally at the discretion of the reviewer, though it should still originate from the same scale that was used for every preceding category. No attempt should be made to quantify anything here; if the game “feels” like an 8 out of 10, then it should earn an 8 for general appeal.
Separating the scoring system into these ten categories already grants significantly more meaning to the final score, simply by forcing the reviewer to consider the game’s design on a more intricate level. However, another large issue that many scoring systems suffer from is score inflation. Frequently, a 6 out of 10 is considered a bad score, and a 7 out of 10 is considered an average one. Any score below a 5 is explicitly reserved for games of appallingly abysmal quality, with no redeeming features or enjoyment whatsoever. This is, obviously, an incredibly imbalanced and wasteful system. If review scores are to be more meaningful, then the significance of the resulting numbers needs to be reassigned.
The (seemingly) natural solution to this issue would be to assign a specific meaning to all ten numbers in order to force the reviewer to abide by a more useful scale. Despite this solution’s simplicity, it would be a cumbersome and unintuitive method for reassigning value to the scores. The reviewer should not have to consult or memorize a list of vague qualifiers every time they need to decide on a score. Instead, a generalized meaning will be assigned to key values, with the obvious choices being 0, 5, and 10, allowing the reviewer to use their own discretion for deciding where a category’s score should lie relative to those numbers.
A score of 0 should be reserved for a category in a game that either has absolutely no redeeming or positive qualities, or whose positives have no impact whatsoever on the category as a whole. The former is an unlikely occurrence, but the latter is a situation that could actually come up in especially low-quality games. It is important to remember that every number in the scoring system should be possible to achieve. A good rule-of-thumb on a personal level is to think of the game that you would score the lowest in a given category. If that game does not earn a 0 in that category, then your numerical assessment needs to be adjusted. A 0 should indicate something beyond redemption, even if some middling or arbitrary positive qualities exist.
A score of 5, logically, should serve as the midpoint of the scoring system. A category of a game should earn a 5 when every aspect of it is decidedly average; when there is nothing that can be called “wrong” or “bad,” but nothing noteworthy or praiseworthy either. A 5 can also apply if the positive aspects of a category are roughly as numerous and impactful as the negative ones. Put in the simplest terms, a 5 should be assigned when a category on the whole cannot be definitively assessed as either good or bad. This should be a fairly common score, unless an individual reviewer is only selecting notoriously terrible or notably exceptional games to review. Again, personally, if few games are receiving scores of or near 5, your definition of “average” may need to be adjusted.
Finally, a score of 10 should be reserved for games with perfect or nearly perfect aspects. Similar to a score of 0, this score should be attainable but rare. A category should earn a 10 if its flaws are so minute, unnoticeable, or unimportant that they do not detract from the overall experience in any way. It is important to remember that in the context of a scoring system, “flaws” should not just be aspects that actively detract from the game. Unimpressive mediocrity is a flaw in and of itself, even if nothing is found to be actively wrong (as detailed in the preceding paragraph). A game should only earn a 10 in a category if it has crafted something truly exceptional, at least in part. As a reminder, “ambition” is its own category in this system. If a category is executed perfectly and exceptionally in a manner that is fairly risk- or innovation-free, it may still warrant a perfect score. Risk-averse or “safe” design decisions should only lose points in the ambition category, to avoid punishing a game twice for a single aspect of its design.
These definitions for scores of 0, 5, and 10 are meant to serve as relative guidelines for assigning scores. How they are used is up to the reviewer. As an example, they can be used as starting points for deciding how far above or below average a particular category is, or how many steps it is away from perfection. The exact definitions should not be the focus of the scoring system, though; the goal is simply to broaden the range of scores that games can receive to give the numbers more meaning. Ideally, the frequencies of a reviewer’s score assignments should resemble a bell curve. Middling scores of 4, 5, and 6 should make up the majority of the reviewer’s assignments, while scores of 0 and 10 should be by far the rarest. Even with these guidelines, this process will vary from reviewer to reviewer. As long as the scores are diversified more than they are in current review systems, the goal will have been achieved.
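For reviewers who keep a record of their past scores, the bell-curve guideline above can be checked with a few lines of Python. This is only a rough sketch: the function names and the sample list of scores are made up for illustration, and the article does not prescribe any exact threshold beyond "the majority" falling on 4, 5, and 6.

```python
from collections import Counter


def score_histogram(scores):
    """Count how often each score from 0 to 10 was assigned."""
    counts = Counter(scores)
    return [counts.get(value, 0) for value in range(11)]


def middle_share(scores):
    """Return the fraction of scores that are 4, 5, or 6."""
    if not scores:
        raise ValueError("at least one score is required")
    middling = sum(1 for score in scores if 4 <= score <= 6)
    return middling / len(scores)


# Hypothetical category scores from a reviewer's past reviews.
past_scores = [5, 4, 6, 5, 7, 3, 5, 6, 4, 8, 2, 5]

for value, count in enumerate(score_histogram(past_scores)):
    print(f"{value:>2}: {'#' * count}")

if middle_share(past_scores) > 0.5:
    print("Most scores are middling, as the guideline suggests.")
else:
    print("Scores may be inflated or deflated; revisit your idea of average.")
```

A text histogram like this makes it easy to spot score inflation at a glance, such as a pile of 7s and 8s with nothing below 5.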
Each category within this scoring system requires an adequate explanation for its score. There should be a logical connection between the score awarded and the prose in the review discussing the category. With that said, adding a detailed explanation for each category to an already full-length review would result in an excessively long and difficult to read article. The solution is a simple one: each score simply needs to be specifically justified within the text of the review. Reviewers can naturally include considerations for each category as they are writing their article (and indeed already do in most cases), so excessive appended justification of review scores is unnecessary. It should be noted that the scores do not need to be referenced outright, but their reasoning should be made obvious by the opinions presented in the review.
With that said, some explicit explanation of the scores will make the reasoning behind them much clearer, and will reduce the disconnect between review and score that is so common in existing systems. As such, short sentences or a bulleted list should be included with every score to clarify the connection between the reviewer’s evaluation of the game and the score assignment. This is a particularly appealing solution if the score is intended to be something of a summary of the review, for readers that lack the time to read it in full. These summaries do not necessarily add any additional intrinsic meaning to the scores, but can increase their usefulness to less astute readers.
Since the goal of this system is to replace the methods with which review scores are normally assigned, it should attempt to produce one number to represent the overall quality of the game. This can be done extremely simply: the scores of the ten categories can be added together, resulting in a final score on a scale of 0 to 100. If a final score out of 10 is desired, the sum can be divided by ten and be presented with a decimal (the way IGN’s scores are currently displayed).
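As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are my own invention, and the example scores are made-up numbers rather than a review of any real game; the category names are the ten used throughout this article.

```python
CATEGORIES = (
    "Ambition", "Performance", "Art Direction", "Sound Design", "Gameplay",
    "Level Design", "Writing", "Storytelling", "Immersion", "General Appeal",
)


def aggregate(scores):
    """Sum the ten category scores (each 0 to 10) into a total out of 100."""
    missing = [name for name in CATEGORIES if name not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for name in CATEGORIES:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    return sum(scores[name] for name in CATEGORIES)


def out_of_ten(total):
    """Convert a total out of 100 into a decimal score out of 10."""
    return total / 10


# Hypothetical scores: a solid game with very little narrative.
example = {name: 7 for name in CATEGORIES}
example["Writing"] = 2
example["Storytelling"] = 3

total = aggregate(example)
print(f"{total} / 100")               # 61 / 100
print(f"{out_of_ten(total)} / 10")    # 6.1 / 10
```

In this made-up case, eight categories at 7 contribute 56 points, and the two narrative categories add only 5 more, for a total of 61.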
There are inherent issues with this system, some of which will be discussed later, but the most glaring flaw is the inconsistent relevance of several of the categories. Most notably, not all games have a strong focus on writing and storytelling. Video games are not solely used as a medium for weaving narratives; they are also frequently used as more visceral elicitors of exhilarating recreation or calm relaxation. As an example, Donkey Kong Country: Tropical Freeze is an absolute masterpiece of game design, from its tight controls to its incredibly detailed environments. However, it contains essentially no plot or writing whatsoever. It seems unfair that such a high-quality game would lose nearly 20 potential points off of its final score for neglecting to insert a complicated narrative. In fact, in this case a lack of cutscenes and dialogue actually improves the experience. With that said, it remains unfair to simply give Tropical Freeze acceptable or even average scores in writing or storytelling when other games put so much care and effort into meticulously crafting both.
If score aggregation is deemed important enough to ignore this major flaw, then the unfairness of the system will unfortunately have to be ignored. Games that neglect to focus on any of the aspects defined by this system will receive poor scores in the applicable categories, resulting in a lower final aggregate score. An important action to mitigate the issue is to qualify these low scores as clearly as possible. If the limited focus on certain categories does not detract from the quality of the game as a whole, that sentiment should be made explicitly clear to the reader.
An alternate course of action would be to simply refrain from score aggregation entirely. The scoring categories speak for themselves, and will produce the desired result of a numerical representation of the reviewer’s opinion. This is an incredibly important point for this scoring system: score aggregation is optional. If the reviewer determines that their opinion would be better represented by ten separate numbers, forgoing the final summation is completely acceptable.
Lack of balance in score aggregation is certainly not the only flaw in this system. A more intrinsic and unavoidable flaw is the implied objectivity of review scores despite their subjective origins. As with all scoring systems, this is, first and foremost, a method for conveying a reviewer’s opinion. No matter how rigorous the system, this inherent aspect of review scores will cause fluctuations in a score’s “accuracy” or meaning. No two reviewers will have identical methods for assigning scores, and even an individual’s scoring strategy may change with their mood or the inexorable flow of time. This is not an issue that can be perfectly addressed in any scoring system. If review scores are to be used at all, they must always come with the acknowledgement that they carry a level of subjectivity to them even in the context of representing an opinion. This system is not an attempt to create a perfect scoring system, but is rather intended to provide a more meaningful and useful option for generating review scores.
A flaw more specific to this particular system is the equal weight of each scoring category. For example, at least numerically, this system treats sound design and gameplay as equally important. Both are scored out of 10 and presented on equal ground. More importantly, all the categories have an equal contribution to the aggregate score. This means that games that would receive excellent scores in important categories like ambition, gameplay, and level design can still receive average or poor aggregate scores if their quality in other aspects is subpar. There are definitely arguments to be made that some of the categories in this system are more critical than others in an assessment of a game’s quality, and therefore deserve more weight in both individual presentation and aggregation. The issue is the lack of consensus on the subject. No two individuals would weight these categories in the same way, so forcing a particular emphasis would decrease the range of usefulness of the system. The potential value of a weighted scoring system should not be ruled out, but its development would require a large collaboration between a variety of experienced critics for it to hold any merit. Even then, there would still likely be a substantial number of dissenting opinions regarding the assigned weights. As such, no attempt was made to weight any categories in this system differently.
These seem to be the more obvious flaws for this proposed system, at least in my eyes. There are guaranteed to be plenty that I have not considered. Please provide feedback on this proposal in the comments, so that it can hopefully be improved into something universally usable.
To summarize my proposal for an improved video game review scoring system, the main points are presented here:
· The score will be divided into ten categories, each individually scored on a scale of 0 to 10.
· These categories will be considered as defined in this article, and include Ambition, Performance, Art Direction, Sound Design, Gameplay, Level Design, Writing, Storytelling, Immersion, and General Appeal.
· The assigned scores should resemble a bell curve, with exceedingly low or high scores being appropriately rare. Generalized guidelines for score assignment are provided in this article.
· Scores will be adequately justified within the text of the review. In addition to a full in-text justification, short explanations for the reasoning behind each score will be given along with the scores themselves.
· Score aggregation will be accomplished through a simple summation of the ten category scores. Aggregation is optional, and can be omitted if the reviewer deems it too misleading or counterproductive.
I intend to implement this scoring system into my own reviews in the near future. If you like the system, feel free to use it! As I mentioned above, please provide feedback and constructive criticism for improvement. If I get enough good ideas, I will post an edited version.