Linguistic comparison of IE games with Nu-Games

Bester

⚰️☠️⚱️
Patron
Vatnik
Joined
Sep 28, 2014
Messages
9,094
Location
USSR
To begin the comparison, I've extracted the NPC lines from the following games: BG1, BG2, IWD1, IWD2, PST, POE1, NUMENERA and TYRANNY.
I've discarded the PC lines because including them would be unfair to the comparison: the player is not the storyteller, and his lines are often brief and to the point.

I've then lemmatized the resulting words, meaning I've reduced each word to its lemma, which is the form in which it would appear in a dictionary. This is not a perfect process. The alternative was Porter or Snowball stemming, which are also imperfect.
After lemmatization, the sentence "Harry was in a better shape" is turned into "Harry be in a good shape".

The goal is to compare unique words (disregarding their varying forms) with TOTAL words, hence the procedure.
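
A minimal sketch of that unique-versus-total count, in Python. The tiny lemma table is a made-up stand-in for a real lemmatizer and the function names are hypothetical; this is not the script used for the graph below.

```python
import re

# Toy lemma table standing in for a real lemmatizer (hypothetical, illustration only).
TOY_LEMMAS = {"was": "be", "is": "be", "were": "be", "better": "good", "best": "good"}

def lemmas(text):
    """Lowercase the text, split it into alphabetic tokens, map each through the toy table."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [TOY_LEMMAS.get(t, t) for t in tokens]

def vocabulary_curve(lemma_list, step=1):
    """Return (total_words, unique_lemmas) pairs, sampled every `step` words."""
    seen = set()
    curve = []
    for i, lemma in enumerate(lemma_list, start=1):
        seen.add(lemma)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

sample = "Harry was in a better shape. Harry is in good shape."
print(lemmas(sample))
print(vocabulary_curve(lemmas(sample), step=5))
```

Plotting the second number against the first for each game gives one curve per game, and a curve that flattens earlier points to a poorer vocabulary.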

Since these games are of different lengths, I've decided to cut the comparison at a length that many of them reach. The IWDs and BG1 turned out to be too short to participate. POE1 barely made it.

The results are, unsurprisingly, that Tyranny has the poorest vocabulary of all the games, by a significant margin. Note that the Y-axis is exponential. It's fair to compare results exponentially, because a 10% difference is exactly what it takes for the text to become livelier and richer. Intersperse every 10th word with a rarer one and you get a smarter text. Tyranny is lagging behind by OVER 10%. It falls behind after 150k total words and never catches up. The difference only grows.

84381a7ed8f2d24357e64b8bee5da36b.png





For another comparison, I've extracted the most used words that are over 4 characters long.

618063f49dfd166b166dc88cc2f48d9a.png
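
A rough sketch of how such a "most used words over 4 characters" table could be produced with the standard library. The function name and the sample sentence are made up for illustration; the sample is not real game text.

```python
import re
from collections import Counter

def top_long_words(text, min_len=5, n=10):
    """Return the n most frequent words that have at least min_len letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(n)

# Invented sample, only to show the shape of the output.
sample = ("The Fatebinder spoke. Kyros sent the Fatebinder; "
          "the Archon obeyed Kyros and the Fatebinder.")
print(top_long_words(sample, n=2))
```

Here "over 4 characters" is read as five letters or more, which is why `min_len` defaults to 5.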


As you can see, all games are very subtle about their vernacular. PST is subtly charming with the heavy usage of "cutter", conveying the exotic setting with a multitude of other words that never repeat themselves. IWDs and NUMA emphasize their most important toponym. BG1, BG2 and even POE1 (yes, even POE1 despite its fampyrs) never verbally assault you with their unconventional world of magic and monsters.

None of this can be said about TYRANNY which jumps out of the bush and rapes you with the MOST USED WORD fatebinder, followed closely by KYROS, ARCHON, CHORUS, DISFAVORED, BEASTWOMEN and many more. The writers don't even attempt to convey ideas elegantly. They've created a glossary and clumsily handle the narration by dropping the same words on your head over and over again.

It's always been the author's opinion that Tyranny is an affront to any sane person's sensibilities. It has been proven by the numbers now. You can't like Tyranny and claim to be a respectable gentleman. Only a scullion could enjoy something like that.
 

Tigranes

Arcane
Joined
Jan 8, 2009
Messages
10,227
None of this can be said about TYRANNY which jumps out of the bush and rapes you with the MOST USED WORD fatebinder, followed closely by KYROS, ARCHON, CHORUS, DISFAVORED, BEASTWOMEN and many more. The writers don't even attempt to convey ideas elegantly. They've created a glossary and clumsily handle the narration by dropping the same words on your head over and over again.

But POE1 is a curiosity here. Given the common criticisms, we would imagine that POE1 and Tyranny are similar to each other and dissimilar to the IE games. But apparently this is not the case?

One possibility here is that POE1 does feature a high quantity of 'vernacular', but it comes in the form of 800 different words, rather than a single one like 'cutter' dominating. That would require a secondary analysis to confirm one way or another, to see what is the overall proportion of 'jargon'. Without that, we can't yet conclude that Tyranny is guilty of jargon-dumping while all other games (including POE1) do not.
 

Dodo1610

Magister
Joined
May 3, 2018
Messages
2,050
Location
Germany
Tyranny had this hyperlink feature where you could mouse over specific terms and get a popup explanation for them. I assume that's why they kept using some terms over and over again.

Though I am surprised Hollowborn or Engwithan isn't on the POE list.
 

Bester

⚰️☠️⚱️
Patron
Vatnik
Joined
Sep 28, 2014
Messages
9,094
Location
USSR
Though I am surprised holowborn or Engwithan isn't on POE list.
Hollowborn is the 336th most used word, with just 83 usages.
Engwithan is 351st, with 80 usages.

They didn't overdo it. I couldn't even remember those words before you mentioned them.

Also voices, and disfavoured in TYR. is that normal word, or is there specific uncommon meaning to them in Tyr?
"Voices of Nerat" is some dude and Disfavored is a faction.

1.Note: Bloom word in Numa is a specific usage there, or just normal meaning?
A place.
 

Bester

⚰️☠️⚱️
Patron
Vatnik
Joined
Sep 28, 2014
Messages
9,094
Location
USSR
One possibility here is that POE1 does feature a high quantity of 'vernacular', but it comes in the form of 800 different words, rather than a single one like 'cutter' dominating. That would require a secondary analysis to confirm one way or another, to see what is the overall proportion of 'jargon'. Without that, we can't yet conclude that Tyranny is guilty of jargon-dumping while all other games (including POE1) do not.
You are right, but I'm not doing it.

While it's possible to discard all words that exist in dictionaries, we'd be left with a lot of stuff like this from Torment, for example (the undead language), and then it would be a matter of manually going through all of it. It's not an isolated case either: there's a lot of stuff like this in these games, so I'd rather not.
741df000a2199b9b2764050f336671d9.png
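
For anyone who wants to try the first, automatic half of that anyway, here is a minimal sketch. It assumes you already have an English word list loaded into a set; the function name and the sample data are hypothetical. The leftover list would still need the manual pass described above, since made-up languages and typos end up in it alongside real setting jargon.

```python
import re

def out_of_dictionary(text, dictionary):
    """Return (share of tokens not in the dictionary, sorted list of those unknown words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    unknown = [t for t in tokens if t not in dictionary]
    share = len(unknown) / len(tokens) if tokens else 0.0
    return share, sorted(set(unknown))

# Tiny invented dictionary and sentence, for illustration only.
toy_dictionary = {"the", "spoke", "to", "a"}
share, unknown_words = out_of_dictionary("The Fatebinder spoke to the Archon", toy_dictionary)
print(share, unknown_words)
```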
 

Pegultagol

Erudite
Joined
Feb 4, 2005
Messages
1,176
Location
General Gaming
I would be interested to know if there's any appreciable difference between POE1 and POE2, and whether the number of writers involved in a project affects the scope of common lemmas in the dialogue.

It was nonetheless a very interesting read. Thanks Bester.
 

AdolfSatan

Arcane
Joined
Dec 27, 2017
Messages
1,646
And if the number of writers involved in the project affect the scope of common lemmas in the dialogue.
It shouldn't as long as there's a good lead writer in charge of the team. So yes, it probably does.

Interesting idea for a thread, but it feels a bit half-baked. Neither lack nor excess of verbosity implies good quality in itself, so it'd be interesting to see it followed by some other criteria of analysis. Otoh it's kind of pointless, since a quick glance is enough to realize which games have shit writing.

Any chance you could add other games to the list? It'd be interesting to see what the graph looks like for BaK, for example.
 

Bester

⚰️☠️⚱️
Patron
Vatnik
Joined
Sep 28, 2014
Messages
9,094
Location
USSR
Neither lack nor excess in verbosity implies good quality on itself, it'd be interesting to see it followed by some other criteria of analysis.
Propose a criterion. If I think it makes sense, I may run it.

I was thinking of how the Chinese and the Japanese have categorized their characters by the age at which they must be learned at school, because each year students need to learn a specific set of new characters. So if I were analyzing a Japanese text, I'd be able to tell the age of the audience it was aimed at. But I don't think there are corpora (linguistic resources, such as dictionaries) that assign a rarity value to words in English. At least I couldn't find one after a few minutes of googling.

It's possible to find the average word length just for fun, but it's probably not a serious criterion for anything. Do you have anything in mind?
 

ERYFKRAD

Barbarian
Patron
Joined
Sep 25, 2012
Messages
25,882
Neither lack nor excess in verbosity implies good quality on itself, it'd be interesting to see it followed by some other criteria of analysis.
Propose a criteria. If I think it makes sense, I may run it.

I was thinking of how the Chinese and the Japanese have categorized their words by age at which these words must be learned at school, because each year they need to learn specific new hieroglyphs. So if I was analyzing a Japanese test, I'd be able to tell the age of the audience the text was aimed at. But I don't think there are corpora (linguistic resources, such as dictionaries) that have assigned a rarity value to words in English. At least I couldn't find it after a few minutes of googling.

It's possible to find the average word length just for fun, but it's probably not a serious criterion for anything. Do you have anything in mind?
Think you can add New Vegas and AoD to the mix?
 

Zibniyat

Arcane
Joined
Jun 22, 2014
Messages
6,288
2. Also voices, and disfavoured in TYR. is that normal word, or is there specific uncommon meaning to them in Tyr?

Fatebinder, Archon(s), Chorus, Kyros, edict(s), Tiers etc. are all unique terms for certain persons (for example Voices of Nerat, hence voices), locations (Tiers), or factions (the Disfavoured).

Yes, they do repeat often, but the game presents a conflict between factions, with the player being simply part of it, hence the continued use of these same unique terms.
 

Tigranes

Arcane
Joined
Jan 8, 2009
Messages
10,227
One possibility here is that POE1 does feature a high quantity of 'vernacular', but it comes in the form of 800 different words, rather than a single one like 'cutter' dominating. That would require a secondary analysis to confirm one way or another, to see what is the overall proportion of 'jargon'. Without that, we can't yet conclude that Tyranny is guilty of jargon-dumping while all other games (including POE1) do not.
You are right, but I'm not doing it.

While it's possible to discard all words that exist in dictionaries, we'll be left with a lot of stuff like this from Torment for example (the undead language), and then it'll be a matter of manually going through all of this. It's not a particular case, there's a lot of stuff like this in these games, so I'd rather not.

Yeah, I understand why you wouldn't want to bother. It just means that we can't really answer that particular question of whether Tyranny is more jargon-abusive than POE1 or IE games. What we do know is that Tyranny is very focused in its jargoning - which is interesting!

(Another side effect of this issue: it's surprising that Sarevok is up there but not Bhaalspawn in either game. And maybe that's because it gets split into "Child of Bhaal", etc.)

I don't know what the order of the data was when you cut it, but we don't know if that affected the results either, e.g. a game being more/less jargony at the start as opposed to later.
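
One way to check that, sketched under assumptions: treat each game's text as a list of dialogue lines, shuffle the lines with several different seeds, and see how much the unique-word count at the cutoff moves. The function names are hypothetical, and the sketch skips lemmatization to stay short.

```python
import random

def unique_at_cutoff(lines, cutoff, seed):
    """Shuffle the lines with a fixed seed, then count unique words in the first `cutoff` words."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    words = [w for line in shuffled for w in line.lower().split()]
    return len(set(words[:cutoff]))

def cutoff_spread(lines, cutoff, trials=20):
    """Return the (min, max) unique-word count over several shuffles."""
    results = [unique_at_cutoff(lines, cutoff, seed) for seed in range(trials)]
    return min(results), max(results)

# Invented toy lines, for illustration only.
toy_lines = ["a b", "c d", "a c"]
print(cutoff_spread(toy_lines, cutoff=6, trials=5))
```

If the min and max are close for a given game, the order of the data at the cut probably didn't matter much for that game.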
 

AdolfSatan

Arcane
Joined
Dec 27, 2017
Messages
1,646
Propose a criterion. If I think it makes sense, I may run it.
I hadn't thought of anything concrete, but I'll throw out some ideas. No idea how feasible they are.

How about an algorithm that pulls out entire sentences and categorizes the words within, replacing them with their type (noun, pronoun, verb, adverb, adjective, etc.)? You can either sum and dump that info, or process it again to get data on how the average sentence is constructed. Once more, cold data isn't everything, since good taste may be found in both sparse and overflowing prose, but it'd be an interesting take.

Along the same lines, you could make data dumps of lemmas in specific categories as well.

Another thing you can do is count the number of adjectives and compare it against lazy words (very, little, rather, quite, pretty, a lot, more, etc.). This should indicate how rich or poor in adjectives the vocabulary is.

Flow and pacing are important too. How about pulling paragraphs and getting info on how many words and sentences form them? You can then average the sentence/paragraph length, or expand it further by setting length categories for sentences (say, 3 levels) and analyzing their distribution inside paragraphs. I have no idea how to average that last one though.

And one that's lazier to program but requires sending data to an online service might be drawing a set number of random samples from each game, sending them to Grammarly (or any other site) and getting a count of how many errors it returns and in which categories.
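
The sentence and paragraph length idea above is the easiest one to sketch with nothing but the standard library. This is only a rough illustration: the three length levels and their thresholds (8 and 20 words) are arbitrary assumptions, the function names are hypothetical, and splitting on punctuation will trip over abbreviations and ellipses.

```python
import re

def sentence_lengths(paragraph):
    """Split a paragraph on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return [len(s.split()) for s in sentences]

def bucket(length, short_max=8, medium_max=20):
    """Assign a sentence length to one of three arbitrary levels."""
    if length <= short_max:
        return "short"
    if length <= medium_max:
        return "medium"
    return "long"

def paragraph_profile(paragraph):
    """Summarize one paragraph: sentence count, average length, and level distribution."""
    lengths = sentence_lengths(paragraph)
    counts = {"short": 0, "medium": 0, "long": 0}
    for n in lengths:
        counts[bucket(n)] += 1
    avg = sum(lengths) / len(lengths) if lengths else 0.0
    return {"sentences": len(lengths), "avg_words": avg, "buckets": counts}

# Invented sample paragraph, for illustration only.
sample = "Go now. The road north is long and cold, and few return from it. Why?"
print(paragraph_profile(sample))
```

As for averaging the distribution, one option is to express each level as a share of all sentences per game, so games of different lengths stay comparable.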
 

CappenVarra

phase-based phantasmist
Patron
Joined
Mar 14, 2011
Messages
2,912
Location
Ardamai
For another comparison, I've extracted the most used words that are over 4 characters long.

BG1 table
upload_2021-2-27_9-28-27.png
:hmmm:

Excluding BG1 and IWD1 is a mistake. Cut things off at a smaller threshold to include them, or use relative counts instead of absolute ones, or...
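
A minimal sketch of the relative-count variant: express each word's frequency per 10,000 words of the game's own text, so short and long games can sit in the same table. The function name and the sample list are made up for illustration.

```python
from collections import Counter

def per_10k(words, n=5, min_len=5):
    """Return the n most frequent long words with their rate per 10,000 words."""
    total = len(words)
    if total == 0:
        return []
    counts = Counter(w for w in words if len(w) >= min_len)
    return [(w, round(c * 10000 / total, 1)) for w, c in counts.most_common(n)]

# Invented word list, for illustration only.
toy_words = ["fatebinder"] * 3 + ["the"] * 7
print(per_10k(toy_words, n=1))
```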
 

Funposter

Magister
Joined
Oct 19, 2018
Messages
1,678
Location
Australia
Neither lack nor excess in verbosity implies good quality on itself, it'd be interesting to see it followed by some other criteria of analysis.
Propose a criterion. If I think it makes sense, I may run it.

I was thinking of how the Chinese and the Japanese have categorized their words by age at which these words must be learned at school, because each year they need to learn specific new hieroglyphs. So if I was analyzing a Japanese test, I'd be able to tell the age of the audience the text was aimed at. But I don't think there are corpora (linguistic resources, such as dictionaries) that have assigned a rarity value to words in English. At least I couldn't find it after a few minutes of googling.

It's possible to find the average word length just for fun, but it's probably not a serious criterion for anything. Do you have anything in mind?

There's stuff like this floating around. Phrases such as "reading at an 8th grade level" etc. are relatively common in American television and writing when dealing with the subject of education, although I don't know if I've ever heard of attempts to do something similar in other English speaking countries such as the UK or Australia. There was also some talk about the educational level that US Presidents and presidential candidates spoke at back in 2015/6 due to Trump reportedly speaking at a "fourth grade level". In this example, it should however be noted that the educational level of Presidential speeches has been trending downward since the mid-19th century.
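
One well-known formula behind that kind of "grade level" claim is the Flesch-Kincaid grade level, which uses only words per sentence and syllables per word. A rough sketch follows. The syllable counter is a crude vowel-group heuristic of my own, not part of the formula, so treat the output as approximate, and note that nothing in this thread says this is the measure used in the reports about presidential speeches.

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels, minimum one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

simple = "The cat sat. The dog ran."
dense = "Unquestionably, deliberation necessitates extraordinary circumspection."
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(dense))
```

Very short, simple text can score below zero, which is normal for this formula. Run over each game's NPC lines, it would give one number per game to put next to the vocabulary curves.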
 

Rahdulan

Omnibus
Patron
Joined
Oct 26, 2012
Messages
4,888
For another comparison, I've extracted the most used words that are over 4 characters long.

618063f49dfd166b166dc88cc2f48d9a.png


As you can see, all games are very subtle about their vernacular. PST is subtly charming with the heavy usage of "cutter", conveying the exotic setting with a multitude of other words that never repeat themselves. IWDs and NUMA emphase their most important toponym. BG1, 2 and even POE1 (yes, even POE1 despite its fampyrs) never verbally assault you with their unconventional world of magic and monsters.


None of this can be said about TYRANNY which jumps out of the bush and rapes you with the MOST USED WORD fatebinder, followed closely by KYROS, ARCHON, CHORUS, DISFAVORED, BEASTWOMEN and many more. The writers don't even attempt to convey ideas elegantly. They've created a glossary and clumsily handle the narration by dropping the same words on your head over and over again.

It's always been the author's opinion that Tyranny is an affront to any sane person's sensibilities. It has been proven by the numbers now. You can't like Tyranny and claim to be a respectable gentleman. Only a scullion could enjoy something like that.

To be fair, it would be interesting to see how many of those words associated with Tyranny come from hyperlink text rather than from dialog proper. Those can be totally ignored by the player.

Damn, Dodo1610 beat me to it.
 
