For instance, their engine manages content via records as opposed to flat files. So new changes can be incorporated easily without creating conflicts. All this tooling serves the additional purpose of making Bethesda's game highly moddable.
While their engine is a big reason for their success, the records vs flat files distinction isn't really relevant - a file is just a record in the directory it is under. You'd get exactly the same functionality if for every record in the TES editor you had a file instead and used file extensions to group different object types. Modding would be as simple as replacing files or having the engine look into a "mod" directory before looking into the "data" directory whenever the "Pants of Fire.armor" file was about to be loaded. Many engines already work like that - it is how they use those records/files at a higher level that affects their moddability.
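To make the "mod directory first" idea concrete, here is a rough sketch of such a lookup. The function name resolve_asset and the directory layout are made up for illustration, not taken from any particular engine:

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_asset(name: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first existing file called `name`, checking the
    directories in priority order (hypothetical helper)."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # not found in the mod or the base data
```

Calling resolve_asset("Pants of Fire.armor", [Path("mod"), Path("data")]) would pick the mod's copy if there is one and fall back to the base game's copy otherwise.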
People are bringing up the engine and I'm just not getting this as the logic behind why it doesn't happen. It's not an engine thing, people make games with giant sprawling environments all the time. If any developer has an engine that can handle their big open world take on GTA, they've got an engine that can handle some kind of take on what Bethesda does. Maybe they can't all handle the novelty of being able to pick up every cup in the game, but until the day these become VR games where you can open your bag and put whatever item you want in it, it's just a novelty that doesn't even matter.
The thing is, it is exactly what you think of as a novelty that requires the functionality that has to be there to make a game like Morrowind, Oblivion, Skyrim, etc. The ability to pick up every cup has a ton of implications for the systems that support said cup picking.
Being able to pick up -and drop- every cup means:
- Every cup is an actual item, so the engine needs support for arbitrary items in the inventory.
- This implies a flexible inventory system that can contain any sort of item, not just predefined items like weapons or arbitrary limitations like being able to carry only a single weapon and a single set of armor
- It also requires an "item" to be more than a model - items may need 2D and 3D representation, some sort of name/description, have stats like weight (if there is a weight system) or dimensions (if there is a tetris-like inventory system), etc
- Arbitrary items, like cups, can be moved around, meaning that you can't just merge all static props into big meshes - a common optimization in engines that allows for more detailed graphics but also creates more static worlds (since you can't move shit around). You must be able to handle any cup -or other item- being placed at any position, rotation, etc in the world.
- Being able to drop the cup you picked up means that you need to be able to drop items from your inventory, which in turn means that you can introduce new items to the 3D world that weren't previously there (picking up an item essentially removes it from the 3D world) - for simple rooms this might be simple, but this needs to work even if you do it across the entire overworld
- Big open worlds, be it seamless (Gothic, ELEX), mostly seamless (Morrowind, Oblivion, Skyrim) or with an abstracted overworld (Fallout 1/2) allow you to move (almost) anywhere you want at any time, meaning that if you can pick up and drop a cup you can pick up and drop ANY object, including objects you may need.
- This implies that the game must keep track of where and how you dropped the item so that if you drop the Axe of Major Buttkicking that is necessary to complete a quest (main or not), you can go back and pick it up right from where you dropped it (whether this is something that should make sense to happen or not is a game design issue -e.g. you may actually want to punish someone for dropping the Axe in the middle of a square- but the engine should allow it regardless).
- And of course having this done just for said Axe makes no sense since it might also be needed for the Staff of Lightingbulbs or whatever else you may not even know exists, so you need it for any item that can be placed in the world - cups included, since those are items too.
- However it also means that since those games do not keep the entire world state in memory but instead load stuff in and out in pieces as needed (be it via cells like in Bethesda's games, individual areas like in Fallout 1/2/ATOM, etc or object groups like in some other engines), they need some way to keep persistent information about objects that are going to be unloaded, so that it is still around when they are loaded again after the player comes back.
- This also needs to take savegame state into account - i.e. persistent data must not be mixed with savegame data and, if stored on disk, it must not affect reloading a previous game or starting a new game.
- If any item, like a cup, can be placed around dynamically, the NPCs that move should be able to somehow handle it when they are trying to move - a cup shouldn't make an NPC stop in their tracks, the NPC should be able to work around (or jump over, or kick away) items that were placed in front of them.
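As a rough sketch of how the points above fit together (not how Bethesda's engine or any other actually does it), the shipped world data can be kept read-only while every pick up and drop is recorded as a change on top of it. All names here (ItemDef, PlacedItem, GameState, World) are made up for illustration, and base ref ids are assumed to stay below the starting value of next_ref:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ItemDef:
    """What an item *is*: more than just a model."""
    name: str
    weight: float
    model: str  # 3D representation
    icon: str   # 2D representation


@dataclass
class PlacedItem:
    """One instance of an item sitting somewhere in the world."""
    ref_id: int
    item_id: str
    position: Vec3
    rotation: Vec3 = (0.0, 0.0, 0.0)


@dataclass
class GameState:
    """Only the differences from the shipped data, for one playthrough."""
    inventory: List[str] = field(default_factory=list)
    removed_refs: Set[int] = field(default_factory=set)
    added: Dict[str, List[PlacedItem]] = field(default_factory=dict)
    next_ref: int = 1_000_000  # assumption: base ref ids are smaller


class World:
    def __init__(self, defs: Dict[str, ItemDef],
                 base_cells: Dict[str, List[PlacedItem]]):
        self.defs = defs
        self.base_cells = base_cells  # shipped data, never modified
        self.state = GameState()

    def load_cell(self, cell_id: str) -> List[PlacedItem]:
        """Rebuild a cell: base items minus removed ones, plus dropped ones."""
        base = [p for p in self.base_cells.get(cell_id, [])
                if p.ref_id not in self.state.removed_refs]
        return base + list(self.state.added.get(cell_id, []))

    def pick_up(self, cell_id: str, ref_id: int) -> None:
        for placed in self.load_cell(cell_id):
            if placed.ref_id == ref_id:
                break
        else:
            raise KeyError(ref_id)
        dropped_here = self.state.added.get(cell_id, [])
        if placed in dropped_here:
            dropped_here.remove(placed)          # was dropped earlier
        else:
            self.state.removed_refs.add(ref_id)  # hide the base item
        self.state.inventory.append(placed.item_id)

    def drop(self, cell_id: str, item_id: str, position: Vec3) -> PlacedItem:
        self.state.inventory.remove(item_id)  # ValueError if not carried
        placed = PlacedItem(self.state.next_ref, item_id, position)
        self.state.next_ref += 1
        self.state.added.setdefault(cell_id, []).append(placed)
        return placed

    def carried_weight(self) -> float:
        return sum(self.defs[i].weight for i in self.state.inventory)
```

Since base_cells is never touched, starting a new game is just a fresh GameState, and a cell that gets unloaded can be rebuilt later from the base data plus the recorded changes. NPC movement around dropped items is a separate problem that this sketch doesn't touch.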
Of course being able to pick up any cup is only part of the puzzle and really only one aspect of having a more simulationist and dynamic approach to the game systems instead of relying on a static and scripted world. Many games go for the latter partly because it allows for a more controlled environment (which helps to avoid "jank") but also because it is simply both easier and faster (from a performance standpoint) to avoid too many dynamic and unpredictable elements.
But the same mindset that would enable the above also enables things like how in Bethesda's games NPCs and the player have the same stats and a working inventory, how you can pickpocket someone not only to take items from them but also to put items in their pocket, or how in Oblivion every NPC is able to pick up a weapon from the ground, meaning that you can drop weapons for NPCs to pick up.
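A tiny sketch of what "same stats and working inventory" can mean in practice: if the player and NPCs are the same kind of object, pickpocketing in either direction is just one transfer function. Actor and transfer are hypothetical names, not anything from Bethesda's tools:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    """Player and NPCs share one type, so they share stats and inventory."""
    name: str
    health: int = 100
    inventory: List[str] = field(default_factory=list)
    is_player: bool = False


def transfer(item_id: str, source: Actor, target: Actor) -> None:
    """Move one item between any two actors; works in both directions."""
    source.inventory.remove(item_id)  # ValueError if source lacks the item
    target.inventory.append(item_id)
```

With this, taking a dagger is transfer("dagger", guard, player) and planting one is transfer("cursed dagger", player, guard); nothing in the code cares which of the two is the player.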
Having a rendering engine that can render a big world isn't really the important -or even necessary- thing, it is surface level stuff. It is just what people first see in these games and what is easy to promote with videos, screenshots, etc. Something like pickpocketing an NPC to take away his dagger, then placing next to him an enchanted dagger that damages the holder before you attack, and having the NPC pick it up to attack you while you run away, isn't something that you can really see in PR-oriented videos by developers that would rather wow the crowds at E3 with their perfectly rendered mountains. And even if you did see it, you couldn't trust that it wasn't something specific and scripted for that particular video.
(BTW AFAIK the above isn't possible in Bethesda's games either, because they dumbed down the enchanting in Oblivion, though it would be possible if you could do negative effects in Oblivion or if NPCs would pick up items in Morrowind, but in any case it is an example of something that could feasibly happen)