In light of the recent launch trailer, and its resulting debate about the animation quality in Fallout 4, I wanted to share what insight I can about why Bethesda typically doesn't use motion capture, or more advanced techniques like performance capture, for most of its facial animation.
Typically, in motion/performance captured cutscenes, the game has a more complex and sophisticated 3D model of the character which is swapped in when cutscenes occur. These high detail versions of the character have extra polygons, crease textures for wrinkles, more 'bones' in the animation rig for expression, and so on. To accommodate these computationally taxing 'actors', the cutscenes use clever camera angles and only render what is visible on screen. Nothing extra is being computed. So the Geralt that rides around and slays Drowners is not the same Geralt that you see in conversations, and the game can afford the fancier one there because all of its other systems are suspended while a cutscene plays.
Games today still use illusions and visual trickery to achieve amazing visuals, and in-engine cutscenes are a great way of doing it. You simply couldn't run the high detail actors that exist in cutscenes on top of all the other systems a game typically runs while in a playing state.
Fallout doesn't use cutscenes. Instead it chooses to create a world where everything happens right there, in real time. There's no breaking away into a separate instance of the game (unless you count caves etc.), so the game spends that computational budget on other things instead.
The facial animations so far in Bethesda games have been procedural. Similar to Mass Effect and Dragon Age (though those games have fewer dynamic things happening than Fallout and can have slightly more detail in certain areas as a result), the actors' faces are given preset mouth shapes, and an animator uses those shapes to build a mouth movement mapping that matches a given voice performance. This is why it can look awkward sometimes: the mapping has to work regardless of the face shape a character is given, because again, everything happens in real time.
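For anyone curious what a "mouth movement mapping" might look like, here's a rough sketch in Python. To be clear, this is not Bethesda's code or anyone else's actual pipeline. The phoneme labels, the shape names, and the function `build_mouth_track` are all made up for illustration. It only shows the general idea: sounds in the voice line get looked up in a small table of preset mouth shapes, and the result is a list of timed keyframes.

```python
from dataclasses import dataclass

# Hypothetical table: which preset mouth shape goes with which sound.
# Real games use larger sets; these names are illustrative only.
PHONEME_TO_SHAPE = {
    "AA": "open",
    "IY": "wide",
    "UW": "round",
    "M": "closed",
    "B": "closed",
    "P": "closed",
    "F": "lip_bite",
    "V": "lip_bite",
}


@dataclass
class Keyframe:
    time: float  # seconds into the voice line
    shape: str   # name of the preset mouth shape to blend toward


def build_mouth_track(timed_phonemes):
    """Turn (time, phoneme) pairs into mouth-shape keyframes.

    Unknown sounds fall back to a neutral shape, and consecutive
    repeats of the same shape are merged into one keyframe.
    """
    track = []
    for time, phoneme in timed_phonemes:
        shape = PHONEME_TO_SHAPE.get(phoneme, "neutral")
        if track and track[-1].shape == shape:
            continue  # mouth is already in this shape, no new keyframe
        track.append(Keyframe(time, shape))
    return track


if __name__ == "__main__":
    line = [(0.0, "M"), (0.1, "B"), (0.2, "AA"), (0.35, "ZZ")]
    for key in build_mouth_track(line):
        print(f"{key.time:.2f}s -> {key.shape}")
```

Notice that the track only stores shape names, not anything about a particular face. That's the trade-off described above: the same track can drive any face the character creator spits out, but it can't capture the little details a real actor's performance would have.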
No matter how advanced a game engine is, today's hardware can't run complex performance-captured animation in real time without sacrifices being made elsewhere, or at the least some sort of workaround (like a cutscene). That's just a reality of game development. It's why I am always really impressed with what Bethesda puts out, because ultimately when it comes to rich, fully formed, dynamic game worlds, they're badasses. What they pull off is actually very impressive technically.
That's not to say they don't use performance capture at all. In fact, Dogmeat is largely performance captured, as is a lot of the physical movement of NPCs, but those animations still have to work within a complex environment.
I realize that this has probably been explained many times before, and more eloquently, but hey, sometimes a guy's gotta ramble! Hope this was informative. :3