Duraframe300
Yuzu Progress Report
Progress Report December 2021
Written by GoldenX86 and Honghoa on January 11 2022
Yuz-ers! Welcome to the last progress report of 2021, released in 2022 because we still haven’t figured out how to travel back in time. December brought us improved kernel emulation, fixes for driver issues, improvements to input, rendering, overall stability, and more!
We keep trying with the time machine, but we’re running out of bananas to microwave and trash to fuel the Mr. Fusion. Okay, let’s get started!
PSA for NVIDIA users: Part 2
As mentioned two months ago, NVIDIA users have been experiencing issues when using GLSL due to the changes introduced by NVIDIA dropping support for Kepler cards in the 49X series of drivers.
We’re happy to announce that we have a set of workarounds implemented by epicboy that solve all known issues. These are already available for both Mainline and Early Access.
The root of the problem in NVIDIA’s drivers seems to be in negation of integer and floating point values, and bitwise conversions of input values.
On previous drivers, you could assign a value to a variable named x, then assign -x as the value to a new variable named y. y would be equal to -1 * x. New drivers ignore this negation entirely, resulting in random spontaneous fires, security breaches, too many dogs causing a Howl, and total chaos.
The workaround is to simply subtract the value from 0. In our example, y would get the value of 0 - x.
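As a rough illustration of the workaround, here is how a shader code generator might emit a subtraction from zero instead of a unary minus. The helper name `emit_negate` and its `workaround` flag are hypothetical and are not taken from yuzu's shader compiler.

```python
def emit_negate(operand: str, is_float: bool, workaround: bool = True) -> str:
    """Return a GLSL expression string that negates `operand`.

    Hypothetical helper: with the workaround enabled, the unary minus is
    replaced by a subtraction from zero. For floats the only difference in
    value is the sign of zero (0.0 - 0.0 is +0.0, while -(0.0) is -0.0).
    """
    if not workaround:
        return f"-({operand})"
    zero = "0.0" if is_float else "0"
    return f"({zero} - {operand})"


print(emit_negate("x", is_float=False))                    # (0 - x)
print(emit_negate("x", is_float=True))                     # (0.0 - x)
print(emit_negate("x", is_float=True, workaround=False))   # -(x)
```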
The bitwise conversion issue is more complex, but we talked about it in the past. Back in August, we mentioned how Intel had issues in Vulkan affecting Mario’s legendary moustache.
GetAttribute returns a float value, so a conversion is needed when working with integer values.
The same issue that affected Intel GPUs now happens here on the “greener” side, but inverted. When using instance_id, old drivers accepted a float to unsigned integer conversion without issue, and you could do this conversion multiple times without losing the correct value. The current drivers, on the other hand, can sometimes return zero.
Interpreting the value directly as unsigned integers now solves this issue in both GLSL and GLASM. Since this counts as an optimization, we now apply it to all APIs.
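To make the difference between converting a value and reinterpreting its bits concrete, here is a small sketch using Python's `struct` module. It is only an analogy for the distinction GLSL draws between `uint(x)` and `floatBitsToUint(x)`; the helper names are made up for this example, and it does not reproduce the driver bug itself.

```python
import struct


def uint_bits_to_float(value: int) -> float:
    """Reinterpret a 32-bit unsigned integer's bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", value))[0]


def float_bits_to_uint(value: float) -> int:
    """Reinterpret a 32-bit float's bit pattern as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


instance_id = 5
# The integer travels through a float-typed slot as a raw bit pattern.
as_float = uint_bits_to_float(instance_id)

print(int(as_float))                  # numeric conversion loses the value: 0
print(float_bits_to_uint(as_float))   # bit reinterpretation recovers it: 5
```

The bit pattern of a small integer corresponds to a tiny denormal float, so a numeric conversion rounds it down to zero, while reading the bits directly returns the original value.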
Back to the early days (Fire Emblem: Three Houses)
Please report any issues you find, as more games could be broken by driver bugs that are not yet known. On a similar note, more fixes should be coming to Vulkan too, if needed. One such issue solved itself, most likely because NVIDIA fixed it in the latest drivers.
Other graphical fixes
Whenever a game played multiple videos at the same time, some of them would glitch and flicker. This happened because yuzu was limited to decoding a single video stream at a time. With multiple videos running at once, the decoder received frames from different video sources, confusing the interpolation algorithm and causing the aforementioned problems. To prevent this, vonchenplus implemented a temporary solution that gives each video stream its own video decoder, sending each frame only to the decoder that belongs to its stream.
It still flickers, but that's the Chozo's fault (Metroid Dread)
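The idea of one decoder per stream can be sketched in a few lines. This is a simplified model, not yuzu's code: the `Decoder` and `DecoderPool` classes and the integer stream IDs are assumptions made for the example.

```python
class Decoder:
    """Stand-in for a video decoder that keeps state between frames."""

    def __init__(self) -> None:
        self.frames: list[bytes] = []

    def decode(self, frame: bytes) -> None:
        # A real decoder would use earlier frames as reference data,
        # which is why frames from another stream would corrupt it.
        self.frames.append(frame)


class DecoderPool:
    """Routes every frame to the decoder owned by its stream."""

    def __init__(self) -> None:
        self._decoders: dict[int, Decoder] = {}

    def decoder_for(self, stream_id: int) -> Decoder:
        if stream_id not in self._decoders:
            self._decoders[stream_id] = Decoder()
        return self._decoders[stream_id]

    def submit(self, stream_id: int, frame: bytes) -> None:
        self.decoder_for(stream_id).decode(frame)


pool = DecoderPool()
pool.submit(1, b"video-A frame 0")
pool.submit(2, b"video-B frame 0")
pool.submit(1, b"video-A frame 1")
print(len(pool.decoder_for(1).frames), len(pool.decoder_for(2).frames))  # 2 1
```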
Morph added the missing formats R16G16_UINT and ASTC_2D_8X5_UNORM to the Vulkan backend, fixing the missing graphics on Immortals Fenyx Rising and making LEGO® CITY UNDERCOVER playable, respectively. (Please note that Immortals Fenyx Rising gets in-game but still has broken graphics at the moment.)
I brick you not (LEGO® CITY UNDERCOVER)
Blinkhawk fixed a bug in the texture cache that was conveniently ignored by the AMD driver, but would cause NVIDIA GPUs to crash when using the Vulkan API. This crash happened when blitting textures with different format types, something that points to a problem in the texture cache that will be addressed in a future PR.
Blinkhawk also updated the Vulkan headers to fix an extension and implemented logical operations. Both the extension and these logical operations are used by Vulkan to describe and process data, in order to compose the frames that will later be sent to the screen. This PR fixes the sand and shadow graphical problems in The Legend of Zelda: Skyward Sword, and also the shadow problems as seen in Xenoblade Chronicles 2.
When you invert the polarity of your HDR display (Xenoblade Chronicles 2)
epicboy took a look at the issues that affected games that made heavy use of sparse GPU memory, and made the changes necessary to mitigate the problem.
Sparse memory is a technique to store data non-contiguously, which is a fancy way to say that data is broken down into small blocks and only the relevant bits are loaded into memory. There was a bug in the code used to map this data into the memory, as the offsets needed to get the right address weren’t being included in the calculations. For the sake of precaution, he also added an extra guard that prevents modifying the memory address 0, as it is used as a placeholder to signal addresses that haven’t been loaded in yet.
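The two fixes described above can be modelled with a toy page table. Everything here is an assumption made for illustration: the `SparseMap` class, the 4 KiB page size, and the reading of the guard as "refuse to map GPU address 0" are not taken from yuzu's memory manager.

```python
PAGE_SIZE = 0x1000
UNMAPPED = 0  # address 0 doubles as the "not loaded yet" placeholder


class SparseMap:
    """Toy page table: GPU page index -> backing address of that page."""

    def __init__(self) -> None:
        self._pages: dict[int, int] = {}

    def map(self, gpu_addr: int, backing_addr: int, size: int, offset: int = 0) -> None:
        if gpu_addr == UNMAPPED:
            return  # guard: never modify the placeholder address
        for i in range(0, size, PAGE_SIZE):
            # The offset into the backing allocation must be part of the
            # calculation, otherwise every block points at the wrong data.
            self._pages[(gpu_addr + i) // PAGE_SIZE] = backing_addr + offset + i

    def translate(self, gpu_addr: int) -> int:
        base = self._pages.get(gpu_addr // PAGE_SIZE, UNMAPPED)
        if base == UNMAPPED:
            return UNMAPPED
        return base + gpu_addr % PAGE_SIZE


table = SparseMap()
table.map(0x2000, 0x100000, size=0x2000, offset=0x3000)
print(hex(table.translate(0x3010)))  # 0x104010
print(hex(table.translate(0x9000)))  # 0x0, nothing mapped there
```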
These changes are meant to address (no pun intended) issues related to the GPU memory management, and hopefully alleviate some stability complications related to it. Notably, the crashes on titles developed with the UE4 engine (cough, True Goddess Reincarnation V or some such, cough). The devs are still investigating any other oddities surrounding this game, so stay tuned for more updates.
These changes mitigate memory-related problems but are not guaranteed to “fix” them completely (SHIN MEGAMI TENSEI V)
Users reported crashes when playing Sonic Colors Ultimate on AMD and Intel GPUs on Vulkan after the resolution scaler was introduced. epicboy quickly jumped in to intervene and save the Blue Hedgehog.
On the AMD side, Sonic suffers from ImageView issues, causing an invalid pointer dereference when the slot_images container of the texture cache is resized. This can happen even at native resolution. epicboy found that keeping a reference of the container resolves the issue.
Intel’s turn now. The Intel Vulkan Windows driver strictly follows the specification when dealing with image blits. Khronos defines that MSAA blits are not allowed, and while most drivers let this pass, Intel is being a good boy and crashes when trying to rescale MSAA textures. Leaving aside that using traditional antialiasing on a mobile device like the Switch is a crime against humanity (you don’t waste extremely limited bandwidth on traditional antialiasing), the issue is solved by using the 3D pipeline to render directly into the scaled image when rescaling. The performance cost is higher (integrated GPUs like most Intel ones also hate traditional antialiasing), but it’s a price worth paying to avoid crashing or losing the scaling.
Colourful (Sonic Colors: Ultimate)
The texture cache has to handle several weird situations when dealing with rendering. One of them is overlaps, which happen when different textures compete for the same video memory space. A bug was found in the texture cache’s logic for overlaps that span relatively big distances in GPU memory. An overflow could occur, making yuzu try to render a wrongly massive texture that filled up VRAM instantly and crashed the emulator. This issue was common in BRAVELY DEFAULT II. Thanks to epicboy, users no longer have to suffer this sudden crash.
BRAVELY DEFAULT II
Skyline framework: Part 2
itsmeft24 submitted a patch to implement the ProcessMemory and CodeMemory kernel SVCs (Supervisor Calls), which are some of the changes needed to support the Skyline framework for modding.
Part of the ongoing work includes adding support in yuzu for all tiers of subsdk. Games can use subsdk tiers from 0 to 8, with 9 being free. Skyline uses subsdk9 to operate, so jam1garner included support for the remaining two missing tiers in yuzu, 8 and 9.
There are still a couple of things that need to be implemented before it’s ready, but things are certainly getting closer to being completed.
You can check the current progress here.
Input changes
german77 has several fixes for us and some important new additions.
Let’s kick things off with a great new feature for handheld PC users, couch players, and anyone not wanting to reach all the way to their keyboard while playing: support for gamepad hotkeys.
You can customize them
With this, users can customize button macros. For example: access or exit fullscreen, unlock the framerate, pause/continue emulation, capture a screenshot (by default conveniently mapped to the capture button of the Nintendo controllers), close yuzu, and more!
Sorry about the bad quality
When certain games start, some internal testing is done to ensure that things are where they should be and respond with an acceptable delay. One of those tests involves rumble. Games prod the controllers with a low frequency rumble test, but sometimes, some games never stop and the controller continues to vibrate, depleting battery and making you doubt what the original intention of the developer was. german77 forces the rumble amplitude to zero after the test, stopping unwanted vibrations for these affected games.
VR games may use the gyroscope sensor on the Switch itself (not the controllers) to feed motion data. Previously, yuzu would only give partial data to the game, causing erratic movement of the game’s camera. german77 added all missing data, including the gyro sensor, to solve this issue.
german77 also added support for the SetNpadJoyAssignmentMode series of services, removing some spam from the logs. This change also adds support for dual Joy-Con pairs with a single Joy-Con connected, which is something that some games seem to do.
After the release of Project Kraken, the input rewrite, analog triggers were accidentally broken. A simple bug slipped by, causing them to only work when the joysticks were moved. Two lines of code were changed, and the issue was made no more.
german77 has also been working on making Ring Fit Adventure playable. While working on implementing support for the pressure ring accessory that the game requires, german77 also ended up making some global improvements.
One change that ended up benefiting all games is controller type validation, which ensures that the emulator can only accept controller types that the game supports, while discarding and disconnecting anything else.
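Controller type validation boils down to filtering the connected controllers against the set of types the game declares as supported. The sketch below uses a hypothetical `NpadStyle` flag enum and `validate_controllers` function; the names are illustrative and do not mirror yuzu's input code.

```python
from enum import Flag, auto


class NpadStyle(Flag):
    PRO_CONTROLLER = auto()
    HANDHELD = auto()
    JOYCON_DUAL = auto()
    JOYCON_LEFT = auto()
    JOYCON_RIGHT = auto()


def validate_controllers(
    connected: dict[int, NpadStyle], supported: NpadStyle
) -> dict[int, NpadStyle]:
    """Keep controllers whose type the game supports and drop the rest."""
    return {port: style for port, style in connected.items() if style & supported}


supported = NpadStyle.PRO_CONTROLLER | NpadStyle.JOYCON_DUAL
connected = {0: NpadStyle.PRO_CONTROLLER, 1: NpadStyle.HANDHELD}
print(validate_controllers(connected, supported))  # only port 0 survives
```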
A bug in the controller type validation code caused Captain Toad: Treasure Tracker to constantly spam the controller applet when trying to launch two-player mode. Well, not any more! Again thanks to german77.
Co-op treasure hunting, what else could you ask for? (Captain Toad: Treasure Tracker)
Flatpak fixes
Following up from our previous mention last month, liushuyu continues to fight against the weirdness of Flatpak.
NVDEC requirements are now more flexible: the CUDA libraries are no longer mandatory, and CUDA decoding support is unaffected. Also, the FFmpeg requirement has been raised to version 4.3 or higher. This should enable native Vulkan video support later on, once drivers support it.
With this, decoding crashes are solved when running Flatpak builds of yuzu.
liushuyu also solved an issue affecting the prevent sleep functionality on Flatpak. Implementing XDP’s Inhibit API solves the issue, preventing the display from turning off at the worst moment while playing.
Additionally, Flatpak builds are compiled with asserts enabled, meaning that the emulator will be stopped when an assertion fails or an out-of-bounds access inside a vector is encountered. AppImage and regular Mainline/Early Access builds are shipped with asserts disabled.
While this usually isn’t an issue, Flatpak users reported crashes in Pokémon Sword & Shield when trying to set their uniform number. As it turns out, the on-screen keyboard (OSK) was performing an out-of-bounds access when calling the number pad. Morph pointed the OSK to the proper array and the crashing stopped.
Thank you RodrigoTR for the pic! (Pokémon Sword)
General changes and bugfixes
bunnei continues to work on the kernel rewrite, toiling away to increase the accuracy of our implementation.
This time, by simplifying a number of functions and polishing the tracking of resources, he introduced more changes to improve the threading and scheduling kernel routines. These changes increase yuzu’s parity with recent updates to the Nintendo Switch OS, and also fix a number of race conditions and crashes, such as the ones experienced in Pokémon Sword & Shield and Dead or Alive Xtreme 3 Scarlet.
bunnei also implemented SetMemoryPermission, and updated the implementation of SetHeapSize, which are SVCs used by the kernel to manage the memory resources.
Previously, SetHeapSize only supported setting the heap size and expanding it, which was good enough for most games. But since some titles (such as Donkey Kong Country: Tropical Freeze) may shrink this size, the implementation was updated to allow games to change the heap as needed, making it more accurate.
Both these changes were validated with hardware tests, ensuring that they behave as expected.
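A heap that can both grow and shrink might be modelled as follows. This is a bare-bones sketch, not the kernel implementation: the `ProcessHeap` class, the error types, and the 2 MiB granularity are assumptions chosen for the example.

```python
HEAP_ALIGNMENT = 0x200000  # assumed 2 MiB granularity for this sketch


class ProcessHeap:
    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.size = 0

    def set_heap_size(self, new_size: int) -> int:
        """Resize the heap and return the change in bytes (negative on shrink)."""
        if new_size < 0 or new_size % HEAP_ALIGNMENT != 0:
            raise ValueError("size must be a non-negative multiple of the alignment")
        if new_size > self.max_size:
            raise MemoryError("requested heap exceeds the process limit")
        delta = new_size - self.size
        # delta > 0 would map extra memory at the end of the heap,
        # delta < 0 would unmap the tail, and delta == 0 changes nothing.
        self.size = new_size
        return delta


heap = ProcessHeap(max_size=16 * HEAP_ALIGNMENT)
print(heap.set_heap_size(4 * HEAP_ALIGNMENT))  # grows by 8 MiB: 8388608
print(heap.set_heap_size(2 * HEAP_ALIGNMENT))  # shrinks by 4 MiB: -4194304
```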
While working on these changes, bunnei found a bug in the service used to retrieve information of the currently executing process. Correcting this behaviour allowed The Witcher 3: Wild Hunt to boot, although there are still plenty of graphical issues to fix on this title.
Blinkhawk also made a number of changes to the building process to enforce more link time optimizations, and improve the time needed to generate the PDB (Program Database) file, which contains debug information. If this mumbo-jumbo sounds confusing, the gist of it is that the process of building yuzu should produce more efficient code and smaller binaries now. But feel free to skip the following few paragraphs if you’re not interested in the specifics.
Roughly speaking, compiler optimizations work on a “local” level per object. This optimization step will inline some functions, merge loops, put calling and called functions close in memory for better caching, etc. But if a function defined in another file is called within the file, the compiler can’t perform these optimizations, as it doesn’t know what this external function does, or how to optimize it.
Link time optimizations, on the other hand, take into consideration all the functions in the project. The linker, thus, is able to perform the same optimizations as the compiler, but more efficiently, as it is aware of the contents of all the functions defined in the project. This comes at a price, since the process needs more memory and takes more time to finish, but it guarantees that the released binaries perform better.
Along with this work, we considered enforcing SSE4.2 support, improving performance but making yuzu incompatible with 12-year-old CPUs like the Core 2 Duo and Phenom II or older. While the performance results were positive, we are still debating whether we should reduce CPU compatibility or not.
When you open yuzu, the emulator has to take some time to measure the RDTSC frequency, a way to measure the clock speed of the CPU. Due to a bit of bloat in the previous implementation, 3 full seconds were needed to complete the operation. Morph rewrote the whole function and now only 0.2 seconds (200 milliseconds) are needed to get results as accurate as before, considerably reducing the boot times of the emulator itself.
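The general technique of estimating a counter's frequency is to sample it twice against a trusted clock and divide. Python cannot read RDTSC directly, so the sketch below takes the counter as a parameter; the function name, its parameters, and the 0.2 second default are assumptions for illustration and not Morph's implementation.

```python
import time
from typing import Callable


def estimate_frequency(
    read_counter: Callable[[], int],
    read_clock_ns: Callable[[], int] = time.perf_counter_ns,
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 0.2,
) -> float:
    """Estimate how many counter ticks elapse per second."""
    start_ns = read_clock_ns()
    start_ticks = read_counter()
    sleep(interval_s)
    end_ns = read_clock_ns()
    end_ticks = read_counter()
    # ticks per nanosecond, scaled up to ticks per second
    return (end_ticks - start_ticks) * 1_000_000_000 / (end_ns - start_ns)


# time.monotonic_ns stands in for the hardware counter here, so the
# estimate should land close to 1e9 ticks per second.
print(estimate_frequency(time.monotonic_ns))
```

A shorter sampling interval shortens startup, at the cost of making each clock read's jitter a larger share of the measurement.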
As previously stated, german77 continues to work towards making Ring Fit Adventure playable. He has stubbed the SetNpadCaptureButtonAssignment, ClearNpadCaptureButtonAssignment, ListAlarmSettings, and Initialize services, and added support for the notif:a service.
With all his changes, the current Early Access build (at the time of writing this article) can boot and play the first stage of the game!
Ring Fit Adventure
Tatsh added NSP and XCI file association to Linux. Thanks!
Tachi107 updated cubeb and removed now deprecated functions. Cleaner is always better, thanks!
heinermann fixed a crash that would occur when the emulation was paused and the window was out of focus. Thank you!
jbeich changed the building configuration so that VA-API, one of the video decoding APIs of Linux, is enabled on Unix systems, allowing the users who want to build targeting BSD or other Unix-based systems to use hardware acceleration for video decoding.
This is just one of several PRs jbeich wrote to help yuzu work on BSD systems. Thank you for your contributions!
UI changes
The favourites row in yuzu’s game list was always expanded, even if the user collapsed it. epicboy added a persistent setting to remember the user preference between launches.
One of the most common issues users face is lack of Vulkan support on their PC. Not lack of hardware support, but instead missing software support caused by outdated GPU drivers or poorly coded/outdated Vulkan injections.
Our old error popup didn’t reflect this so your writer, with his total lack of coding skills, decided to improve it.
This is a complex issue and the main reason Vulkan is not yuzu’s default API. Users of old laptops with AMD and Intel integrated GPUs tend to use the driver shipped by either the laptop vendor or Windows Update. In both cases, those drivers are most likely years old (yuzu can run on AMD GPUs from 2012) and either lack Vulkan support at all, or only support a portion of what’s needed to run yuzu. Also, since laptops, by default, connect the display directly to the integrated GPU, that’s the first Vulkan driver that will be seen, so it’s critical to have the latest GPU driver installed even if your laptop has a dedicated NVIDIA GPU running the latest driver.
While telling AMD users to manually download and install updated drivers is a viable option and works as it should, in its infinite wisdom, Intel decided to block manual installation of its own official drivers if a custom laptop vendor driver is in use (those modified drivers are usually created to cheat on battery life metrics and/or to save money on cooling).
The only alternative in those cases is to manually download the ZIP version of the driver > unpack it > Launch the Device Manager > right-click the correct GPU in Display Adapters > select Update Driver Software… > select Browse my computer for driver software > select Let me pick from a list of device drivers on my computer > select Have Disk… > then finally browse to the folder where the driver was unpacked and select the iigd_dch.inf file. What a very intuitive and user-friendly way to update a GPU driver… great job Intel.
Here’s a video tutorial for those that prefer visual aid over our rambling. Just make sure to use the iigd_dch.inf file instead of the one shown in the slightly outdated video. The other optimizations mentioned in the video no longer apply.
With this easy job done, the Intel GPU gets full Vulkan support, runs at its intended performance, and has access to all the new features, fixes, and performance improvements that the driver developers worked on. The driver is also allowed to auto-update on new official releases.
Software known to use broken Vulkan injectors includes outdated screen recorders like Bandicam, Action!, and even OBS. We strongly recommend using an up-to-date OBS, the native encoders from the GPU vendor (Radeon ReLive and GeForce Experience), or the integrated Xbox Game Bar on Windows. Overwolf and GShade are also known to break Vulkan support, so we strongly recommend avoiding them.
Future projects
Project Gaia is progressing smoothly. Heads up, SSD users will notice improvements once it is released.
Blinkhawk informs us that Project Y.F.C. will be released in smaller chunks in order to push more progressive updates instead of delaying for a big release that would require more testing time. We want to get these updates in your hands as soon as possible! We continue to plan to add several GPU features that have been pending. Here’s a screenshot as an example:
Mario Golf: Super Rush
That’s all folks! Thank you for your attention, and we hope to see you next month!