I sound like someone who has actually implemented TTS in a few projects. Not in a game, but for this functionality that makes no difference.
Hell, one of them actually went both ways (speech to text as well), including speech analysis to translate speech into 3D mouth movements (how to move the animated mouth to fit what was said). And it still only took two people a week.
Sorry, but any coder who cannot implement a simple feature in a short time is not worth their money.
Many game devs are terrible coders, though, I'll give you that...
Yes, simple. Just use some libraries, without properly checking their code, without proper testing. Great recipe for success in a multiplayer game.
The fact that it is a multiplayer game is entirely irrelevant to functionality that produces audio from a string. Multiplayer/networking generally doubles most of the effort involved in development, but in this case it simply doesn't.
You either read an incoming chat string out loud or you don't. This affects the rest of the game in no way at all (other than maybe adding options for it).
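To make that concrete, here is a rough sketch in Python of what I mean. Everything in it is made up for illustration: `ChatTtsSettings`, `make_chat_handler` and the `speak` callable are hypothetical names, and `speak` stands in for whatever TTS library you end up picking. It is not any real game's or library's API, just the shape of the thing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChatTtsSettings:
    """Hypothetical user options for reading chat aloud."""
    enabled: bool = False   # the "read it or don't" switch
    max_chars: int = 200    # assumed cap so long messages get cut off


def make_chat_handler(
    speak: Callable[[str], None],
    settings: ChatTtsSettings,
) -> Callable[[str, str], None]:
    """Build a chat callback that forwards messages to a TTS backend.

    `speak` is an assumption: any function that takes a string and
    produces audio. The game state is never touched here.
    """
    def on_chat_message(sender: str, text: str) -> None:
        if not settings.enabled:
            return  # option is off: nothing happens at all
        speak(f"{sender} says: {text[:settings.max_chars]}")

    return on_chat_message


# Stand-in backend that records what would have been spoken.
spoken: List[str] = []
settings = ChatTtsSettings(enabled=True)
handler = make_chat_handler(spoken.append, settings)

handler("Alice", "hello team")
settings.enabled = False
handler("Bob", "this is not read aloud")
```

The handler only consumes a string that the chat system already delivered, so whether that string arrived over the network or not makes no difference to it.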
Don't make this seem complicated when it really isn't. Maybe your boss is clueless and that guarantees your job, but don't treat everyone like that...
Also, if there were any real need to check third-party code before using it, nobody would ever use professional third-party libs, as those are almost always closed source.
And even if it is open source, nobody reads through a library's code before using it. That's a waste of time.
You look at the API, examples and where it has been used so far.
Is the library used often in other projects without complaints? Yes? You're good, use it. Others already did the testing for you.
In the case of TTS, there are so many choices available that checking them out probably takes longer than implementing one. Which is kind of a luxury problem to have.