Bringing D&D/AD&D campaign settings to life with Stable Diffusion

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
But yeah, Increasingly Nervous Man is saying exactly that, because intelligence requires a cognitive model of the world, or at least the given task.
You don't need intelligence to draw images any more than you need it to generate text. (Reddit proves that every day.)

You can give all sorts of examples of the AI failing to generate something ridiculous - like, say, a dog wearing VR goggles - but it doesn't change the fact that it already can and does generate useable or near-usable images and has been doing so for months.

You can say "AI won't work" until you're blue in the face, but the fact remains that it is working. Just like GPT is already being used by writers, social media companies and gamers, and just like game devs are already beginning to use AI voice actors.

A household robot that does the laundry but never puts the dog in the dryer is a great thing to aspire to but it does require intelligence.
Funny, but the Roomba not having intelligence hasn't stopped people from buying it. Almost like there are other ways to work around problems besides truly intelligent AIs.

The Roomba has no "mental model" or understanding of anything in your house. It just knows the boundaries it's supposed to draw in and stops when it hits an object. No dogs vacuumed up yet.

These AIs can similarly be wrangled into doing what you want them to. Whether with sketches, training or clever prompts. All without anything capable of passing a Turing Test.

This thread really isn't the best place for this discussion though.
 

Bigfass

Learned
Patron
Joined
Oct 9, 2020
Messages
561
Location
Florida
Codex Year of the Donut
The Roomba has no "mental model". It just knows the boundaries it's supposed to draw in. No dogs vacuumed up yet.
But the Roomba does have a mental model of its task. It's very limited, but it understands things like remaining battery, the location of the charger, the amount of trash it has inside, the time it's supposed to go to work, etc. It's been explicitly programmed to understand these things, and it has nothing to do with AI.

You can say "AI won't work" until you're blue in the face, but the fact remains that it is working.
Sure, there are products that are based on neural networks, some of them actually useful, increasingly so. That doesn't mean that the current approach is not a dead end as far as something that could be called Artificial Intelligence is concerned. Neural networks latch onto statistical characteristics in the training data, and have no concept of meaning.

There are people much smarter than me arguing both sides of this, with decades of experience in the field. There's no way for either of us to know for sure, one way or the other. I just find the sceptic point of view a lot more persuasive.
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
But the Roomba does have a mental model of its task.
No, I explained that it didn't have a model the way you were saying SD needed and why. It only "knows" what area to stay in. It doesn't know why. It doesn't really know anything.

You're the one who came up with the criteria for a model that requires intelligence. The Roomba doesn't have this. It has no more of a model than Stable Diffusion has. SD's model, by this new definition, is just vastly more complex.
It's very limited, but it understands
It does not. It understands nothing. There is none of the intelligence you said was needed. There are just several programs running on a tiny hard drive. It's no more intelligent than whatever you're typing on right now. Far less, in fact.

You're balking at one of the more advanced AIs to become available to the public in the past few years one minute, but now you think a Roomba has more intelligence just because it doesn't bump into walls.
Sure, there are products that are based on neural networks, some of them actually useful, increasingly so. That doesn't mean that the current approach is not a dead end as far as something that could be called Artificial Intelligence is concerned.
You keep trying to shift the goalpost to true AI, when that's never been what SD was trying to achieve or what any of its users want.

Sure, it'd be more useful, but generated art is the goal, not a toaster with feelings.
There's no way for either of us to know for sure, one way or the other.
Pretty sure the AI images I've generated show me how possible it is to generate images with AI. I think I know that for sure. But maybe I just haven't read enough articles by over-educated concern trolls yet.
 

Bigfass

Learned
Patron
Joined
Oct 9, 2020
Messages
561
Location
Florida
Codex Year of the Donut
You keep trying to shift the goalpost to true AI, when that's never been what SD was trying to achieve or what any of its users want.
My contention has been that neural networks are incapable of understanding meaning, so they will not replace artists. My first post in this thread was in response to "artists on suicide watch".

I don't know what SD is trying to achieve, other than hype and $100m in VC money. And you don't either. Maybe they do:

1663939510828.png


You're balking at one of the more advanced AIs to become available to the public in the past few years one minute, but now you think a Roomba has more intelligence just because it doesn't bump into walls.
You're being intentionally obtuse. You understand exactly what I meant by saying that the Roomba is more aware than any neural network. There exists an abstraction for a room inside a Roomba. Nothing of the sort has ever been demonstrated for a neural network.
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
My contention has been that neural networks are incapable of understanding meaning, so they will not replace artists. My first post in this thread was in response to "artists on suicide watch".
You don't need to replace artists in every field. Just in enough of them. It's the danger of outsourcing and automation: replace enough jobs and you lower the labor demand, resulting in layoffs or a wage reduction.
I don't know what SD is trying to achieve, other than hype and $100m in VC money. And you don't either. Maybe they do:
I would say they're trying to achieve an advancement in open-source AI no longer dependent on companies like OpenAI who purposely limit access to the public, while letting select individuals and organizations use it freely.

Emad has long been a member of several AI communities working to that effect. And not just for art, but for GPT as well. If he wants to make a profit while doing so, good for him. But it wouldn't be the first time someone got bought out.

As to "hype" though, the hype is from the quality of the work itself. Notice how no one is valuing Craiyon at $1 billion or writing panicked articles about it.
You're being intentionally obtuse. You understand exactly what I meant by saying that the Roomba is more aware than any neural network.
I understand that you're wrong. A neural network goes through a similar process to a Roomba in a way. Both "learn" what they're supposed to "draw". The Roomba just has a much simpler task and much more easily defined limitations.
There exists an abstraction for a room inside a Roomba. Nothing of the sort has ever been demonstrated for a neural network.
Lol. There is no abstraction of anything to the Roomba. There is a set of coordinates and a program that says "don't go here". That's it. It doesn't understand it as a map, or know what any of the objects it bumps into are. "Don't go beyond this point." That's all.

If you're equating that to an abstract model of reality, then Stable Diffusion has a much more complex abstraction that it was trained to learn. "Draw it like this, but based on these factors, don't draw it like this." There's literally a "model" file that's the result of that.

But of course, neither have intelligence, understanding or true abstraction. And yet both perform their functions in spite of that.
 

Zed Duke of Banville

Dungeon Master
Patron
Joined
Oct 3, 2015
Messages
13,347
I'm trying to make a Justin Sweet/Vance Kovacs style Icewind Dale portrait, but it seems the AI is mostly inspired by shitty fanart.
They're probably not famous enough and not prominently included in the training data; I can't find them listed here, for instance: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/artists.csv
Yes, before relying on an artist, I recommend testing that artist's impact as a prompt: run a series of prompts with only that artist and compare the results against the same prompts without any artist, and against the same prompts with an artist known to have a strong effect. I hadn't been doing this myself at first, but later testing did reveal that Justin Sweet has only a minor impact on results, which also seems to be true of the D&D/AD&D artists even if they're present in one of the Stable Diffusion lists. Of course, it's best to include multiple artists who either have a similar style (as with many of the examples I've posted in this thread) or who otherwise complement each other (as with the popular Mucha/Rutkowski/Artgerm combination).
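For anyone who wants to run that comparison systematically, here is a minimal sketch of how the test grid could be laid out. It only builds the list of prompts and seeds; it does not call Stable Diffusion. The function name, the ", by <artist>" phrasing, the example subjects and the choice of reference artist are all my own placeholders, not anything prescribed by a particular UI.

```python
from itertools import product


def artist_test_grid(subjects, candidate, reference, seeds):
    """Build one job per (subject, artist variant, seed).

    Hypothetical helper: the three variants are no artist, the artist
    under test, and a reference artist assumed to have a strong effect.
    """
    variants = {"none": None, "candidate": candidate, "reference": reference}
    jobs = []
    for subject, (label, artist), seed in product(subjects, variants.items(), seeds):
        # The ", by <artist>" suffix is an assumed phrasing for illustration.
        prompt = subject if artist is None else f"{subject}, by {artist}"
        jobs.append({"variant": label, "seed": seed, "prompt": prompt})
    return jobs


jobs = artist_test_grid(
    subjects=["portrait of an empress in a golden gown", "city built by magic"],
    candidate="Justin Sweet",
    reference="Greg Rutkowski",
    seeds=[1, 2, 3],
)
for job in jobs[:4]:
    print(job)
```

Reusing the same seeds across all three variants means any difference between the images in a row comes from the artist term rather than from the random starting noise.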

DAS RAYCIS!
The training data on which Stable Diffusion relies must suffer from an implicit bias against Spaniards. :M


The Dawn of the Emperors box set from 1989 covered not only the Empire of Thyatis but also its rival the Empire of Alphatia, ruled by magic-users. Although there is some relation to the historical Persian Empire, Alphatia is more sui generis and perhaps not suited for any of the more historical art styles. However, it still has much in common with the pulp fantasy literature that inspired the Known World setting (and D&D), so for portraits I selected as source artists Virgil Finlay and Roy Krenkel (unfortunately, Krenkel had a relatively minor effect, and I wasn't able to find a third artist in a similar art style who works with Stable Diffusion). For urban landscapes, I reverted to the usual Mucha/Rutkowski/Artgerm combination.

Empress Eriadna, with mahogany-brown hair, green eyes, delicate and expressive features, wearing a golden gown:
hpoobFC.png
Y3SiNTQ.png



Prince Zandor, heir to the throne, with brown hair, brown eyes, a white robe, sharp features:
kGpZmdT.png
99DiCT8.png



Master Terari, with grey hair, a grey beard, dark-brown eyes, brown robes, sharp features, inquisitive but settled and relaxed:
wa2qcyc.png



Asteriela Torion, with gold-blonde hair, eyes that should be dark brown, a fair complexion, beautiful, energetic, and charming:
u83bGn0.png
CivY4ro.png



Galatia Allatrian, lady-in-waiting to Asteriela, with red hair, brown eyes, a stylish gown, clever and mischievous:
qntzExY.png
f1oWsZd.png



Alphatia's capital Sundsvall, the city built by magic:
ciFgQcU.png
21I98kV.png

hDwlYxj.png
uAgF6HP.png

8SBOCmv.png
x5WiQg6.png
FGkSbAy.png
tNhN5MS.png
coN2phs.png
WBAlKAA.png



The University of Sundsvall, largest magical academy in the world:
a3Mz73Y.png
Q4RkZra.png

wQSpziS.png
Trr029m.png

ehm4R3z.png
2rKcG3q.png

qU0Grxq.png
ux6SV0G.png

WtsnFaG.png
4TPGtyB.png

DwWXWzR.png
hd7kBqV.png

CDMOpPb.png
O0KecvF.png
 

Catacombs

Arcane
Patron
Joined
Aug 10, 2017
Messages
6,147
I agree with JD; the cityscapes are great. Can you share some of the prompts for them?
 

Dexter

Arcane
Joined
Mar 31, 2011
Messages
15,655
so for portraits I selected as source artists Virgil Finlay and Roy Krenkel (unfortunately, Krenkel had a relatively minor effect, and I wasn't able to find a third artist in a similar art style who works with Stable Diffusion).
I find that if it gets faces more wrong than not, there's probably something incongruent with the artist combination or text description (try to group up all face-describing keywords into one big blob for instance, since keywords that are closer to one another will be linked together more). There are artist combinations where it gets the faces right 90%+ or close to 100% of the time, like the one you used for the cityscapes. :)
You can also try to fix a face with CodeFormer or GFPGAN. You can still do this after you've created an image, in the "Extras" tab if you're using the WebUI, e.g.:
00059.png
00060.png
00066.png
00067.png

If you're trying to do emotions or facial expressions, you also have to describe it like you're doing it to an autistic person, otherwise it'll default to a kind of "neutral" face or slight smile e.g.:
SmilesSD.jpg
Emotion-Compare-SD.jpg
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
If you're trying to do emotions or facial expressions, you also have to describe it like you're doing it to an autistic person, otherwise it'll default to a kind of "neutral" face or slight smile e.g.:
Yeah, I've seen people post this sort of image elsewhere as if it shows how to get the AI to draw these things. It's nonsense.

The AI doesn't know what a "maniac smile" is. It knows "maniac" and "smile" at best, and will combine drawing a maniac with the smile expression. Just look at "forced smile", which just looks dumbstruck and isn't a smile at all. Or "demonic smile" which rather than giving her a devilish grin, instead redraws her face as a literal demon with glowing eyes and fangs.

You might as well say "smiling maniac" or "smiling demon", since that's exactly what the AI is drawing, not an expression alone.

jsXWGRi.png


You might get lucky with word association, and that's probably what you should try for, but the AI isn't intelligent. Don't expect it to handle anything regarding emotional expression that might puzzle Data from Star Trek. Google image search outperforms its results for expressions by far.
 

Dexter

Arcane
Joined
Mar 31, 2011
Messages
15,655
The AI doesn't know what a "maniac smile" is. It knows "maniac" and "smile" at best, and will combine drawing a maniac with the smile expression. Just look at "forced smile", which just looks dumbstruck and isn't a smile at all. Or "demonic smile" which rather than giving her a devilish grin, instead redraws her face as a literal demon with glowing eyes and fangs.
It knows what it has been trained on, does keyword association of words clumped together, and tries to apply that to the resulting image. It will do this the same with, say, "cute smile" as it does with "Frank Frazetta":
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=duchenne+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=cute+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=maniac+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=psychotic+smile
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=forced+smile

It doesn't have to be intelligent or "get" a concept like a human would to do this. And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it. The comparison seems to be made using the same Sampling method/steps/CFG and Seed and simply inserting different keywords into the image, and it obviously "knows" what a bunch of these things are/recognizes them as facial expressions (crying, sleepy, yawning, angry) or various smiles it can seemingly differentiate between and apply them to the resulting pictures as can be seen.

And yes, maybe "devilish grin" might have been a better keyword association, although given some of the results I assume it would have likely transformed the face too, as would have likely been "maniacal laughter", and I think "aesthetic score" also plays a role in what it has been fed:
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=devilish+grin
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=maniacal+laughter

In that way Data is a bad depiction of "AI", since it would have been easy to just look up human emotions in the database or recognize a joke being told and present the appropriate reaction, just like the Enterprise computer did every time it started a Holodeck program, even though it was supposed to be a lot less sophisticated than Data.
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
It knows what it has been trained on and does keyword association of words clumped together and tries to apply said to the resulting image, it will do this the same with say "cute smile" as it does with "Frank Frazetta":
I know that. So does Google. But that's not going to help you if the association isn't strong enough, and many of the images in that collage are clearly not what the prompt says.

Zq5ujrN.png


I mean look at this. Come on.
And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it.
But that's my point: you need to understand that it just won't have the associations there for it in most cases. You need to look at things objectively and not assume wishfully that your prompt is having an effect and the AI isn't just drawing random smiles.
although given some of the results I assume it would have likely transformed the face too
I'm thinking the best way to handle such things for now is with img2img. Provided the colors don't change much, you can mask out only the area you want. I've done this with several images, adding smiles, fixing eyes, etc.
iUUffCf.png

:hmmm:
 

Zed Duke of Banville

Dungeon Master
Patron
Joined
Oct 3, 2015
Messages
13,347
I agree with JD; the cityscapes are great. Can you share some of the prompts for them?
The university pictures are 14 of 30 I created with exactly the same prompts, only changing the seed; the terms were broad enough to capture both exterior and interior scenes with considerable variation. Similarly, the city scenes are 10 of 25 created with exactly the same prompts, only the seeds differing. The descriptions are simply keywords from Dawn of the Emperors, although not necessarily found in one place. Interestingly, Sundsvall is the name of an actual city in Sweden, and this had some impact on the architecture displayed, although the effect was limited compared to what the results would have been with a stronger city such as Novgorod, Chartres, Athens, or Kyoto:

No City Name | Sundsvall
Novgorod | Chartres
Athens | Kyoto

oE78Mzr.png
hDwlYxj.png

AthNKd9.png
aZfduDW.png

2A0d4VR.png
i4U2jsN.png


7qQgOMO.png
uAgF6HP.png

sYkHZpX.png
Fx15TPw.png

d7gFkMW.png
63SVL2l.png
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
Could it be that it treats separate words as individual terms and you need to group them together with quotes to form a single term?
Not really. The () should do that, iirc, if it was going to, but it doesn't because it doesn't work like that.

Basically, every image got scraped with whatever words are associated with it. The words and phrases become tokens, and when there's a strong enough association for the AI on a particular token, it should draw it. If not, it won't. I also understand that how your words get tokenized is a factor. Supposedly DALL-E provides additional guidance that SD doesn't in that process, making the outputs end up more like what people type in than SD's.

But basically, there should be a strong association between what you type in and what you want to see within the model, or it probably won't work. You can't depend on it to understand even common things. Some things it just hasn't received enough training on. It may understand what a happy face is, but probably not an ambivalent one.

That's my limited understanding of it. But just know that a lot of the reddit and 4chan meme prompt tips are junk. Especially their negative prompts.

If I had a nickel for every time I saw someone put "deformed hands" in their negative prompt, I'd have a lot of nickels. The AI just looks at it and says "Ok, no deformed and no hands!" And these people don't even notice that none of their pictures have hands.

:deathclaw:
 

Dexter

Arcane
Joined
Mar 31, 2011
Messages
15,655
Not really. The () should do that, iirc, if it was going to, but it doesn't because it doesn't work like that.

Basically, every image got scraped with whatever words are associated with it. The words and phrases become tokens, and when there's a strong enough association for the AI on a particular token, it should draw it. If not, it won't.

But basically, there should be a strong association between what you type in and what you want to see within the model, or it probably won't work. You can't depend on it to understand even common things. Some things it just hasn't received enough training on. It may understand what a happy face is, but probably not an ambivalent one.

That's my limited understanding of it. But just know that a lot of the reddit and 4chan meme prompt tips are junk. Especially their negative prompts.

If I had a nickel for every time I saw someone put "deformed hands" in their negative prompt, I'd have a lot of nickels. The AI just looks at it and says "Ok, no deformed and no hands!" And these people don't even notice that none of their pictures have hands.
Pretty sure it can do keyword associations well enough, since it uses CLIP to resolve your prompt and try to come up with a fitting image. The closer two words are to one another in the prompt, the likelier it is to associate them. That's why you should group up things describing one specific element of a picture like a face or the background or whatever into a word blob instead of trying to spread it all over. Similarly, it parses a prompt from the start to the end and will give added weight to what's at the beginning. If you for instance say "fire breathing dragons attacking a castle" it'll concentrate on the fire breathing dragons. If you instead describe a castle with its moat and whatever in minute detail, add some artists or whatnot and throw in "fire breathing dragons" at the very end with like ~30-50 words before that it might outright ignore it or make it a very minor component of the composition. If it couldn't do context and basic association you'd have "fire", "breathing", "dragons", "attack" and "castle" as separate elements competing with one another and a mess of an incomprehensible picture.

CLIP has been trained on image-text pairs and can do a lot more than resolve a single word, though, as it does natural language processing. For instance, you can feed it an image and various descriptions, and based on its previous training it can determine which description is the closest match by measuring the statistical relationship between tokens it finds close to one another: https://towardsdatascience.com/clip...el-from-openai-and-how-to-use-it-f8ee408958b1
1*r9SKmy1fQE6r2WrWQ-c03g.jpeg

1*EqqumwYGzFmvOGnHhqajIQ.png
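To make the "which description is closest" step concrete, here is a toy sketch of the matching idea. As far as I understand it, the image and each caption are turned into vectors, and the caption whose vector points in the most similar direction wins. The three-number vectors below are made up by hand purely for illustration (real embeddings come out of the trained model and have hundreds of dimensions), and the scale factor is an arbitrary value I picked, not CLIP's actual setting.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs, scale=10.0):
    """Turn similarities into probabilities with a softmax.

    `scale` is an assumed sharpening factor chosen for this toy example.
    """
    sims = {caption: cosine(image_vec, vec) for caption, vec in caption_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {caption: math.exp(scale * (s - top)) for caption, s in sims.items()}
    total = sum(exps.values())
    return {caption: e / total for caption, e in exps.items()}


# Hand-made stand-ins for embeddings; not output from any real model.
image_vec = [0.9, 0.1, 0.2]
caption_vecs = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.1, 1.0, 0.0],
    "a photo of a castle": [0.0, 0.2, 1.0],
}

probs = rank_captions(image_vec, caption_vecs)
best = max(probs, key=probs.get)
print(best)
```

Nothing in this step requires the model to "understand" a caption; it only needs the caption's vector to land near the vectors of matching images.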


It uses a similar method to generate pictures from noise and find what is the likeliest result from your text prompt. Without CLIP and the correlation it does between text and image you wouldn't be able to type something like "photo of a woman with platinum blonde hair" and get mostly highly accurate results, and it would draw much more nonsense or random shit instead. It obviously understands what "platinum blonde hair" is in relation to "a woman" and doesn't just start painting chunks of platinum near a woman and random hair all over the picture.

I don't think () and [] are meaningful for CLIP, that's a feature that was implemented in the WebUI Repo to (((add))) or [[[detract]]] weight from a specific keyword or keyword combination. I don't think it does anything for grouping.

Also pretty sure it knows well enough what "deformed hands" are, since this is what comes up if you just input that term, same for "poorly drawn hands":
grid-0442.png
grid-0444.png

And this is what comes up when you search LAION-5B for them. All this search does is use CLIP to look up images by text in the database of 5 billion images, a subset of which SD was trained on; I believe they limited it to a higher aesthetic score (you can change that on the left-hand side):
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=deformed+hands
https://rom1504.github.io/clip-retrieval/?back=https://knn5.laion.ai&index=laion5B&useMclip=false&query=poorly+drawn+hands

As for their usefulness as negative keywords, I'm not sure and there is definitely a lot of placebo going around. If not copying something to try for a specific result, I generally use them very sparingly mostly if I want to get rid of a certain element or predominant color or whatever. But it certainly doesn't just "remove hands" and you can easily test that.
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
CLIP has been trained on image-text pairs and can do a lot more than resolve a single word though as it does Natural Language Processing
I know. That's why I said it uses tokens.

If there's enough association in the training data, it will be able to associate these tokens, or group multiple words into a single token. If not, it won't.

I don't think () and [] are meaningful for CLIP
Again, I know. It's for weights in SD. Assuming there is enough association, emphasizing the token or tokens should work to emphasize the association in the result. But as I said, it doesn't because it doesn't work like that. The association is either there or it isn't. Yelling at the model won't make more training data appear.
Also pretty sure it knows well enough what "deformed hands" are, since this is what comes up if you just input that term, same for "poorly drawn hands":
:lol:

Dude, the model can't draw hands at all, unless by a fluke. Look at what comes up for "perfect hands":

Screenshot 2022-09-28 143556.png

See, this is the kind of thinking I'm referring to that leads to so many incorrect conclusions. You assume the model knows things based on what you read into the images vs actual testing and data.

1664394195006.png


Again, this is not a forced smile or a cruel smile. They're both almost the same expression, and not even smiles at all. It's classic bias in testing.

As for their usefulness as negative keywords, I'm not sure
You can get sure by testing it and paying attention objectively instead of hoping to see what you want to see.
Half of these are just hands.
 

Dexter

Arcane
Joined
Mar 31, 2011
Messages
15,655
If there's enough association in the training data, it will be able to associate these tokens, or group multiple words into a single token. If not, it won't.
Yes it does, based on closeness and statistical relationship. And if something isn't in the model it might just add noise or try to infer something else. It won't "group multiple words into a single token" though, since a token is at most a word. A long word can even be 2-3 tokens, and commas and other separators are usually a token each. You can see how this works here based on GPT-3, just hit "show example" or paste something: https://beta.openai.com/tokenizer It's not 1:1 applicable to Stable Diffusion, but it's the same principle. One of the recent WebUI updates even introduced a token counter so people don't exceed the model's maximum of 75, which some people did; everything after that just gets cut off.
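A rough sketch of what that cut-off means in practice. The splitter below is a crude stand-in I made up (one token per word or punctuation mark); the real tokenizer splits long or rare words into several pieces, so actual counts will usually be higher. The limit of 75 is the figure quoted above.

```python
import re

MAX_TOKENS = 75  # limit quoted in the post above


def rough_tokens(prompt):
    """Crude approximation: one token per word or punctuation mark.

    Not the real tokenizer; it only illustrates the counting idea.
    """
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", prompt.lower())


def split_at_limit(prompt, limit=MAX_TOKENS):
    """Return (kept, dropped) token lists for a prompt."""
    tokens = rough_tokens(prompt)
    return tokens[:limit], tokens[limit:]


# A long-winded castle description with the dragons tacked on at the very end.
prompt = "a castle with a moat, " * 20 + "fire breathing dragons"
kept, dropped = split_at_limit(prompt)
print(len(kept), "kept,", len(dropped), "dropped")
print("lost from the end:", dropped[-3:])
```

In this example the dragons sit past the limit, so they never reach the model at all, which is one more reason to put the important part of the prompt first.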
Again, I know. It's for weights in SD. Assuming there is enough association, emphasizing the token or tokens should work to emphasize the association in the result.
I was just clarifying that () or [] aren't tokens and have no meaningful effect on the tokenizer or CLIP; it's just a specific implementation in one of the WebUIs, probably the most widely used: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features
attention-3.jpg

Using other WebGUIs, Desktop UIs, Plugins or the SaaS implementations of various Websites, these characters have no meaningful effect.
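For the curious, here is a simplified sketch of how that kind of bracket weighting can be computed, going by the WebUI wiki's description that each ( ) multiplies attention by 1.1 and each [ ] divides it by 1.1. This is my own toy parser, not the WebUI's code, and it ignores the explicit (word:1.5) form and escaped brackets.

```python
def parse_emphasis(prompt, step=1.1):
    """Split a prompt into (text, weight) chunks based on bracket depth.

    Simplified illustration: each enclosing ( ) multiplies the weight by
    `step`, each enclosing [ ] divides it by `step`.
    """
    chunks, buf = [], []
    round_depth = square_depth = 0

    def flush():
        # Emit the pending text with the weight implied by the current depth.
        text = "".join(buf)
        if text:
            weight = step ** (round_depth - square_depth)
            chunks.append((text, round(weight, 4)))
        buf.clear()

    for ch in prompt:
        if ch in "()[]":
            flush()  # pending text belongs to the depth before this bracket
            if ch == "(":
                round_depth += 1
            elif ch == ")":
                round_depth = max(0, round_depth - 1)
            elif ch == "[":
                square_depth += 1
            else:
                square_depth = max(0, square_depth - 1)
        else:
            buf.append(ch)
    flush()
    return chunks


for text, weight in parse_emphasis("a ((cute)) smile, [blurry]"):
    print(repr(text), weight)
```

The brackets are stripped before the text is tokenized, and the weights are applied afterwards, which is why they mean nothing to a front end that doesn't implement this step.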
Again, this is not a forced smile or a cruel smile. They're both almost the same expression, and not even smiles at all. It's classic bias in testing.
And as you point out, overemphasizing/weighting some things can lead to bad results and some things barely work or don't work at all. Fact of the matter is, if you want certain facial expressions or emotions expressed by a character you have to specifically guide it towards said result by telling it.
I don't even think the Original post claimed it was, just testing what works and posting a helpful comparison. I'm not searching where that was first posted though. I'm still not sure why you're throwing a fit over me pointing out that if you want a facial expression in a portrait or any picture depicting characters to be anything other than "neutral" you have to specifically point that out by including something like ((laughing)) or (cute smile) or whatever else, many of which do work.
You can get sure by testing it and paying attention objectively instead of hoping to see what you want to see.
And these people don't even notice that none of their pictures have hands.
That I did test. Your claim that "deformed hands" in the Negatives removes hands is wrong, there were still plenty of hands in the results. Beyond that I have no idea what it exactly does and have no interest in lengthy testing, the results seemed a bit worse than without it, but that doesn't mean the model doesn't have a vague idea what "deformed hands" means and that if it could perfectly draw hands that wouldn't be useful. As you pointed out it still has problems drawing hands in the best of cases though and seems to be more based on luck if it gets it kind of right once or twice.
 

Non-Edgy Gamer

Grand Dragon
Patron
Glory to Ukraine
Joined
Nov 6, 2020
Messages
17,656
Strap Yourselves In
It won't "group multiple words into a single token" though, since a token is at most a word
No, it depends on the model. It is possible to do multi-word tokenization or to combine punctuation into a single token with the word, and I've used language models that have done this, but I have no idea if SD does that or not.
I was just clarifying, that () or [] aren't tokens and have no meaningful effect on the Tokenizer or CLIP Model, it's just a specific implementation in one/probably the widest used WebUI's
Yeah, I know.
That I did test. Your claim that "deformed hands" in the Negatives removes hands is wrong
In most cases, it does. The negative prompt isn't 100%. The hands will usually be out of frame, a single hand will be shown, the hands will be incomplete etc. You can say it's wrong, but it's not. I've seen it, and anyone who wants to can test it. You can insist that you have tested it, but I encourage people not to take your word for it.
Beyond that I have no idea what it exactly does and have no interest in lengthy testing
Then you admit you don't actually know.

It's the lack of interest in "lengthy testing" that's your problem in general. Just like you typed "deformed hands" into the model, got deformed hands and assumed it understood you even though it drew hands how it always draws them. A handful of tests or a single test means very little when dealing with such randomness.
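To put a number on how little a handful of tests tells you, here is a small sketch using a Wilson score interval for a proportion, e.g. the share of generated images that show hands. The counts are invented for illustration, not results from any actual run.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: images that show hands, out of images generated.
small = wilson_interval(3, 4)      # a "handful" of test images
large = wilson_interval(150, 200)  # the same 75% rate over a long run

print("4 images:   %.2f to %.2f" % small)
print("200 images: %.2f to %.2f" % large)
```

With four images the plausible range covers most of the scale, so "hands still showed up" and "hands mostly vanished" are both consistent with the same tiny sample. Comparing prompts with and without the negative term needs enough images per side for the two ranges to stop overlapping.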
the results seemed a bit worse than without it, but that doesn't mean the model doesn't have a vague idea what "deformed hands" means and that if it could perfectly draw hands that wouldn't be useful.
Yes, IF it could. But it can't. That's my point. You can't use a prompt to force ideas that aren't already trained enough into the model. I wish you could. I wish I could just type "bad drawing" into the negative prompt and avoid all the bad drawings, but I can't. I'm probably just telling it to avoid the tokens "bad" and "drawing" more than anything. Which, btw, avoiding "drawing" is useful if you want to force photorealism.
As you pointed out it still has problems drawing hands in the best of cases though and seems to be more based on luck if it gets it kind of right once or twice.
Glad we agree.
 

Zed Duke of Banville

Dungeon Master
Patron
Joined
Oct 3, 2015
Messages
13,347
Turning to the Planescape campaign setting, starting with Mechanus, the plane of Law(ful Neutral).

Clockwork mechanism:
kgOpFB6.png



Great Orrery:
4TObhuS.png



Modron Cathedral:
t3eYwvm.png



Jade Palace of Shang-Ti the Celestial Emperor:
Q7mjEfk.png



Mycelia, domain of Psilofyr god of the Myconids:
EFJ2yso.png
 
