I have a big set of arbitrary natural language strings. For my tool to analyze them, I need to convert each string to a unique color value (RGB or other). I need the color contrast to depend on string similarity (the more one string differs from another, the more their respective colors should differ).
Any advice on how to approach this problem?
Update on distance between strings
I probably need "similarity" defined as an edit distance such as Levenshtein. No natural language parsing is necessary.
That is:
"I am going to the store" and "We are going to the store"
Similar.
"I am going to the store" and "Today I am going to the store"
Similar as well (but a little less).
"I am going to the store" and "J bn hpjoh up uif tupsf"
Not really similar.
(Thanks!)
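To make the intended ordering concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in Python. The function name `levenshtein` and the exact example strings are illustrative choices, and whether this is the right distance for the tool is still an open question.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


base = "I am going to the store"
examples = (
    "We are going to the store",
    "Today I am going to the store",
    "J bn hpjoh up uif tupsf",  # every letter shifted by one
)
for other in examples:
    print(levenshtein(base, other), repr(other))
```

For these three inputs the printed distances increase from the first example to the last, which matches the ordering described above.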
I will probably only know which distance function I need once I see the program's output, so let's start with the simple things.
Update on task simplification
I've removed my own suggestion to split the task into two steps, absolute distance calculation and color distribution. This will not work well, because we would first reduce the dimensional information to a single dimension and then try to synthesize it back up to three dimensions.
You need to be more detailed about what you mean by "similar strings" in order to come up with a suitable conversion function. Are the strings
"I am going to the store" and "We are going to the store"
similar? What about the strings
"I am going to the store" and "J bn hpjoh up uif tupsf"
(all of the letters in the original shifted by one), or
"I am going to the store" and "Today I am going to the store"
? Depending on what you call "similar", you can consider various approaches.
If the difference can be based solely on the values of the characters (in Unicode or whatever space they come from), then you can try summing the values and using the result as a hue in HSV space. If a longer string should make the colors more different, then you can weight the characters by their position in the string.
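As a rough sketch of this idea, the Python function below sums the code points and wraps the total onto the hue circle, using the standard library's `colorsys` module. The name `string_to_rgb`, the `position_weighted` flag, the modulo-360 wrap and the fixed full saturation and value are all assumptions made for illustration.

```python
import colorsys


def string_to_rgb(s: str, position_weighted: bool = False) -> tuple[int, int, int]:
    """Map a string to an RGB triple via a hue derived from its code points."""
    if position_weighted:
        # Weight each character by its 1-based position, so order matters.
        total = sum(ord(ch) * (i + 1) for i, ch in enumerate(s))
    else:
        total = sum(ord(ch) for ch in s)
    # Assumption: wrap the sum onto a 360-step hue circle.
    hue = (total % 360) / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(string_to_rgb("I am going to the store"))
print(string_to_rgb("We are going to the store"))
print(string_to_rgb("I am going to the store", position_weighted=True))
```

Without position weighting, anagrams such as "ab" and "ba" get the same color, and because the sum wraps around, two very different strings can land on the same hue. Whether that is acceptable depends on the notion of similarity you settle on.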
If the difference is more complex, such as the occurrence of certain letters or words, then you need to identify this. If certain letters occur a lot in your domain, then you can set the values of red, green and blue from the number of occurrences of particular letters (such as S and R). Or you can pick a color based on the proportion of vowels, or the ratio of syllables to words.
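Two of these feature-based options might look roughly like this in Python. The three letters used for the channels, the scaling to 0..255 and the mapping of the vowel proportion onto the red-to-blue part of the hue circle are arbitrary choices for illustration, and the function names are hypothetical.

```python
import colorsys


def letter_count_rgb(s, letters=("s", "t", "r")):
    """Red, green, blue from the relative frequency of three chosen letters."""
    lowered = s.lower()
    total = max(len(lowered), 1)  # avoid division by zero for empty strings
    # A string made up entirely of one chosen letter gives 255 in that channel.
    return tuple(round(255 * lowered.count(ch) / total) for ch in letters)


def vowel_ratio(s):
    """Proportion of vowels among the alphabetic characters of s."""
    letters = [ch for ch in s.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in "aeiou" for ch in letters) / len(letters)


def vowel_ratio_rgb(s):
    """Map the vowel proportion onto hues from red (0) to blue (2/3)."""
    hue = vowel_ratio(s) * 2.0 / 3.0  # stop before the hue circle wraps around
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(letter_count_rgb("I am going to the store"))
print(vowel_ratio_rgb("I am going to the store"))
```

Raw letter frequencies are usually small, so the first function tends to produce dark colors. In practice you would probably rescale the channels against the range observed in your own data.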
There are many different ways to approach this, but the best one really depends on what you mean by "similar" strings.