Radu Luchianov's: Pastime: Commentary
Information Space

This is a reply to a posting on the HOT/CURE lab list.

On Wed, 17 April 2002, "Richard F. Dillon" wrote:

A common UI problem is selecting from a large number of choices. An example is storing and using bookmarks. A common solution is to store and select alphabetically or temporally (in the order entered), perhaps with the capability to easily shift back and forth among the orders. Another solution is to arrange items hierarchically in folders or successive menus. Another approach is to provide a search capability. All techniques have serious problems.

There is an interesting hybrid solution involving a search capability for bookmarks at http://www.kaylon.com/why.html . (Read the first three major headings.) What they have isn't new, but it puts together a number of techniques in an effective way. To my surprise, I like their approach, and see potential for the approach as a general solution to the large-number-of-choices problem for a variety of things (but not all things) beyond bookmarks.

Look at what they say and, in spite of any skepticism, try their bookmark program for a while. Think beyond bookmarks and think beyond the specific implementation they have.

On Wed, 17 April 2002, "Radu Luchianov" wrote:

You asked what we think on a very interesting topic, so here's a bit of MonDoc :)

Actually, the issue you raise in your message above (selecting from many choices, or finding what one needs in a large information space; henceforth ISpace, e.g. UI, database, encyclopaedia, PIM, GIM, conceptual net, ...) starts with organizing said ISpace in an accessible way. To my knowledge, this is currently approached in many ways, including:

mathematically, in algorithm sets:
1. structurally (as weighted directed graphs, a superset of the hierarchical trees mentioned by you and by the article you point to, as implemented in different versions of adaptive navigation),
2. rationally (with different methods of
  - imposing conceptual distance, as it is done in the natural sciences (e.g. Geology, Biology) or
  - assessing conceptual distance, as it is done when selecting stimuli for psycholinguistic experiments),
3. statistically (with methods which sort flat lists by frequency of usage, as we see sometimes implemented in the Location pop-down of browsers),
and even cognitively (the problem of conceptual binding, one of the toughest of the problems faced by Cognitive Science: how come we come up with the right answer to a question so quickly, especially when the answer is NO or I DON'T KNOW)
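
A minimal sketch of approach 3 (sorting a flat list by frequency of usage), in Python. The class name FrequencyList and the example addresses are hypothetical, chosen only for illustration; this is not how any particular browser implements its Location list.

```python
from collections import Counter


class FrequencyList:
    """Flat list of items, presented in order of how often each was used."""

    def __init__(self, items):
        self._items = list(items)   # insertion order, used to break ties
        self._uses = Counter()      # item -> number of recorded visits

    def visit(self, item):
        """Record one use of an item, adding it to the list if it is new."""
        if item not in self._items:
            self._items.append(item)
        self._uses[item] += 1

    def ordered(self):
        """Most used first; sorted() is stable, so ties keep insertion order."""
        return sorted(self._items, key=lambda item: -self._uses[item])


locations = FrequencyList(["a.example", "b.example", "c.example"])
for url in ["c.example", "b.example", "c.example"]:
    locations.visit(url)
print(locations.ordered())
```

The list stays flat: no structure is imposed on the items, only an ordering derived from past behaviour.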

In MonDoc, there are several relevant principles to cover this problem; here are two:

content: reducing choice to relevant options (which is relevant mostly to the comments on UIs you're making; I implement this principle among other ways with the approaches 1 and 3 above)
automatic bidirectional topic linking (which is relevant to the comments on bookmark organization you're making; this is a more specific solution to approach 1)

Here's one possible implementation of the topic linking principle, applied to bookmark organization (I guess it is similar to the approach taken by the Kaylon guys):

Start with keywords and bookmarks. Allow for association lists (among keywords, from keywords to bookmarks and among bookmarks). The resulting data structure can be rendered for example as:

a hierarchical tree in which the same bookmark can exist in multiple 'folders'
a keyword graph where each keyword can open a sublist of bookmarks (the more specific the keyword, the shorter the list)
category pages or notes (a la Yahoo)
dynamically generated pop-down list groups (as I implemented in the first HYMNS project, in 1996)
and of course, the whole structure can be searched
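
The keyword and bookmark association lists described above can be sketched roughly as follows, in Python. This is an assumption-laden illustration, not the HYMNS or Kaylon implementation: the class TopicLinks, its method names, and the example.org addresses are all hypothetical.

```python
from collections import defaultdict


class TopicLinks:
    """Keywords and bookmarks joined by bidirectional association lists."""

    def __init__(self):
        self.links = defaultdict(set)   # node -> set of associated nodes
        self.bookmarks = set()          # which nodes are bookmarks

    def link(self, a, b):
        # Recording one association automatically records the reverse one.
        self.links[a].add(b)
        self.links[b].add(a)

    def add_bookmark(self, url, *keywords):
        self.bookmarks.add(url)
        for keyword in keywords:
            self.link(keyword, url)

    def bookmarks_for(self, *keywords):
        """Bookmarks tied to every given keyword; more keywords, shorter list."""
        result = set(self.bookmarks)
        for keyword in keywords:
            result &= self.links.get(keyword, set())
        return sorted(result)

    def keywords_for(self, url):
        """The 'folders' a bookmark lives in; one bookmark can be in several."""
        return sorted(self.links.get(url, set()) - self.bookmarks)

    def search(self, text):
        """Plain substring search over every keyword and bookmark."""
        text = text.lower()
        return sorted(node for node in self.links if text in node.lower())


links = TopicLinks()
links.add_bookmark("http://example.org/hci", "ui", "cogsci")
links.add_bookmark("http://example.org/menus", "ui")
links.link("ui", "cogsci")                        # keyword-to-keyword link

print(links.bookmarks_for("ui"))                  # both bookmarks
print(links.bookmarks_for("ui", "cogsci"))        # only the hci bookmark
print(links.keywords_for("http://example.org/hci"))
```

The same structure can feed any of the renderings listed above: keywords_for gives the multiple 'folders' of a bookmark, bookmarks_for gives the sublist a keyword opens, and search covers the whole structure.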

On Thu, 18 April 2002, Sanjay Chandrasekharan wrote:

This is interesting, we seem to be interested in the same problem, more or less! :-)

Radu: I would guess more than less. We're both studying CogSci, remember? :)

Sanjay: Part of my post was just a high level way of saying what you said above.

Radu: Sorry, I found your post after I wrote mine, and it brought a different perspective on the question. I guess that the question Dick raised concentrated on handling choices *after* they are

generated (the process of signaling you mention) and

encoded (the signal-to-data process: e.g. building a user interface)

Sanjay: I subsume the rational and conceptual under "classification".

Radu: I know. I have fought that subsumption in many papers (many researchers of categorization fall for it). It blurs concepts that are already fuzzy enough. The 'rational' approaches are, in my opinion, only sieves that filter currently consistent concepts out of what creative processes generate. This separation, which defines the rational as a subset of the conceptual, has been around in the Cognitive Modeling world ever since GPS, the General Problem Solver, the first implemented formal architecture I know of.

Sanjay: If you think of organizing a set of papers in your office, the rational classification is an ordering by topic, the statistical classification would be a pile-by-usage (most accessed piles in this corner, less accessed piles in that corner).

Radu: Yeah. By the nature of classification, you can put it as a label on any decision-making process. At the expense of lots of relevant (and important) details of the structure of these processes. But my ad-hoc listing of approaches to ISpace structuring was just a side remark, important only from an implementation and maybe functional perspective. I think Dick is concerned with a specific use of that structure, choice: in your example, finding the right paper as soon as possible.

Sanjay: The structural classification is kind of difficult to group, it depends on how the weights get generated.

Radu: I guess Dick's looking for simple descriptive methods that anyone can use, not to add to whatever theory librarians use :)

Classification is prescriptive, not descriptive. Symbolically, it's just a pruned traversal of a graph. In abstract n-dimensional space, it's a systematic field sampling along isometric curves/sheaves. Any ISpace can be traversed in a huge number of ways, a number that depends on how many facts and relations the ISpace is made of. That's why it takes protocols and standards for people to communicate :) Classifications are simply basic elements in such protocols.

In other words, making a predefined classification doesn't help much. Netscape has tried, Yahoo has tried, we all try to arrange bookmarks only to end up duplicating categories in different bookmark folders (or keeping a flat list sorted by some set of criteria).

Sanjay: If it is created out of keywords provided by the user, then it falls under my "signal" category, because the weights are provided by the user, and is functioning as a significance structure.

Radu: You seem to be contradicting yourself there. You say signals are inherent in the objects, but also that they are "in the eye of the beholder"?

Sanjay: In general, none of the approaches you list consider signaling as a way of solving the many objects problem.

Radu: Many choices, not objects. Big difference :). For me, signals, the form under which knowledge exists in the environment, are too low-level, too unstructured, and too context-sensitive to be discussed at this level. We're discussing assisted decision-making, remember? Of course signals are involved, but if we went that low it would be like appreciating a painting by doing the job of a ray-tracer, one unit of field of view at a time.

Sanjay: But people and animals use signaling often to solve precisely this problem. Think of the balloon tips provided for icons. I consider that a signal.

Radu: Whoa. Mighty well-structured signal. I consider that a functional cue, not a signal.

Sanjay: Which of the knowledge organization techniques you mention cover that?

Radu: I didn't mention a technique for that, because I consider it a mere matter of form: how a transition in ISpace gets rendered. MonDoc has the following principle that covers it:

"form: use context cues (through dynamic layouts) to enforce the intended structure"

Sanjay: Or consider a library book with RFID tags in it. The book can respond to a user's query and direct a user to the book, much like a male cricket's song directing a female cricket to it.

Radu: Sure, but in this example you already made the choice, and it's only a matter of tracking down the result in 3D space using a directional signal receiver. This is only a subset of the cases covered by the problem raised by Dick.

Sanjay: None of the classification techniques cover this kind of "focusing" on an object. Note that the object itself is providing the information for this focusing.

Radu: Sure they do! Even hierarchical trees do. Traces of the information represented by the object are supposed to be left through the path followed while classifying a bookmark. And that breadcrumb trail is supposed to be used later, during retrieval. The problem is that people's internal ISpace changes continuously as they are exposed to environmental stimuli and internal adjustments.

On Fri, 19 April 2002, Sanjay Chandrasekharan wrote:

I don't agree that the problem is one of many choices, it is one choice, many objects. Essentially it is about finding a specific object from a bunch of similar ones. So you have a bunch of papers (or bookmarks), and you want to find a particular paper (or bookmark) without searching through the whole lot.

Radu: Irrelevant terminological debate. We are talking at two different levels of complexity. The objects exist; it is the choice that is made. Thus we should concentrate on the process, not on the arguments of the process (the objects). You're saying that the choice is made at once among many objects, a view consistent with a Gestalt/imagery/parallel approach. But any parallel system can be serialized, and it should be in order to be communicated/understood. At the absolute bottom of that process, the atomic step can only be a binary choice. Something like MergeSort: though the final result is one sorted list, the individual steps in the recursive process are binary comparisons. Notice that I'm not saying that the process is serial. Each binary choice can, and often does, get input from other binary choices that happen at the same time in the system. This is what I think makes parallel pruning processes faster than their serial counterparts.
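
To make the MergeSort analogy concrete, here is a small Python sketch of a textbook top-down merge sort that counts its binary comparisons. The one-element list used as a counter is only an illustrative device, and nothing here is meant as a model of cognition.

```python
def merge_sort(items, counter):
    """Top-down merge sort; counter[0] accumulates the binary comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counter[0] += 1              # one binary choice: left head or right head
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


comparisons = [0]
result = merge_sort([5, 2, 4, 1, 3], comparisons)
print(result, "after", comparisons[0], "binary comparisons")
```

The two recursive calls are independent of each other, so they could run at the same time; only the merge step needs both results. In that sense the sketch agrees with the remark that a process built from binary choices need not be serial.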

Sanjay: In my view, the only way you can do this is by creating new structure in the environment. So you can order the objects (papers or bookmarks) alphabetically, by category, by time, or by any number of ways. But consider this: the only reason you have to order the papers is because the individual paper (or bookmark) cannot "call out" and tell you "here I am". If the object can do that, you don't need to order objects, classify them.

Radu: Yes, creating new structure in the environment IS a solution. But that only follows changes in the agent's internal model of the environment. I think it goes like this:

agent observes certain recurring cues in the environment (goal-free, emergent: signals being data-fied: labels/names),

as a result, its internal model of the environment is augmented with the new data (label is 'attached' to corresponding environmental cue),

[optionally] the new data is checked against further occurrences (goal-directed, systematic: agent pays attention to occurrences of that cue in the environment; each consistent occurrence raises the strength of the label),

agent decides to change the structure of the environment in order to capture that systematic cue (e.g. writing down symbols, placing symbols in spatial relations to each other, moving objects to new places, a.k.a. sorting, etc.)

I found it very useful to disambiguate signals and data, but you seem to like that ambiguity :)

Sanjay: Okay, I agree that in classification the agent uses cues that are already present in the environment to change the world. But I don't agree that the change in the external world follows a change in the internal model. Consider the case of the papers again. I have an internal model of the papers I have, and I know that they have a fixed number of attributes (title, author, journal, keywords, size, shape...). Out of them, the only attributes I can access readily are the size and shape, and those attributes don't let me discriminate between papers and find the one I need. This is the reason I reorder the environment. Once I reorder the environment, I change my internal model (it now says, "Kirsh paper is in that corner lot" etc.)

Radu:

Sanjay: The same applies for signals. I decide to add markers to all my important papers and do so, then I change my internal model to say "the marked papers are the important ones".

I didn't understand your comment on signals and data. For me, a signal is data that is focused to a function, and is discoverable readily by the function; that is, the data announces itself to the function. Pure data is not directly linked to a function (like an unsorted pile of papers) and needs sorting or other operations to be useful.

Radu:

Sanjay: But you can make objects do that. In my RFID book example, that is what the tag does. It allows the individual object (the particular book) to "call out" to the searcher. The object is not passive here. This principle is behind much of animal signaling, animals "announce" their locations and internal properties. But this principle is also used by us when we put markers and labels on objects to locate and identify them. The marker focuses our attention on that particular object we are interested in.

Radu: Er... Sorry, but objects are passive by definition. It is the agent that makes use of whatever affordances the environment offers (be they objects or other agents). See, in your RFID example, the user needs an RF receiver to filter out all the frequency markers coming from other books.

Sanjay: Yes, I agree, objects are passive. But the interesting point for me is that an agent can make the object "talk". He does it by creating a function (the receiver) and signalling data (RFID tags) that "fits" that function. There was no such affordance in that object before, and there was no function that could pick up that affordance. But now the object has that affordance and it can be picked up. The question is: how do we get to make objects "talk"? How do we create new data and fit the new data to new functions?

Radu:

Sanjay: These two processes, classification and signalling, are two ways of pruning the information search space. Classification works on passive objects by ordering them, by providing a graph to traverse. Signalling shortcuts classification and leads you directly to the object. It works like a pointer.

Radu: These approaches are not at all mutually exclusive. Classification uses signals in order to work. Signals use at least the simplest classification (sorting), in order to be distinguished from each other. What I think you're trying to suggest is using overlapped multimodal signals with decoders on different spectra in order to facilitate retrieval. But you're still working at the retrieval nooks in the cognitive woods, not at the choice/decision-making ones. Your RF decoder can't make the decision itself of what RFID tag to locate, it's still the library patron who does. And this latter one is the problem Dick raised. Say you know what book you're looking for, but forgot the title and author. How is the RFID tag finder going to help you?

Sanjay: I partly agree. Classification uses signals to work. But not the other way round. Signals don't use classification. Signals work by making an object salient among other similar objects. You don't have to traverse a tree to find the holidays (marked in red) in your calendar. But if you want to find the third Tuesday, you have to traverse a tree. To use a classification, you need to traverse a tree.

About the second part, I think our understanding of Dick's problem is different. Think of this: suppose I know which bookmark I need, let's say it is the one to your page. And I have a hundred unordered bookmarks. How do I find the bookmark to your page from these hundred? This was Dick's problem as I understood it.

If your page's bookmark is something I need to recover every once in a while, it is cost effective for me to order it in a way that I can retrieve it easily. The choice/decision I make is: should I, or should I not, reorder my bookmarks so that I can access them faster? Once I decide to reorder them, I can go about it in two ways. One is by classifying them using keywords and folders. I traverse a tree here. The other approach could be colour coding the bookmarks. If red is the colour of all cograd pages, I ignore all the rest and just look at the red bookmarks to get your page. (Bookmarks are not a good example, because you don't have a lot of options to make them signal; papers are better.)

In the RFID case, my problem is: I know the name and author of the book, but I don't know how to find the book. The traditional way is to traverse a tree, first in a catalogue, and then in the physical stack. In the RFID case, you don't traverse any trees, you just go to the book directly. Signalling cuts out tree traversal (which is why nature came up with it in the first place). The interesting point is that if signals grow beyond a point, you have to traverse trees again, unless you are able to process many signals at the same time (which RFID receivers can do).
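
The contrast between the two retrieval routes (walking a classification tree versus attending only to marked items) can be sketched in Python as below. The bookmark titles, folder names, and helper functions are hypothetical, and the key "_items" is reserved by this sketch for the contents of a folder. Note that the colour filter in code still inspects every item; the claim that marked items stand out without a search is about the human reader, and the code only shows which items each route ends up presenting.

```python
bookmarks = [
    {"title": "Radu's page", "folder": ("school", "cograd"), "colour": "red"},
    {"title": "Sanjay's page", "folder": ("school", "cograd"), "colour": "red"},
    {"title": "Kaylon", "folder": ("tools",), "colour": None},
    {"title": "Library", "folder": ("school",), "colour": None},
]


def build_tree(items):
    """Classification: file every bookmark under its chain of folders."""
    tree = {}
    for item in items:
        node = tree
        for label in item["folder"]:
            node = node.setdefault(label, {})
        node.setdefault("_items", []).append(item["title"])
    return tree


def find_in_tree(tree, path):
    """Retrieval by classification: one decision per level of the tree."""
    node = tree
    for label in path:
        node = node[label]
    return node.get("_items", [])


def find_by_colour(items, colour):
    """Retrieval by marker: ignore everything that is not marked."""
    return [item["title"] for item in items if item["colour"] == colour]


tree = build_tree(bookmarks)
print(find_in_tree(tree, ("school", "cograd")))
print(find_by_colour(bookmarks, "red"))
```

Both routes end at the same two candidates here. If most bookmarks were marked red, the colour route would return nearly the whole list, which is the point about signals losing their value once they grow beyond a certain number.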

Radu:

Sanjay: I agree that we also use a lot of structure that already exists in the environment (like colour, shape etc.) as signals. But the problem is that it is difficult to find out which structure is being used, and how. When you create new structure, you can see how they are created, and how they map on to functions. That's the reason I'm interested in created structure.

It looks to me that you consider the bookmark as a signal, a pointer. So in the bookmark case, the signals make up the ISpace, and you search for the right signal. I agree that you can look at it this way. Think of the bookmark as a marker. If I put markers on every paper I have, I will need to classify my markers to find the right marker. This is the bookmark problem again. The reason I made the bookmark was to lead me directly to the page; now I need a marker to find the bookmark! So it is a never-ending spiral. In the Kaylon case, what happens when there are so many keywords that they are not useful anymore?

Radu: Just as you say, a search would retrieve too many bookmarks for the system to be useful.

But let me try again to make it clear. I do not consider a bookmark as a signal. A pointer yes, but a pointer has much more structure than a signal. The definition of pointers includes some types of signals, but you can't use the terms interchangeably. If you can, it means you don't need one of them :)

Sanjay: Mea culpa. I agree, a bookmark is not a signal, it is a pointer, because you follow it to get to an object. A signal announces that an object is "here". A pointer says "go this way to X", a signal says "I'm here". In the first case, you want something, and you don't know where it is. In the second case, you know where it is, but you cannot distinguish it from the rest.

I made a bookmark a signal because it was what we started off talking about, and it was easy to illustrate my point using them, sorry.

Radu:

Sanjay: What I'm interested in is:

at what point does a signal (like a bookmark) become useless? What are the factors that affect this?
at what point is a new signal useful? And at what point is classification useful?
What are the factors that affect the use of a signal? I'm looking at animal signalling to understand this.

Radu: Generally, I suggest you re-examine your terminology. I'd go for a glossary of terms to make sure that I'm not confusing the many meanings natural languages tend to pile on terms. I'm planning to do the same for my own dissertation.

More to the point, to [try to] answer your questions:

signals are useless most of the time; that's why there's so much noise in any environment; signals are 'useful' only to guide perception. I guess you agree that 'useful' implies goal-directed behavior, which makes up a very small percentage of our cognitive activity. If you want to augment the definition of 'useful' to comprise some forms of meta-cognitive ability (like thinking about how signals affect the behavior of an agent), that's a different story, eventually leading to mind-reading and other intractable problems.
signals are not new or old; they happen when they happen; I guess you mean 'when are new codes useful'. Classifications are always good for reference; when they identify mutually exclusive subsets of a set, they are great for abstraction; but that seldom happens.
now that's the best question you asked. If knowledge is not relevant, it's not knowledge but trivia. So if you get an answer to this question, I'd like to know it

Sanjay: 1. On the contrary, I think signals are very useful, which is why nature uses them so much. And they are useful not just to guide perception, they allow animals to take a lot of important decisions, like mating, feeding and fleeing. The interesting thing about signals is that they work in spite of the noise in the environment.

I also don't agree that goal-directed activity makes up only a small percentage of our life. I see goals in everything we do! :-)

Thinking about signals doesn't lead to mind-reading, because I doubt whether crickets and peacocks read minds. A signal can develop just by reinforcement learning. What I'm interested in (like any designer) is to short-cut learning and get to the mechanism directly.

2. No, signals can be new. There are studies that show that animals develop new signals in response to changes in the environment (see the case of the tobacco plant in my other paper). I don't mean code; a code is a new way of using the same given structure, a variation. A signal is a new structure that the sender learns to send and the receiver learns to process.

I agree that classifications are always useful. That way, they are like brute search, and only a little better; dependable to give an answer, even if it may take a thousand years! :-) Signals are not always useful, but when they are, they are VERY useful, because they cut down search drastically. And evolutionarily, signalling is a much more primitive structure than classification; signalling exists even in single cells. Classification developed quite late (for instance, squirrels hoarding nuts in different places etc.). So I think we are much better equipped to detect signals than to traverse classification trees.

3. Of course I will let you know! :-) I sort of know some of the factors, read my other paper! :-)

I will reply to your comments on my paper later. Thanks a lot for taking the effort! I hope this discussion will be useful for both of us!

Radu: I hope that too. If not in any formal way, at least we can help each other find holes in each other's approaches, theories, methodologies.

On Tue, 23 April 2002, Sanjay Chandrasekharan wrote:

It looks like it's not just us who order cues,

Radu: Hehe... We're animals first and humans second. Many of our cognitive abilities are present in other species, no surprise there.

Sanjay: I just got this from a Polish biologist:

"In short, we analyzed design of the signal used in courtship and amount of time which the receiver needs to assess it (i.e. find a statistic describing it with a negligible error). It seems that signal assessment is quite costly in terms of time (a female needs almost three hours to asses signal of a single male). However, it takes much less time, when the receiver just tries to order several males according to some characteristic of a signal."

So, if you can compare a signal with another of the same kind, then you are better off in deciding about it. If it is a stand-alone signal, then you have to assess it using a normative scale, pretty hard to do.

Radu: Yup. Sounds reasonable. Except for your strange use of 'signal' :) In some experiments on context effects on decision-making and choice, I also noticed that people spend more time appraising a stimulus (visual in that case) than choosing between two stimuli. What were the animals(?) the Polish biologist studied? Frogs?

Cheers.
Radu
