Dissertation – Hypotheses


9-3-95

Part of the draft dissertation of William C. Wake

Model

Evaluation of information retrieval systems has often been treated
as “black-box” evaluation: “How well does the system match results
to the query?” While such a question is important, it is not
complete.
As we move toward considering more of the human interface design
of a system, we have other questions as well: “Does the system
encourage the user to ask the right questions? Can the user
understand the results? Does the system lead the user down a good
path?”
Information retrieval is a process, and real systems must
be evaluated in that context. Two systems are likely to diverge in
real use, as the user sees different results and asks different
questions.
Rather than treating the system as a black box (Input –> Output),
we also consider internal effects and external effects:

    Input –> (Internal Effects) –> Output
                    |
                    v
            External Effects

That is, we want to know what’s happening inside the
black box, as well as any side effects. Since this is a process,
the steps involved have a history, and may affect future steps.

Two systems might have the same measurable output for direct
performance of a task. For example, users of two library systems
might look for something, and arrive at lists of items that an
expert evaluator decides are equally good results for the search
undertaken. However, by looking inside the process by which the
systems are used, or looking at effects other than “direct quality
of retrieval,” we may discover reasons to prefer one system over
the other.
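The distinction between black-box and process-oriented measures can be
made concrete with a small sketch. This is illustrative only: the
function names, the metric, and the session data are invented for the
example, not taken from any existing system.

```python
# Illustrative sketch (hypothetical data): a black-box measure scores only
# the final result set, while a process measure looks at every step.

def blackbox_precision(retrieved, relevant):
    """Black-box measure: quality of the final result set alone."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def process_shown_precision(steps, relevant):
    """Process measure: of all items shown at any step of the session,
    the fraction that were potentially relevant (cf. hypothesis IE6)."""
    shown = set()
    for items in steps:
        shown.update(items)
    if not shown:
        return 0.0
    return len(shown & set(relevant)) / len(shown)

# Two hypothetical sessions that end in the same final result set:
relevant = {"a", "b", "c"}
final = ["a", "b", "x"]
session1 = [["a", "q", "r"], final]   # few useful items seen on the way
session2 = [["a", "b", "c"], final]   # more useful items seen on the way

print(blackbox_precision(final, relevant))          # identical for both
print(process_shown_precision(session1, relevant))  # lower
print(process_shown_precision(session2, relevant))  # higher
```

The two sessions are indistinguishable to the black-box measure but
differ under the process measure, which is the sense in which looking
inside the process may reveal reasons to prefer one system over another.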

Hypotheses

Suppose the “Input” is a set of tasks with some medium- to
long-term commitment (not just one-shot queries intended to be
tested and thrown away).

Output

These measures are the traditional ones that evaluate how well a
user performs a given task.

Task Performance

O1. There are important tasks that yield higher-quality results
using a powerful browser.

  • A large class of searches requires looking at what’s there to
    decide what to do next.
  • But there may be useful tasks that don’t need the
    interaction.

O2. There are important tasks that yield results faster using a
powerful browser.

O3. Users of a browser give better estimates of the number of
relevant items.

  • They have been exposed to items along the way, and may have a
    sense of what proportion are useful.
  • But this exposure may not be helpful – the users may get
    just as good a sense from a query result.

O4. Users of a browser can better estimate the distribution of
items’ attributes.
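Hypotheses O3 and O4 concern the accuracy of users’ estimates. One
simple way such estimates might be scored is relative error; the
numbers below are hypothetical, standing in for data a study would
collect, and the function name is invented for this sketch.

```python
# Sketch: scoring a user's estimate of the number of relevant items (O3).
# All values are hypothetical; a real study would compare groups of users.

def relative_error(estimate, true_count):
    """Relative error of an estimate; 0.0 means a perfect estimate."""
    return abs(estimate - true_count) / true_count

true_relevant = 40
browser_user_estimate = 35   # hypothetical: saw many items while browsing
query_user_estimate = 10     # hypothetical: saw only one result list

print(relative_error(browser_user_estimate, true_relevant))  # 0.125
print(relative_error(query_user_estimate, true_relevant))    # 0.75
```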

  • Browsing exposes users to the attributes much more often.

User Preference

O5. There are important tasks that some people prefer to undertake
using a powerful browser.

  • As for O1.

Internal Effects

These measures assess what happens to the user during performance
of a task. While we expect that “better” internal effects will lead
to better output, it is not a contradiction if they do not.
(For example, one system might allow extremely fast query
formulation, but still have worse output because of other factors.
In another context, the fastest guitarist might not be the
best guitarist.)

Task Performance

IE1. Seeing items in context helps people elaborate their
information needs more quickly.

  • Seeing what is there may help guide them to the useful
    parts.

IE2. The SortTables browser can perform quickly enough on
moderately large collections (100K–1M items).

IE3. Use of the SortTables browser will lead to fewer false trails
than when using the xxx or yyy systems.

IE4. SortTables users will tend not to give up as quickly on tasks,
compared to users of systems xxx or yyy.

IE5. Users can better demonstrate a sense of range boundaries and
attribute values using the SortTables browser.

IE6. Of all items shown, a higher percentage of potentially
relevant items are shown when using the SortTables browser.

Preference

IE7. Some users prefer interactive to “batch” processing.

Perception

IE8. The SortTables browser provides a steady sense of satisfaction
and progress.

  • Its interaction steps are of similar power.
  • But this sense of satisfaction/progress may not translate
    into better searches.

IE9. The SortTables browser provides a better sense of closure when
the task is complete.

External Effects

These are the side effects of using the system. They do not show up
directly in the output, but may appear as learning or improved
future performance; this potential payoff may not be visible in the
quality of the current task’s output.

EE1. People learn more by browsing.

EE2. People learn more about the search process when browsing,
making future searching faster.

EE3. People using SortTables better retain learned information across sessions.

EE4. People using SortTables better retain their ability to use the system across
sessions.

EE5. People using a browser learn more about their
subject when given a broad problem.

  • The incidental exposure to related information may pay off in
    future exploration of a related problem.