Collaborative information seeking (CIS) is a field of research that involves studying situations, motivations, and methods for people working in collaborative groups for information seeking projects, as well as building systems for supporting such activities. Such projects often involve information searching or information retrieval (IR), information gathering, and information sharing. Beyond that, CIS can extend to collaborative information synthesis and collaborative sense-making.
Background
Information seeking is often considered a solo activity, but many situations call for people to work together. Such situations are typically complex and involve working through several sessions of exploring, evaluating, and gathering relevant information. Take, for example, a couple planning a trip. They share a goal, and to accomplish it they need to seek out several kinds of information, including flights, hotels, and sightseeing. This may involve working together over multiple sessions, exploring and collecting useful information, and collectively making decisions that move them toward their common goal.
It is common knowledge that collaboration is either necessary or highly desirable in many activities that are too complex or difficult for an individual to handle. Despite its natural appeal and situational necessity, collaboration in information seeking is an understudied domain. The nature of the available information and its role in our lives have changed significantly, but the methods and tools used to access and share that information in collaboration have remained largely unaltered. People still use general-purpose systems such as email and IM for CIS projects, and there is a lack of specialized tools and techniques that support CIS explicitly.
There are several models that explain information seeking and information behavior, but the areas of collaborative information seeking and collaborative information behavior remain understudied. On the theory side, Shah has presented the C5 Model for studying collaborative situations, including information seeking. On the practical side, a few specialized systems for supporting CIS have emerged in the recent past, but their usage and evaluations have been underwhelming. Despite such limitations, the field of CIS has received considerable attention lately, and several promising theories and tools have come forth. Shah has written multiple reviews of CIS-related literature, and his book provides a comprehensive review of the field, including theories, models, systems, evaluation, and future research directions. Other books in this area include one by Morris and Teevan, Foster's book on collaborative information behavior, and Hansen, Shah, and Klas's edited book on CIS.
Theories
Depending upon what one includes or excludes while talking about CIS, there are either many theories or hardly any. If we consider past work on groupware systems, many interesting insights can be obtained about people working on collaborative projects, the issues they face, and the guidelines for system designers. One of the notable works is by Grudin, who laid out eight design principles for developers of groupware systems.
The discussion below is primarily based on some of the recent works in the fields of computer-supported cooperative work (CSCW), collaborative IR, and CIS.
Definitions and terminology
The literature is filled with works that use terms such as collaborative information retrieval, social searching, concurrent search, collaborative exploratory search, co-browsing, collaborative information behavior, collaborative information synthesis, and collaborative information seeking, which are often used interchangeably.
There are several definitions of such related or similar terms in the literature. For instance, Foster defined collaborative IR as "the study of the systems and practices that enable individuals to collaborate during the seeking, searching, and retrieval of information." Shah defined CIS as a process of collaboratively seeking information that is "defined explicitly among the participants, interactive, and mutually beneficial." While there is still no universally accepted definition or terminology, most agree that CIS is an active process, as opposed to collaborative filtering, where a system connects users based on their passive involvement (e.g., buying similar products on Amazon).
Models of collaboration
Foley and Smeaton defined two key aspects of collaborative information seeking as division of labor and the sharing of knowledge. Division of labor allows collaborating searchers to tackle larger problems by reducing the duplication of effort (e.g., finding documents that one's collaborator has already discovered). The sharing of knowledge allows searchers to influence each other's activities as they interact with the retrieval system in pursuit of their (often evolving) information need. This influence can occur in real time if the collaborative search system supports it, or it can occur in a turn-taking, asynchronous manner if that is how interaction is structured.
Teevan et al. characterized two classes of collaboration, task-based vs. trait-based. Task-based collaboration corresponds to intentional collaboration; trait-based collaboration facilitates the sharing of knowledge through inferred similarity of information need.
Situations, motivations, and methods
Important issues to study in CIS are the situations, reasons, and methods behind a collaboration. For instance, Morris, using a survey of 204 knowledge workers at a large technology company, found that people often like and want to collaborate, but they do not find specialized tools to help them in such endeavors. Some of the situations for collaborative information seeking reported in this survey were travel planning, shopping, and literature search. Shah, similarly, using personal interviews, identified three main reasons why people collaborate.
- Requirement/setup. Sometimes a group of people are "forced" to collaborate. An example is a merger between two companies.
- Division of labor. Working together may help the participants distribute the workload. An example is a group of students working on a class project.
- Diversity of skills. Often people get together because no one individually possesses the required set of skills. An example is co-authorship, where different authors bring different sets of skills to the table.
As far as the tools and methods for CIS are concerned, both Morris and Shah found that email is still the most used tool. Other popular methods are face-to-face meetings, IM, and phone or conference calls. In general, the respondents' choice of method or tool depended on their situation (co-located or remote) and objective (brainstorming or working on independent parts).
Space-time organization of CIS systems and methods
The classical way of organizing collaborative activities is based on two factors: location and time. Hansen & Jarvelin and Golovchinsky, Pickens, & Back have also classified approaches to collaborative IR using these two dimensions of space and time. See "Browsing is a Collaborative Process", where the authors depict various library activities along these two dimensions in a figure. In that figure, the majority of collaborative activities in conventional libraries are co-located and synchronous, whereas collaborative activities relating to digital libraries are more remote and synchronous. Social information filtering, or collaborative filtering, as we saw earlier, is a process that benefits from other users' past actions; thus, it falls under the asynchronous and mostly remote domain. Email also serves as a tool for asynchronous collaboration among users who are not co-located. Chat or IM (represented as 'internet' in the figure) supports synchronous and remote collaboration.
Rodden, similarly, presented a classification of CSCW systems using the form of interaction and the geographical nature of cooperative systems. Further, Rodden & Blair identified a characteristic important to all CSCW systems: control. According to the authors, two predominant control mechanisms have emerged within CSCW systems: speech act theory systems and procedure-based systems. These mechanisms are tightly coupled with the kind of control the system can support in a collaborative environment (discussed later).
Often researchers also talk about other dimensions, such as intentionality and depth of mediation (system mediated or user mediated), while classifying various CIS systems.
Control, communication, and awareness
Three components specific to group-work or collaboration that are highly predominant in the CIS or CSCW literature are control, communication, and awareness. In this section key definitions and related works for these components will be highlighted. Understanding their roles can also help us address various design issues with CIS systems.
Control
Rodden identified the value of control in CSCW systems and listed a number of projects with their corresponding schemes for implementing control. For instance, the COSMOS project had a formal structure to represent control in the system. It used roles to represent people or automatons, and rules to represent the flow and processes. The role of a person could be supervisor, processor, or analyst. A rule could be a condition that a process needs to satisfy in order to start or finish. Because of the structure seen in projects like COSMOS, Rodden classified these control systems as procedure-based systems.
Communication
This is one of the most critical components of any collaboration. In fact, Rodden (1991) identified message or communication systems as the class of systems in CSCW that is most mature and most widely used.
Since the focus here is on CIS systems that allow their participants to engage in an intentional and interactive collaboration, there must be a way for the participants to communicate with each other. Interestingly, collaboration can often begin simply by letting a group of users communicate with each other. For instance, Donath & Robertson presented a system that lets a user know that others are currently viewing the same webpage and communicate with those people to initiate a possible collaboration, or at least a co-browsing experience. Providing communication capabilities even in an environment that was not originally designed for collaboration is an interesting way of encouraging collaboration.
Awareness
Awareness, in the context of CSCW, has been defined as "an understanding of the activities of others, which provides a context for your own activity". The following four kinds of awareness are often discussed and addressed in the CSCW literature:
- Group awareness. This kind of awareness includes providing information to each group member about the status and activities of the other collaborators at a given time.
- Workspace awareness. This refers to a common workspace that the group has where they can bring and discuss their findings, and create a common product.
- Contextual awareness. This type of awareness relates to the application domain, rather than the users. Here, we want to identify what content is useful for the group, and what the goals are for the current project.
- Peripheral awareness. This relates to the kind of information that has resulted from personal and the group's collective history, and should be kept separate from what a participant is currently viewing or doing.
Shah and Marchionini studied awareness as provided by the interface in collaborative information seeking. They found that one needs to provide the "right" kind of awareness (not too little, not too much, and appropriate for the task at hand) to reduce the cost of coordination and maximize the benefits of collaboration.
Systems
A number of specialized systems have been developed, from the days of groupware systems to today's Web 2.0 interfaces. A few examples, in chronological order, are given below.
Ariadne
Twidale et al. developed Ariadne to support the collaborative learning of database browsing skills. In addition to enhancing the opportunities and effectiveness of the collaborative learning that already occurred, Ariadne was designed to provide the facilities that would allow collaborations to persist as people increasingly searched information remotely and had less opportunity for spontaneous face-to-face collaboration.
Ariadne was developed in the days when Telnet-based access to library catalogs was a common practice. Building on top of this command-line interface, Ariadne could capture the users’ input and the database’s output, and form them into a search history that consisted of a series of command-output pairs. Such a separation of capture and display allowed Ariadne to work with various forms of data capture methods.
To support complex browsing processes in collaboration, Ariadne presented a visualization of the search process. This visualization consisted of thumbnails of screens, looking like playing cards, which represented command-output pairs. Any such card can be expanded to reveal its details. The horizontal axis on Ariadne’s display represented time, and the vertical axis showed information on the semantics of the action it represented: the top row for the top level menus, the middle row for specifying a search, and the bottom row for looking at particular book details.
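The search history and its card layout described above can be illustrated with a small data-structure sketch. This is only an assumption-laden illustration, not Ariadne's actual implementation: the names Row, Card, build_history, and layout are hypothetical, the commands in the usage note are invented, and the caller is assumed to already know which semantic row each command belongs to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Row(Enum):
    """Vertical position of a card, following the three rows described above."""
    MENU = 0    # top row: top-level menus
    SEARCH = 1  # middle row: specifying a search
    DETAIL = 2  # bottom row: looking at particular book details


@dataclass(frozen=True)
class Card:
    """One command-output pair, shown as a thumbnail 'playing card'."""
    step: int     # position on the horizontal (time) axis
    row: Row      # position on the vertical (semantics) axis
    command: str  # what the user typed
    output: str   # what the database returned


def build_history(session: List[Tuple[Row, str, str]]) -> List[Card]:
    """Turn captured (row, command, output) triples into an ordered history."""
    return [Card(step, row, command, output)
            for step, (row, command, output) in enumerate(session)]


def layout(history: List[Card]) -> List[List[str]]:
    """Place each card's command in a rows-by-steps grid; empty cells are ''."""
    grid = [["" for _ in history] for _ in Row]
    for card in history:
        grid[card.row.value][card.step] = card.command
    return grid
```

For example, a session of three captured steps (a menu choice, a search, and a record lookup) would produce a three-by-three grid with one command per column, each on the row matching its kind of action.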
This visualization of the search process in Ariadne makes it easy to annotate a search, discuss it with colleagues around the screen, and distribute it to remote collaborators for asynchronous commenting. Having access to one’s own history as well as the history of one’s collaborators is crucial to effective collaboration. Ariadne meets these requirements with features that let one visualize, save, and share a search process. In fact, the authors found that one of the advantages of search visualization was the ability to recap previous sessions easily in a multi-session exploratory search.
SearchTogether
More recently, one of the collaborative information seeking tools that has attracted a lot of attention is SearchTogether, developed by Morris and Horvitz. The design of this tool was motivated by a survey that the researchers conducted with 204 knowledge workers, in which they discovered the following.
- A majority of respondents wanted to collaborate while searching on the Web.
- The most common ways of collaborating in information seeking tasks are sending emails back and forth, using IM to exchange links and query terms, and using phone calls while looking at a Web browser.
- Some of the most popular Web searching tasks on which people like to collaborate are planning travels or social events, making expensive purchases, researching medical conditions, and looking for information related to a common project.
Based on the survey responses, and the current and desired practices for collaborative search, the authors of SearchTogether identified three key features for supporting people’s collaborative information behavior while searching on the Web: awareness, division of labor, and persistence. Let us look at how these three features are implemented.
SearchTogether instantiates awareness in several ways, one of which is per-user query histories. This is done by showing each group member’s screen name, his/her photo and queries in the “Query Awareness” region. The access to the query histories is immediate and interactive, as clicking on a query brings back the results of that query from when it was executed. The authors identified query awareness as a very important feature in collaborative searching, which allows group members to not only share their query terms, but also learn better query formulation techniques from one another.
Another component of SearchTogether that facilitates awareness is the display of page-specific metadata. This region includes several pieces of information about the displayed page, including group members who viewed the given page, and their comments and ratings. The authors claim that such visitation information can help one either choose to avoid a page already visited by someone in the group to reduce the duplication of efforts, or perhaps choose to visit such pages, as they provide a sign of promising leads as indicated by the presence of comments and/or ratings.
Division of labor in SearchTogether is implemented in three ways: (1) “Split Search” allows one to split the search results among all online group members in a round-robin fashion, (2) “Multi-Engine Search” takes a query and runs it on n different search engines, where n is the number of online group members, (3) manual division of labor can be facilitated using integrated IM.
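The two automated strategies can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SearchTogether's code: the function names are hypothetical, each search engine is modeled as a plain callable that maps a query to a list of results, and Multi-Engine Search is assumed to give each online member the results of one engine.

```python
from typing import Callable, Dict, List, Sequence


def split_search(results: Sequence[str],
                 members: Sequence[str]) -> Dict[str, List[str]]:
    """Deal ranked results out to online group members in round-robin order."""
    if not members:
        raise ValueError("at least one online group member is required")
    assignment: Dict[str, List[str]] = {member: [] for member in members}
    for rank, result in enumerate(results):
        # Rank 0 goes to the first member, rank 1 to the second, and so on,
        # wrapping around once every member has received a result.
        assignment[members[rank % len(members)]].append(result)
    return assignment


def multi_engine_search(query: str,
                        engines: Sequence[Callable[[str], List[str]]],
                        members: Sequence[str]) -> Dict[str, List[str]]:
    """Run the query on one engine per online member (assumed pairing)."""
    if len(engines) < len(members):
        raise ValueError("need at least one engine per online group member")
    # zip stops at the shorter sequence, so n members use the first n engines.
    return {member: engine(query) for member, engine in zip(members, engines)}
```

With two members online, split_search gives the first member the results at odd ranks (1st, 3rd, 5th) and the second member the rest, so no result is reviewed twice.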
Finally, the persistence feature in SearchTogether is instantiated by storing all the objects and actions, including IM conversations, query histories, recommendation queues, and page-specific metadata. Such data about all the group members are available to each member when he/she logs in. This allows one to easily carry out a multi-session collaborative project.
Cerchiamo
Cerchiamo is a collaborative information seeking tool that explores issues related to algorithmic mediation of information seeking activities and how collaborators' roles can be used to structure the user interface. Cerchiamo introduced the notion of algorithmic mediation, that is, the ability of the system to collect input asynchronously from multiple collaborating searchers, and to use these multiple streams of input to affect the information that is being retrieved and displayed to the searchers.
Cerchiamo collected judgments of relevance from multiple collaborating searchers and used those judgments to create a ranked list of items that were potentially relevant to the information need. This algorithm prioritized items that were retrieved by multiple queries and that were retrieved by queries that also retrieved many other relevant documents. This rank fusion is just one way in which a search system that manages activities of multiple collaborating searchers can combine their inputs to generate results that are better than those produced by individuals working independently.
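One simple way to realize the prioritization described above is sketched below. The scoring formula is an assumption chosen for illustration and is not Cerchiamo's published algorithm: each query is weighted by one plus the number of judged-relevant documents it retrieved, and each document is scored by summing the weights of the queries that retrieved it. The function name fuse_rankings and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def fuse_rankings(query_results: Dict[str, List[str]],
                  relevant: Set[str],
                  judged: FrozenSet[str] = frozenset()) -> List[str]:
    """Rank unjudged documents using all collaborators' queries and judgments."""
    # A query that retrieved many relevant documents gets a higher weight.
    # The +1 keeps queries with no relevant hits from being ignored entirely.
    weights = {
        query: 1 + sum(1 for doc in set(docs) if doc in relevant)
        for query, docs in query_results.items()
    }
    scores: Dict[str, int] = defaultdict(int)
    for query, docs in query_results.items():
        for doc in set(docs):
            # A document retrieved by several queries accumulates their weights.
            scores[doc] += weights[query]
    candidates = [doc for doc in scores if doc not in judged]
    # Highest score first; ties are broken by document id for determinism.
    return sorted(candidates, key=lambda doc: (-scores[doc], doc))
```

In this sketch a document found by two queries outranks one found by a single query, and a document found by a productive query outranks one found by an unproductive query, which mirrors the two priorities stated above.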
Cerchiamo implemented two roles—Prospector and Miner—that searchers could assume. Each role had an associated interface. The Prospector role/interface focused on running many queries and making a few judgments of relevance for each query to explore the information space. The Miner role/interface focused on making relevance judgments on a ranked list of items selected from items retrieved by all queries in the current session. This combination of roles allowed searchers to explore and exploit the information space, and led teams to discover more unique relevant documents than pairs of individuals working separately.
Coagmento
Coagmento (Latin for "working together") is a system that allows a group of people to work together on their information seeking tasks without leaving their browsers. Coagmento has been developed with a client-server architecture, where the client is implemented as a Firefox plug-in that helps multiple people working in collaboration to communicate, and to search, share, and organize information. The server component stores and provides all the objects and actions collected from the client. Due to this decoupling, Coagmento provides a flexible architecture that allows its users to be co-located or remote, to work synchronously or asynchronously, and to use different platforms.
Coagmento includes a toolbar and a sidebar. The toolbar has several buttons that help one collect information and be aware of the progress in a given collaboration. The toolbar has three major parts:
- Buttons for collecting information and making annotations. These buttons help one save or remove a webpage, make annotations on a webpage, and highlight and collect text snippets.
- Page-specific statistics. The middle portion of the toolbar shows various statistics, such as the number of views, annotations, and snippets, for the displayed page. A user can click on a given statistic and obtain more information. For instance, clicking on the number of snippets will bring up a window that shows all the snippets collected by the collaborators from the displayed page.
- Project-specific statistics. The last portion of the toolbar displays task/project name and various statistics, including number of pages visited and saved, about the current project. Clicking on that portion brings up the workspace where one can view all the collected objects (pages and snippets) brought in by the collaborators for that project.
The sidebar features a chat window, under which there are three tabs with the history of search engine queries, saved pages and snippets. With each of these objects, the user who created or collected that object is shown. Anyone in the group can access an object by clicking on it. For instance, one can click on a query issued by anyone in the group to re-run that query and bring up the results in the main browser window.
An Android app for Coagmento can be found in the Android Market.
Cosme
Fernandez-Luna et al. introduced Cosme (COde Search MEeting), a NetBeans IDE plug-in that enables remote teams of software developers to collaborate in real time during source-code search sessions. The Cosme design was motivated by earlier studies by C. Foley, M. R. Morris, and C. Shah, among other researchers, and by the habits of software developers identified in a survey of 117 university students and professors involved in software development projects, as well as computer programmers at several companies. The five most common collaborative search habits (or habits related to collaborative search) of the respondents were:
- Reviewing problems as a team at one member's workstation.
- Suggesting addresses of Web pages already visited, digital books stored on an FTP server, or source files in a version control system.
- Sending emails with algorithms or explanatory text.
- Dividing search tasks among team members and sharing the final result.
- Storing relevant information on individual workstations.
Cosme is designed to enable synchronous or asynchronous, but explicit, remote collaboration among developers on a team who share technical information needs. Its client user interface includes a search panel that lets developers specify queries, a division-of-labor principle (possible combinations include the use of different search engines, ranking fusion, and split algorithms), the search field (comments, source code, or class or method declarations), and the collection type (source-code files or digital documentation). The sessions panel groups the main options for managing collaborative search sessions, each of which consists of a team of developers working together to satisfy their shared technical information needs. For example, a developer can use the embedded chat room to negotiate the creation of a collaborative search session and to show comments on current and historical search results. The implementation of Cosme was based on an instantiation of CIRLab (Collaborative Information Retrieval Laboratory), a groupware framework for CIS research and experimentation, with Java as the programming language, the NetBeans IDE Platform as the plug-in base, and Amenities (A MEthodology for aNalysis and desIgn of cooperaTIve systEmS) as the software engineering methodology.
Open-source application frameworks and toolkits
CIS systems development is a complex task that involves software technologies and know-how in different areas, such as distributed programming, information search and retrieval, collaboration among people, task coordination, and many others depending on the context. This situation is not ideal because it requires great programming effort. Fortunately, some CIS application frameworks and toolkits, such as Coagmento Collaboratory and DrakkarKeel, are growing in popularity because they offer high reusability for both developers and researchers.
Future research directions
Many interesting and important questions remain to be addressed in the field of CIS, including
- Why do people collaborate? Identifying their motivations can help us design better support for their specific needs.
- What additional tools are required to enhance existing methods of collaboration, given a specific domain?
- How can we evaluate various aspects of collaborative information seeking, including system and user performance?
- How can we measure the costs and benefits of collaboration?
- What are the information seeking situations in which collaboration is beneficial? When does it not pay off?
- How can we measure the performance of a collaborative group?
- How can we measure the contribution of an individual in a collaborative group?
- What sorts of retrieval algorithms can be used to combine input from multiple searchers?
- What kinds of algorithmic mediation can improve team performance?