Contact Management (DEX) is a server-based utility that, given a person’s email address and learned models of users’ home pages, crawls the Web to find contact information such as addresses, phone numbers, and expertise keywords.
- API Documentation
- Contact Management web UI
- Download Package
- Java Client Coding Example
- Licensing Info
- References
Overview
Contact Management (also known as DEX) finds contact information on the Web, such as addresses, phone numbers, and expertise, given a person’s name and email address. Its results can be used to fill in the details for a new contact automatically, or to determine whom to consult about a topic based on expertise information. Contact Management was developed by researchers at the University of Massachusetts.
Suppose we want to find the contact information and expertise for John Doe, whose email address is doe@sri.com. Many Web pages mention John Doe, and finding the desired information by hand is tedious. Contact Management finds the appropriate Web pages and extracts the relevant information. Figure 1 is an example of the results returned from Contact Management for John Doe using the Web user interface.
Figure 1. Contact Management Result Page
Contact Management uses a Web search service API to obtain the URLs of candidate pages. It then retrieves the page at each URL in turn until it finds one that contains the contact information. Links contained within the contact information page are used to find the expertise information. Based on its learned models, Contact Management assigns confidence levels for the data extracted from the pages and then employs a probabilistic information-extraction model using conditional random fields to identify the contact information. Information Gain is used to determine a set of expertise-describing keywords. More detail on the algorithms used by Contact Management can be found in the references.
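To make the Information Gain step more concrete, here is a minimal Java sketch of the standard textbook definition for a single candidate keyword. It is not DEX's implementation: the class name `InfoGainSketch`, the method names, and the two-class setup (pages belonging to the person versus background pages) are assumptions made for illustration only. The source states only that Information Gain is used to select expertise keywords.

```java
/** Hypothetical illustration of Information Gain; not part of the DEX code base. */
final class InfoGainSketch {

    /** Binary entropy, in bits, of a split with the given class counts. */
    static double entropy(double pos, double neg) {
        double total = pos + neg;
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (double count : new double[] {pos, neg}) {
            if (count > 0) {
                double p = count / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Information gain of one candidate keyword.
     * Assumed inputs: counts of the person's pages with and without the keyword,
     * and counts of background pages with and without the keyword.
     */
    static double informationGain(int posWith, int posWithout, int negWith, int negWithout) {
        double total = posWith + posWithout + negWith + negWithout;
        if (total == 0) {
            return 0.0;
        }
        // Class entropy before looking at the keyword.
        double before = entropy(posWith + posWithout, negWith + negWithout);
        // Expected class entropy after splitting on keyword presence.
        double with = posWith + negWith;
        double without = posWithout + negWithout;
        double after = (with / total) * entropy(posWith, negWith)
                     + (without / total) * entropy(posWithout, negWithout);
        return before - after;
    }

    public static void main(String[] args) {
        // A keyword that appears only on the person's pages separates the classes fully.
        System.out.println(informationGain(5, 0, 0, 5));
        // A keyword that is equally common everywhere carries no information.
        System.out.println(informationGain(2, 2, 2, 2));
    }
}
```

Under these assumptions, candidate keywords would be ranked by their gain and the top-scoring ones kept as expertise descriptors; how DEX actually defines the classes and thresholds is described in the references.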
Contact Management finds information from Web pages that mention the person’s name or email address. The default is a narrow search, which gathers information only from Web pages associated with the host part of the email address. In contrast, a wide search covers the entire Web and may be less accurate. For doe@sri.com, a narrow search looks only at the Web pages of sri.com.
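The difference between the two search modes can be sketched in a few lines of Java. The class `SearchScopeSketch`, its methods, and the use of a `site:` query operator are all hypothetical: the source does not say how DEX restricts a narrow search, only that it is limited to the host part of the email address.

```java
import java.util.Locale;

/** Hypothetical helper showing narrow versus wide search scope; not part of the DEX API. */
final class SearchScopeSketch {

    /** Returns the host part of an email address, lower-cased (for example "sri.com"). */
    static String hostOf(String email) {
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            throw new IllegalArgumentException("Not an email address: " + email);
        }
        return email.substring(at + 1).toLowerCase(Locale.ROOT);
    }

    /**
     * Builds a search-engine query for pages mentioning the name or the email address.
     * Assumption: the search engine supports a "site:" operator for host restriction.
     */
    static String buildQuery(String name, String email, boolean narrow) {
        String query = "(\"" + name + "\" OR \"" + email + "\")";
        return narrow ? query + " site:" + hostOf(email) : query;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("John Doe", "doe@sri.com", true));  // narrow search
        System.out.println(buildQuery("John Doe", "doe@sri.com", false)); // wide search
    }
}
```

In this sketch the narrow query differs from the wide one only by the host restriction, which mirrors the trade-off described above: fewer candidate pages, but a higher chance that each one belongs to the right person.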
Contact Management runs as a Java Web service. It can be accessed remotely either through a Web user interface or programmatically, from a client program that uses the DEX Web service APIs.
Prerequisites
Server
- Java 1.5 (or above)
- A Web server that runs on Java 1.5 or above (such as Resin or Tomcat)
- Hessian (http://hessian.caucho.com/)
Hessian is an open source, high performance, binary Web service protocol.
Implementations are available for many different platforms, including Java and .NET C#.
Client
- Hessian (http://hessian.caucho.com/) client for programmatic access
Limitations
- Contact Management works only if information exists on the Web for the input candidate.
- A search engine may return different pages at different times for the same query. As a result, Contact Management may return different results for the same query.