Merging structured text using temporal knowledge

https://doi.org/10.1016/S0169-023X(02)00019-8

Abstract

Structured text is a general concept that is implicit in a variety of approaches in handling information. Syntactically, an item of structured text is a number of grammatically simple phrases together with a semantic label for each phrase. Items of structured text may be nested within larger items of structured text. Much information is potentially available as structured text including tagged text in XML, text in relational and object-oriented databases, and the output from information extraction systems in the form of instantiated templates. In a previous paper, we presented a framework for merging items of potentially inconsistent structured text [A. Hunter, Data & Knowledge Engineering 34 (2000) 305]. In this paper, we extend that framework by considering the need to use temporal knowledge. The primary aim of this paper is to specify the language and types of axioms required in this temporal knowledge.

Introduction

Syntactically, an item of structured text is a data structure containing a number of grammatically simple phrases together with a semantic label for each phrase. The set of semantic labels in a structured text is meant to parameterize a stereotypical situation, and so a particular item of structured text is an instance of that stereotypical situation. Using appropriate semantic labels, we can regard a structured text as an abstraction of an item of text.

For example, news reports on corporate acquisitions can be represented as items of structured text using semantic labels including buyer, seller, acquisition, value, and date. Each semantic label provides semantic information, and so an item of structured text is intended to have some semantic coherence. Each phrase in structured text is very simple – such as a proper noun, a date, a number with a unit of measure, or a word or phrase from a prescribed lexicon. For an application, the prescribed lexicon delineates the types of states, actions, and attributes that could be conveyed by the items of structured text. An example of structured text is given in Fig. 1.
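
As a rough illustration only (this is neither the paper's notation nor a reproduction of Fig. 1), an item of structured text can be sketched in Python as a nested mapping from semantic labels to simple phrases or to nested items. The label names come from the acquisitions example; the company names, value, and date are invented placeholders, and the helper `labels` is a hypothetical name.

```python
# A hypothetical acquisition report as structured text: each key is a
# semantic label, each value is a simple phrase or a nested item.
report = {
    "acquisition": {
        "buyer": "Company A",         # placeholder proper noun
        "seller": "Company B",        # placeholder proper noun
        "value": "10 million euros",  # number with a unit of measure
        "date": "15/5/00",
    }
}


def labels(item, prefix=()):
    """Yield (label path, phrase) for each simple phrase in the item."""
    for label, value in item.items():
        path = prefix + (label,)
        if isinstance(value, dict):
            # Nested item of structured text: recurse into it.
            yield from labels(value, path)
        else:
            yield path, value


paths = dict(labels(report))
```

Flattening the nesting into label paths is one simple way to see which phrase each semantic label governs.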

Much material is potentially available as structured text. This includes items of text structured using XML tags, and the output from information extraction systems given in templates (see for example [3], [17], [28]). The notion of structured text also overlaps with semi-structured data (for reviews see [1], [11]).

Whilst structured text is useful as a resource, there is a need to develop techniques to handle, analyse, and reason with it. In particular, we are interested in merging potentially inconsistent sets of news reports [37], [43], and deriving inferences from potentially inconsistent sets of news reports [7], [38], [39], [41].

In this paper, we want to consider the role of temporal knowledge in facilitating merging of structured news reports. To illustrate some of our needs, consider a set of news reports, in the following situations.

  • The set of reports refer to the same time point of a subject. As an example, consider a report from a newspaper and a report from a radio station on the weather in London today. So the weather in London is the subject, and the time point is today. However, the newspaper report and the radio report have not necessarily been made at the same time. So for example, the newspaper report could have been published three days ago, whereas the radio report could have been broadcast today.

  • The set of reports refer to the same time interval of a subject. As an example, consider a report from a TV station on the weather in London today and a report from a radio station on the weather in London today. So the weather in London is the subject, and the subject time interval is today. As with the example above, the TV report and the radio report have not necessarily been broadcast at the same time.

  • The set of reports are broadcast at the same time point. As an example, consider a radio broadcast today with a long-range weather report for weather in Europe over the next month, and a TV broadcast today with a weather report for London today. Here, the granularity of the periods of the subjects of the reports is different.

  • The set of reports are broadcast during the same time interval. As an example, consider a radio broadcast today with a weather report for weather in Europe over the next month, and a TV broadcast yesterday with a long-range weather report for weather in Europe over the next week. Here, the granularity of the periods of the subjects of the reports is different.


To handle these, and a number of related issues, we need to consider how we can enhance our approach to reasoning with the temporal aspects of news reports.

In this paper, we consider a merging system for handling a collection of reports, such as weather reports, together with merging axioms (which specify how information should be combined) and domain knowledge. We assume that this collection of reports may be heterogeneous in structure. Furthermore, we assume the following:

  • Each report incorporates an explicit time point at which the report was logged. We call this the log time. For example, a report in the 15/5/00 issue of the Financial Times would have 15/5/00 as the log time. As another example, a report in the December 2000 issue of the London Review of Books would have December 2000 as the log time.

  • Each report can incorporate an explicit time point or time interval delineating the period of the event being covered by the report. We call this the event time. For example, a report on Microsoft's financial performance over the period from 1/1/00 to 31/3/00 would have the event time of 1/1/00–31/3/00. As another example, a report on the stock market may refer to the price of a stock at 14:00 h, and so 14:00 h would be the event time.
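
The two assumptions above can be pictured with a small Python record. This is a minimal sketch, not the representation used in the paper: the class name `Report`, its fields, and its methods are hypothetical, dates such as 15/5/00 are read as day/month/year in 2000, and the log time of the second report is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Report:
    source: str
    log_time: date                                  # when the report was logged
    event_time: Optional[Tuple[date, date]] = None  # (start, end), inclusive

    def is_point_event(self) -> bool:
        """True when the event time is a single point at day granularity."""
        return self.event_time is not None and self.event_time[0] == self.event_time[1]

    def covers(self, day: date) -> bool:
        """True when the given day falls inside the event time."""
        if self.event_time is None:
            return False
        start, end = self.event_time
        return start <= day <= end


# Log time only, as in the Financial Times example.
ft = Report("Financial Times", date(2000, 5, 15))

# Event time 1/1/00 to 31/3/00, as in the financial performance example.
performance = Report(
    "hypothetical source",
    date(2000, 4, 10),                      # invented log time
    (date(2000, 1, 1), date(2000, 3, 31)),
)
```

Keeping the log time mandatory and the event time optional mirrors the wording above: every report has a log time, while an event time can be present.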


We represent each news report as a set of monadic and binary predicates. We have assumed the words and phrases are sufficiently simple and restricted to not require natural language processing. We will represent each word or phrase by a constant symbol in the logic, and each semantic label as a relation symbol. A set of merging axioms and domain knowledge represented in classical logic is then used to derive a set of merged predicates. To do this we need to
  1. Reduce all the information in the news reports to common granularities of time. For this, we present information in point-based time, and this may involve rewriting information that is in interval-based time.

  2. Infer predicates that capture information merged from the predicates representing the individual news reports. For this, we need to conjoin information and to select the information from some news reports in preference to information from other reports if conflicts arise between them. To do this we will use various preference criteria including:

    • preferences over sources,

    • preference for more recently logged information,

    • preference for news that refers to a finer grained event time.

  3. Select the monadic and binary predicates from the inferences that capture the merged information.
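
The preference criteria in the second step can be illustrated procedurally. The sketch below is a deliberate simplification and not the paper's method, which uses merging axioms in classical logic: here conflicts are resolved by an assumed lexicographic order (source, then recency of log time, then finer event time). The names `Claim`, `SOURCE_RANK`, `preference_key`, and `merge`, and all sample values, are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Claim(NamedTuple):
    source: str
    log_time: int  # time point on a natural-number timeline
    grain: int     # length of the event time in points; smaller is finer
    label: str     # semantic label, e.g. "weather"
    value: str     # simple phrase


# Hypothetical preference over sources: lower rank is preferred.
SOURCE_RANK = {"tv": 0, "radio": 1, "newspaper": 2}


def preference_key(c: Claim):
    # Assumed lexicographic order: source, then recency, then finer grain.
    return (SOURCE_RANK.get(c.source, len(SOURCE_RANK)), -c.log_time, c.grain)


def merge(claims: List[Claim]) -> Dict[str, str]:
    """Keep one value per label, choosing the most preferred claim on conflict."""
    by_label: Dict[str, List[Claim]] = {}
    for c in claims:
        by_label.setdefault(c.label, []).append(c)
    return {
        label: min(group, key=preference_key).value
        for label, group in by_label.items()
    }


claims = [
    Claim("newspaper", 3, 1, "weather", "rain"),
    Claim("radio", 5, 1, "weather", "sun"),
    Claim("radio", 5, 1, "temperature", "18C"),
]
merged = merge(claims)
```

Non-conflicting information (the temperature) is simply carried over, while the conflicting weather values are resolved in favour of the preferred source. A single fixed ordering of the criteria is only one possible choice.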


Given these logical formulae, we show how we can construct a timeline, based on the natural numbers with the usual ordering, to which we can associate information from the news reports. This gives a set of linear discrete temporal models which correspond to tense or US temporal logic models. This correspondence with temporal logic provides various options for further temporal reasoning with the predicates.

To summarize the essence of our requirements, we want to reason with the different granularities of time points and time intervals, where there may be heterogeneous formats for the temporal information, and interchange between them where possible. To support this reasoning we will adapt and extend a formalization of time based on the natural numbers. We will present this formalization, and show how we can use it in a framework for merging, and reasoning with, structured news reports. We summarize the merging framework in Fig. 2.

Our logic-based approach differs from other logic-based approaches for handling inconsistent information such as belief revision theory (e.g. [21], [24], [44], [47]), knowledgebase merging (e.g. [9], [45]), and logical inference with inconsistent information (e.g. [4], [5], [10], [20], [51]). These proposals are too simplistic in certain respects for handling news reports. Each of them has one or more of the following weaknesses:

  1. One-dimensional preference ordering over sources of information – for news reports we require finer-grained preference orderings.

  2. Primacy of updates in belief revision – for news reports, the newest reports are not necessarily the best reports.

  3. Weak merging based on a meet operator – this causes unnecessary loss of information.


Furthermore, none of these proposals incorporate actions on inconsistency or context-dependent rules specifying the information that is to be incorporated in the merged information, nor do they offer a route for specifying how merged reports should be composed.

Other logic-based approaches to fusion of knowledge include the KRAFT system, which uses constraints to check whether information from heterogeneous sources can be merged [31], [53]. If knowledge satisfies the constraints, then the knowledge can be used. Failure to satisfy a constraint can be viewed as an inconsistency, but there are no actions on inconsistency. Merging information is also an important topic in database systems. A number of proposals have been made for approaches based on schema integration (e.g. [54]) and conceptual modelling for information integration based on description logics [14], [15], [23]. These differ from our approach in that they do not seek an automated approach that uses domain knowledge for identifying and acting on inconsistencies. Heterogeneous and federated database systems could also be relevant in merging multiple news reports, but they do not identify and act on inconsistency in a context-sensitive way [18], [50], [59], though there is increasing interest in bringing domain knowledge into the process (e.g. [16], [60]).

Our approach also goes beyond other technologies for handling news reports. The approach of wrappers offers a practical way of defining how heterogeneous information can be merged (see for example [19], [32], [55]). However, there is little consideration of problems of conflicts arising between sources. Our approach therefore goes beyond these in terms of formalizing reasoning with inconsistent information and using this to analyse the nature of the news report and for formalizing how we can act on inconsistency.

As an overview of the rest of the paper, we will review a formalization of structured reports, and then present a new framework for representing and reasoning with structured reports using temporal knowledge. This framework is based on a first-order classical logic presentation of temporal reasoning. We present a temporal semantics for representing and reasoning with the merged information that is based on US temporal logic.

Formalizing structured news reports

In this section, we will formalize the notions of structured text. We will then explain how we can translate each item of structured text into a set of literals. Sections 2.1 (Structured text) and 2.2 (Representing structured reports as logical formulae) are a review of [37], [39]. Then in Section 2.3, we consider the format for capturing temporal information in structured news reports using timestamps. We then introduce, in Section 2.4, timestamp equivalence axioms to relate different formats for temporal

Modelling the flow of time

In order to relate pieces of temporal information from a collection of structured reports, we assume a timeline for our reasoning. This is based on the natural numbers with the usual ordering. This gives us a discrete linear view on time.
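
A minimal sketch of such a timeline, assuming day granularity and a hypothetical origin of 1 January 2000 as time point 0 (neither is fixed by the text above), maps calendar dates to natural numbers and rewrites an interval as the points it covers. The names `ORIGIN`, `to_point`, and `to_points` are illustrative.

```python
from datetime import date

ORIGIN = date(2000, 1, 1)  # hypothetical origin: time point 0


def to_point(d: date) -> int:
    """Map a calendar day to a natural number on the timeline."""
    n = (d - ORIGIN).days
    if n < 0:
        raise ValueError("date precedes the origin of the timeline")
    return n


def to_points(start: date, end: date) -> range:
    """Rewrite an interval as the time points it covers (inclusive)."""
    return range(to_point(start), to_point(end) + 1)


# The interval 1/1/00 to 31/3/00 becomes the points 0..90.
quarter = to_points(date(2000, 1, 1), date(2000, 3, 31))
```

Because the points are natural numbers, the usual ordering on integers gives the discrete linear order directly.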

Expansion and contraction inferences

There are many ways that we may describe a proposition in time. Consider the following descriptions taken from Shoham [58].

Downward-hereditary. P is downward-hereditary if whenever P holds over an interval, it holds over all of its subintervals. For example, the robot travelled less than two miles.

Upward-hereditary. P is upward-hereditary if whenever P holds for all proper subintervals of some non-point interval, it also holds over the non-point interval itself. For example, the robot travelled
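
The downward-hereditary property can be checked mechanically on a finite stretch of a natural-number timeline. The sketch below is an illustration under stated assumptions: intervals are pairs of time points, the distances in `MILES` are invented, and the function names are hypothetical. It tests the property on one interval only, so it can refute but not prove that a proposition is downward-hereditary in general.

```python
from itertools import combinations_with_replacement


def subintervals(start: int, end: int):
    """All intervals (s, e) with start <= s <= e <= end."""
    return combinations_with_replacement(range(start, end + 1), 2)


def downward_hereditary_on(holds, start: int, end: int) -> bool:
    """If holds(start, end), require holds on every subinterval as well."""
    if not holds(start, end):
        return True  # vacuously satisfied on this interval
    return all(holds(s, e) for s, e in subintervals(start, end))


# Invented data: miles covered by the robot during each step,
# where step k is the move from time point k to time point k + 1.
MILES = [0.5, 0.4, 0.3, 0.6]


def travelled_less_than_two(s: int, e: int) -> bool:
    return sum(MILES[s:e]) < 2.0


def travelled_more_than_one(s: int, e: int) -> bool:
    return sum(MILES[s:e]) > 1.0


less_ok = downward_hereditary_on(travelled_less_than_two, 0, 4)
more_ok = downward_hereditary_on(travelled_more_than_one, 0, 4)
```

With non-negative distances, travelling less than two miles over an interval carries over to every subinterval, whereas travelling more than one mile does not, since a short subinterval may cover less.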

Merging news reports

In this section, we enhance our framework for merging news reports (given in [37]) by incorporating temporal domain knowledge.

Conclusions

In this paper, we have presented a framework for representing and reasoning with temporal domain knowledge. We have shown how this can be used with structured text, and that the inferences obtained from the domain knowledge and structured text can be useful for reasoning with potentially inconsistent structured reports and for merging potentially inconsistent structured reports.

In our framework, we assume each structured report incorporates a log time and a time for the story of the report.

References (60)

  • C. Bettini et al., Time Granularities in Databases, Data Mining, and Temporal Reasoning (2000)
  • C. Baral et al., Combining knowledgebases consisting of first-order theories, Computational Intelligence (1992)
  • G. Brewka, Preferred subtheories: an extended logical framework for default reasoning
  • P. Buneman, Semistructured data, in: Proceedings of the ACM Symposium on Principles of Database Systems,...
  • J. Burgess, Axioms for tense logic 1. since and until, Notre Dame Journal of Formal Logic (1982)
  • B. Carpenter, The Logic of Typed-feature Structures (1992)
  • D. Calvanese et al., Description logic framework for information integration
  • D. Calvanese et al., Source integration in data warehousing
  • L. Cholvy, Reasoning with data provided by federated databases, Journal of Intelligent Information Systems (1998)
  • J. Cowie et al., Information extraction, Communications of the ACM (1996)
  • L. Cholvy et al., Merging databases: problems and examples, International Journal of Intelligent Systems (2001)
  • W. Cohen, A web-based information system that reasons with structured collections of text, in: Proceedings of...
  • C. Cayrol et al., Management of preferences in assumption based reasoning
  • (1998)
  • M. Finger et al., Adding a temporal dimension to a logical system, Journal of Logic Language and Information (1992)
  • E. Franconi, U. Sattler, A data warehouse conceptual data model for multidimensional aggregation, in: S. Gatziu, M....
  • P. Gardenfors, Knowledge in Flux (1988)
  • D. Gabbay et al., Temporal Logic: Mathematical Foundations and Computational Aspects (1994)
  • I. Goralwalla et al., Temporal granularity: completing the puzzle, Journal of Intelligent Information Systems (2001)
  • R. Goldblatt, Logics of Time and Computation (1987)

Anthony Hunter received a BSc (1984) from the University of Bristol and an MSc (1987) and PhD (1992) from Imperial College, London. He is currently a Senior Lecturer in the Department of Computer Science at University College London. His main research interests are: knowledge representation and reasoning for handling uncertainty, incompleteness, and inconsistency in information; default reasoning and argumentation; and applications in decision-support and in technologies for understanding and reasoning with information in natural language.