1 Introduction
2 Related Work
2.1 Exemplar Detection
2.2 Document Summarization
2.2.1 Timeline Summarization
2.2.2 Comparative Summarization
2.2.3 Entity Summarization
3 Problem Definition
3.1 Input
3.2 Research Problem
3.3 Approach Overview
4 Event Importance Calculation
Original sentence | Importance |
---|---|
The tower of Karatsu Castle was built in 1966 | 0.015483 |
1949 also saw the opening of Fukushima University | 0.362794 |
January 17, 1995: Great Hanshin earthquake causes more than 100 casualties | 0.786335 |
During World War II, the July 19, 1945 Bombing of Okazaki killed over 200 people and destroyed most of the city center | 0.937206 |
5 History-Based Entity Categorization
5.1 Event Representation
5.2 Entity Similarity Calculation
5.3 Optimization Model for Exemplar Detection
6 Comparative Timeline Generation
6.1 Mutually Reinforced Random Walk
6.2 Post-processing
7 Experiments
7.1 Datasets
Dataset | Wikipedia category | # Entities | Time range |
---|---|---|---|
D1 | Japanese Cities | 532 | 40–2016 |
D2 | Chinese Cities | 357 | 12–2016 |
D3 | UK Cities | 68 | 1–2016 |
D4 | American Scientists | 141 | 0–103 |
D5 | French Scientists | 41 | 0–101 |
D6 | Japanese PMs (pre WW2) | 32 | 0–98 |
D7 | Japanese PMs (post WW2) | 30 | 0–93 |
7.2 Analyzed Methods
7.3 Experiment Settings
Dataset | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
---|---|---|---|---|---|---|---|
Number of categories | 3 | 2 | 2 | 3 | 2 | 2 | 2 |
7.4 Evaluation Criteria
7.4.1 Evaluation Criteria for Created Categories and Exemplars
7.4.2 Evaluation Criteria for Summarized Timelines
- Saliency, which measures how sound and important each extracted event is.
- Comprehensibility, which measures how easily the output words can be associated with real events.
- Diversity, which measures how diverse the events in the summary are (both semantically and temporally).
7.5 Evaluation Results
7.5.1 Evaluation Results for Created Groups and Exemplars
Model | Data | IntraSim | InterSim | Ratio | AveImp |
---|---|---|---|---|---|
K-Means | Cities | 0.888 | 0.780 | 1.139 | 0.697 |
K-Means | Persons | 0.535 | 0.462 | 1.190 | 0.722 |
AP | Cities | 0.884 | 0.770 | 1.148 | 0.732 |
AP | Persons | 0.619 | 0.476 | 1.330 | 0.779 |
MMR | Cities | 0.820 | 0.563 | 1.490 | 0.898 |
MMR | Persons | 0.572 | 0.344 | 2.084 | 0.879 |
DFP | Cities | 0.874 | 0.809 | 1.080 | 0.912 |
DFP | Persons | 0.770 | 0.738 | 1.044 | 0.879 |
OM | Cities | 0.859 | 0.543 | 1.620 | 0.939 |
OM | Persons | 0.758 | 0.478 | 2.152 | 0.914 |
7.5.2 Evaluation Results for Summarized Timelines
Model | Data | Saliency | Comprehensibility | Diversity |
---|---|---|---|---|
LexRank | Cities | 3.457 | 2.714 | 3.571 |
LexRank | Persons | 4.089 | 2.333 | 4 |
LSA | Cities | 3.243 | 2.571 | 3.714 |
LSA | Persons | 3.822 | 2.667 | 3.444 |
KLSUM | Cities | 3.543 | 2.429 | 3.143 |
KLSUM | Persons | 3.522 | 2.667 | 3.556 |
MEAD | Cities | 3.429 | 2.571 | 2.571 |
MEAD | Persons | 3.878 | 2.556 | 3.333 |
MRRW | Cities | 3.686 | 3.143 | 4.286 |
MRRW | Persons | 4.2 | 2.556 | 4.556 |
7.6 Example Summary
Group | Event | Terms |
---|---|---|
1 | District | Matsubara, village, district, amami, area part, incorporated, city, tannan, prefectures |
1 | Civil unrest | Occurred, end, widely violent, strike protest, opposed matsukawa, incident, demonstration delayed |
1 | Transportation | Sapporo, route, completed, megumino, built meter, main, linking, highway, bypass building |
1 | Natural disasters | People killed, earthquake, suffered, damage wake, light left, tsunami, mikawa, february, dead, typhoon |
1 | Urbanization | Renamed, irino, neighborhood, hall, split respectively, mura, elevated, status new, incorporated |
1 | Militarization | Japanese, industry, navy, nagoya military, imperial center area, works, warehouse, training, support |
1 | Media | Continued, waraji television, spring largescale, included firebombing, expo, broadcasts, bombing, nhk |
1 | Sports | Shizuoka, held, sport, park, national garden university, pacific, international, high, competition |
1 | Clans | Clan, shimazu, province, local, vassal takada samurai ruled, powerful, perished, lord, unified |
1 | Autonomy | Increased, core, autonomy, system prefectural, government city, establishment designation, structure, place |
2 | Meiji restoration | Abolition, period kuroda dazaifu, uetsu, reppan part meiji, joined, edo, dispossessed, daimyo |
2 | Battles | Navy, japanese, satsuma, royal, refusal punish previous, pay, indemnity, compensation, charles |
2 | Wars | Japanese, navy, imperial, base, air togos, russojapanese role, orient nickname, nelson naval, military |
2 | Festivals | Festival, first, took, snow, place maple, lantern, held, chrysanthemum cherry blossom, castle, hirosaki |
2 | Construction | Warehouse, stone, torn, stonework, reconstructed original form, dutch, date, constructed, builder |
2 | Transportation | Railway, development, increase, via, scale sagami rapid, railroad, rail, connected, led |
2 | Universities | University, taught, matsue lafcadio learn, author, hirosaki, established |
2 | Commerce | Much, fire, consumes, area, replanned maritime, ginza, commerce, canal, accommodate, city |
2 | Natural disasters | Little earthquake, volcano, throughout, spread, relatively outages, numerous, morioka, hit, extensive |
2 | Shoguns | Daimyo tokugawa, shogun shigeharu, sakamoto rule, position, newly, metsuke, income |
3 | Wars | War, zenkunen, yoriyoshi, takenori, reinforced dewa, defeated, abe, province, minamoto |
3 | Government | City, suggests, reliable, publicly, point notices, legal, issued, governing, council |
3 | Battles | Summer, battle, ground burned, osaka, sakai |
3 | Commerce | Wealthiest, residents, population, people, living enterprise, earned, commercial, almost, japan |
3 | Christianity | Went, outlawed, hiding escape, christianity, capture |
3 | Merge | Isawa, city, village, modern, merger maesawa, koromogawa, established, district, town |
3 | Christianity | Xavier, prosperity, priests, including, francis documented, christian, sengoku, period, visited |
3 | Missionary | Stand, sent, sendai reach, portugal, padre new, missionary, many, jesuit hour, diogo |
3 | Trade | Yamato trade, using, richest, muromachi, mouth location, inland connect, became, foreign, sengoku |
3 | Business and power | Weaken toyotomi, system stronghold, seized, reportedly power, nobunaga, move, merchant, central, business |
8 Discussions
- When calculating the similarity of the histories of two entities by dynamic time warping, we assign higher weights to pairs of events that are close to each other in time than to pairs separated by longer time gaps. This is based on the intuition that correspondences between events that are far apart in time are less meaningful, and that such events should not fall into the same cluster. As a result, two entities are deemed to have similar histories if (a) the events in their histories are semantically similar, and (b) these events are close to each other on the timeline. In future work, we will also try to utilize the absolute timestamps of events.
- In this work, event similarity is the cosine similarity between event vectors, where an event vector is the TF-IDF weighted combination of the vectors of the terms contained in the event. Here, we assume that word ordering does not affect event similarity, following previous sentence-similarity models such as WMD (Word Mover’s Distance) [18] and SIF (Smooth Inverse Frequency) [1]. However, we note that word ordering can be an important factor when computing the similarity of historical events.
- To prepare the historical events for a given entity, we collect all sentences containing dates in the History section of the Wikipedia article corresponding to the entity. We detect temporal expressions using the spaCy tool. We adopt this extraction method following previous work [3, 8]. However, more refined methods for associating time with sentences could be applied when preprocessing the datasets (e.g., [16]).
- We would like to emphasize that the proposed task is a novel kind of historical knowledge generation and organization. Our work is the first to propose an optimization formulation for the history-based exemplar detection task. This could offer interesting insights to historians, especially as professionals could provide more complete input data. Furthermore, based on the history-based entity grouping we propose, the history of any given entity can now be seen not in isolation but in relation to the typical history of the underlying latent group it belongs to.
- The method can be extended such that latent groups can be detected for different time periods (e.g., histories of cities during the Renaissance or histories of famous persons during their early careers). Different input time periods will usually result in different discovered latent groups.
- As a prototype is composed of sentences extracted from diverse entities, naturally, coherence of a generated history can be an issue. Currently, we abstract from the extracted sentences by constructing timeline events through representative terms. In the future, abstractive summarization methods could be used.
- A related issue is that prototype events may contradict each other, though no such cases were observed in the experiments.
- Currently, the exemplars are selected on the basis of the similarities of their histories to the histories of other entities. However, other attributes could also be considered during exemplar selection, for instance popularity or familiarity among users (e.g., while Dazaifu may be a good exemplar for its latent group, Kyoto, which belongs to the same group, is better known and more readily recognized by potential users). Hence, entity popularity or importance could serve as an additional component of exemplar selection (i.e., as an additional constraint in OM).
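
The first two points above (event vectors built as TF-IDF weighted combinations of term vectors, compared by cosine similarity, and dynamic time warping that favors temporally close events) can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The exponential decay in `time_weight`, its `scale` parameter, the local cost `1 - weight * cosine`, all function names, and the toy vectors and IDF values are assumptions made purely for illustration.

```python
import math


def event_vector(tokens, word_vecs, idf):
    """TF-IDF weighted sum of term vectors; word order is ignored."""
    dim = len(next(iter(word_vecs.values()), []))
    vec = [0.0] * dim
    for tok in set(tokens):
        if tok not in word_vecs:
            continue  # terms without a vector contribute nothing
        weight = tokens.count(tok) * idf.get(tok, 1.0)  # tf * idf
        for i, x in enumerate(word_vecs[tok]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def time_weight(t1, t2, scale=50.0):
    """Assumed decay: 1.0 for simultaneous events, smaller for larger gaps."""
    return math.exp(-abs(t1 - t2) / scale)


def history_distance(hist_a, hist_b, scale=50.0):
    """DTW over two chronologically ordered lists of (time, vector) events.

    The local cost is low only when two events are both semantically
    similar and close in time (an assumed way to combine the two).
    """
    n, m = len(hist_a), len(hist_b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        ta, va = hist_a[i - 1]
        for j in range(1, m + 1):
            tb, vb = hist_b[j - 1]
            cost = 1.0 - time_weight(ta, tb, scale) * cosine(va, vb)
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


# Toy data (hypothetical 2-d term vectors and IDF values).
toy_vecs = {"earthquake": [1.0, 0.0], "university": [0.0, 1.0]}
toy_idf = {"earthquake": 2.0, "university": 1.5}
city_a = [(1923, event_vector(["earthquake"], toy_vecs, toy_idf)),
          (1949, event_vector(["university"], toy_vecs, toy_idf))]
# Same events, shifted a century later.
city_b = [(t + 100, v) for t, v in city_a]
print(history_distance(city_a, city_a), history_distance(city_a, city_b))
```

In this sketch, two histories with the same events occurring a century apart receive a larger distance than two identical histories, which mirrors the intuition that similar histories require events that are both semantically similar and close on the timeline. A model that accounts for word ordering would replace `event_vector` while leaving the warping step unchanged.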