2014 | OriginalPaper | Book chapter
Automatic Extraction of Headlines from Punjabi Newspapers
Author: Vishal Gupta
Published in: Applied Algorithms
Publisher: Springer International Publishing
In any language, newspaper headlines are important: by reading them we can get an idea of the news without reading the full articles. Many websites also extract headlines from online newspapers and display them to inform their users. Another important application of headline extraction is text summarization, where headline sentences are given more importance than other sentences when selecting content for the final summary. This paper concentrates on automatic headline extraction from Punjabi newspapers. Punjabi is the official language of the state of Punjab, but it is an under-resourced language: very few computational-linguistic resources are available for it, although a lot of research is under way on developing NLP applications for Punjabi. This is the first time that automatic headline extraction from Punjabi newspapers has been developed. It uses four features of headlines: 1) punctuation mark feature, 2) font feature, 3) number of words feature and 4) title keywords feature. The weights of these four features are calculated by applying mathematical regression as a machine learning approach. To extract headlines, the final scores of sentences are obtained using the feature weight equation as:
$$w_1 f_1 + w_2 f_2 + w_3 f_3 + w_4 f_4$$
where $f_1$, $f_2$, $f_3$ and $f_4$ are the feature scores of the four features and $w_1$, $w_2$, $w_3$ and $w_4$ are the learned weights of these features. The Punjabi headline extraction system achieves an accuracy of 98.39%, tested on fifty Punjabi single- and multi-news documents. The part of the system that uses the punctuation mark feature has been integrated into the Punjabi Text Summarization system, which is available online.
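To make the weighted scoring concrete, here is a minimal Python sketch of how four feature scores could be combined into a sentence score, and how weights might be learned by regression. Several things here are assumptions rather than details from the paper: the abstract only says "mathematical regression", so ordinary least squares without an intercept is used as one plausible reading; the function names `sentence_score` and `fit_weights` are hypothetical; and the toy feature values and labels are invented purely for illustration.

```python
from typing import List, Sequence


def sentence_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Return w1*f1 + w2*f2 + w3*f3 + w4*f4 for one sentence."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def fit_weights(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares fit without intercept: solve (X^T X) w = X^T y.

    Assumption: the source only says "mathematical regression"; ordinary
    least squares is one plausible reading, not the confirmed method.
    """
    n = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Made-up toy data (NOT from the paper): each row is (f1, f2, f3, f4) for one
# sentence; the label is 1.0 for a headline and 0.0 otherwise.
toy_features = [
    [1.0, 1.0, 0.9, 0.8],
    [1.0, 0.0, 0.2, 0.1],
    [0.0, 1.0, 0.8, 0.6],
    [0.0, 0.0, 0.1, 0.0],
    [1.0, 1.0, 0.7, 0.9],
    [0.0, 0.0, 0.3, 0.2],
]
toy_labels = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

weights = fit_weights(toy_features, toy_labels)
scores = [sentence_score(f, weights) for f in toy_features]
```

In this sketch, sentences with higher scores would be treated as more headline-like. How the actual system turns scores into a final headline decision (for example, a cutoff value) is not stated in the abstract, so that step is left out.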