2014 | OriginalPaper | Book chapter

Automatic Extraction of Headlines from Punjabi Newspapers

Authored by: Vishal Gupta

Published in: Applied Algorithms

Publisher: Springer International Publishing

In newspapers of any language, headlines are important: by reading the headlines, a reader can grasp the gist of the news without reading the full articles. Moreover, many websites extract news headlines from online newspapers and display them to inform their users. Another important application of headline extraction is text summarization, where headline sentences are given more weight than other sentences when selecting sentences for the final summary. This paper concentrates on automatic headline extraction from Punjabi newspapers. Punjabi is the official language of the state of Punjab, but it is an under-resourced language: very few computational-linguistic resources are available for it, although much research is under way on developing NLP applications for Punjabi. This is the first time that automatic headline extraction from Punjabi newspapers has been developed. It uses four headline features: 1) a punctuation mark feature, 2) a font feature, 3) a number-of-words feature and 4) a title keywords feature. The weights of these four features are calculated using mathematical regression as the machine learning approach. To extract headlines, the final score of each sentence is obtained from the feature-weight equation:

$$w_1 f_1 + w_2 f_2 + w_3 f_3 + w_4 f_4$$

where $f_1$, $f_2$, $f_3$ and $f_4$ are the feature scores of the four features and $w_1$, $w_2$, $w_3$ and $w_4$ are the learned weights of these features. The Punjabi headline extraction system achieves an accuracy of 98.39% when tested on fifty single- and multi-document Punjabi news inputs. The part of the system based on the punctuation mark feature has been integrated into the Punjabi Text Summarization system, which is available online.
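The abstract names the four features but does not say how each feature score is computed. The Python sketch below illustrates only the weighted-sum scoring, under assumptions that are not taken from the paper: the punctuation feature treats a line without a terminating danda (।, U+0964) or Latin terminator as headline-like; the font feature reads a hypothetical `is_bold` flag from the document parser; the word-count feature uses a hypothetical cut-off `MAX_HEADLINE_WORDS = 12`; the title-keyword feature is the fraction of a line's words found in a supplied keyword set; and the weights and the 0.5 `threshold` are illustrative placeholders, not the paper's learned values.

```python
from dataclasses import dataclass

# ASSUMPTION: headlines usually lack a sentence terminator. The danda
# (U+0964) ends Punjabi sentences; Latin marks are included as well.
TERMINATORS = ("\u0964", ".", "?", "!")
MAX_HEADLINE_WORDS = 12  # hypothetical cut-off for the number-of-words feature


@dataclass
class Line:
    text: str
    is_bold: bool  # hypothetical font flag supplied by the document parser


def punctuation_feature(line: Line) -> float:
    """f1: 1.0 if the line does not end with a sentence terminator."""
    return 0.0 if line.text.rstrip().endswith(TERMINATORS) else 1.0


def font_feature(line: Line) -> float:
    """f2: 1.0 if the line is rendered in a bold (headline-style) font."""
    return 1.0 if line.is_bold else 0.0


def word_count_feature(line: Line) -> float:
    """f3: 1.0 for short lines, 0.0 for lines longer than the cut-off."""
    return 1.0 if len(line.text.split()) <= MAX_HEADLINE_WORDS else 0.0


def title_keyword_feature(line: Line, title_keywords: set[str]) -> float:
    """f4: fraction of the line's whitespace-separated words that are title keywords."""
    words = line.text.split()
    if not words:
        return 0.0
    return sum(w in title_keywords for w in words) / len(words)


def score(line: Line, weights: tuple[float, float, float, float],
          title_keywords: set[str]) -> float:
    """Final sentence score w1*f1 + w2*f2 + w3*f3 + w4*f4."""
    features = (
        punctuation_feature(line),
        font_feature(line),
        word_count_feature(line),
        title_keyword_feature(line, title_keywords),
    )
    return sum(w * f for w, f in zip(weights, features))


def extract_headlines(lines: list[Line], weights: tuple[float, float, float, float],
                      title_keywords: set[str], threshold: float = 0.5) -> list[str]:
    """Return the lines whose weighted score reaches the (hypothetical) threshold."""
    return [ln.text for ln in lines if score(ln, weights, title_keywords) >= threshold]


if __name__ == "__main__":
    weights = (0.4, 0.3, 0.2, 0.1)  # illustrative only, NOT the paper's learned weights
    lines = [
        Line("Punjab vich bhari meenh", is_bold=True),
        Line("Ajj savere ton hi sube de kai hisseyan vich meenh pai riha hai\u0964",
             is_bold=False),
    ]
    print(extract_headlines(lines, weights, {"Punjab", "meenh"}))
    # → ['Punjab vich bhari meenh']
```

The example lines are transliterated into Latin script for readability; with real Gurmukhi input, the whitespace tokenization and the danda check behave the same way.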
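The abstract states only that the feature weights are obtained by "mathematical regression". One plausible reading, assumed here rather than taken from the paper, is an ordinary least-squares fit with no intercept (matching the weighted sum $w_1 f_1 + \dots + w_4 f_4$, which has no bias term) of a human-assigned target score on the four feature scores. The hypothetical `lstsq_weights` function below solves the normal equations in pure Python; the toy feature vectors and 0/1 headline labels are invented for illustration.

```python
def lstsq_weights(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares weights w minimising ||X w - y||^2 (no intercept term).

    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination
    with partial pivoting. Each row of X holds one sentence's (f1, f2, f3, f4).
    """
    n = len(X[0])
    # Normal-equation matrix A = X^T X and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: the features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


if __name__ == "__main__":
    # Toy, hand-made feature vectors (f1, f2, f3, f4) with 0/1 headline labels.
    X = [
        [1.0, 1.0, 1.0, 0.6],  # headline
        [1.0, 1.0, 1.0, 0.4],  # headline
        [0.0, 0.0, 0.0, 0.1],  # body sentence
        [0.0, 0.0, 1.0, 0.2],  # short body sentence
        [1.0, 0.0, 1.0, 0.0],  # caption-like line
    ]
    y = [1.0, 1.0, 0.0, 0.0, 0.0]
    print(lstsq_weights(X, y))  # learned (w1, w2, w3, w4) for this toy data
```

With only four features, solving the 4×4 normal equations directly is adequate; for larger or ill-conditioned feature sets, an SVD-based solver such as `numpy.linalg.lstsq` is the more robust choice.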


Metadata
Title: Automatic Extraction of Headlines from Punjabi Newspapers
Author: Vishal Gupta
Copyright year: 2014
Publisher: Springer International Publishing
DOI: https://doi.org/10.1007/978-3-319-04126-1_20