
On constituent chunking for Turkish

Date

2018

Author

Aslan, Özkan
Günal, Serkan
Dinçer, Bekir Taner

Abstract

Chunking is a task that divides a sentence into non-recursive structures. The primary aim is to specify chunk boundaries and classes. Although chunking generally refers to simple chunks, it is possible to customize the concept. A simple chunk is a small structure, such as a noun phrase, while a constituent chunk is a structure that functions as a single unit in a sentence, such as a subject. For an agglutinative language with a rich morphology, constituent chunking is a more significant problem than simple chunking. Most Turkish studies on this issue use the IOB tagging schema to mark the boundaries. In this study, we proposed a new, simpler tagging schema, namely OE, for constituent chunking in Turkish. "E" represents the rightmost token of a chunk, while "O" stands for all other tokens. As a counterpart to OE, we also used a schema called OB, where "B" represents the leftmost token of a chunk. We aimed to identify both chunk boundaries and chunk classes using the conditional random fields (CRF) method. The initial motivation was to exploit the fact that Turkish phrases are head-final. In this context, we assumed that marking the end of a chunk (OE) would be more advantageous than marking the beginning of a chunk (OB). In support of this assumption, the test results reveal that OB has the worst performance and that OE is a significantly more successful schema in many cases. This contrast is more obvious in long sentences. Indeed, using OE amounts to marking the head of the phrase (chunk). Since the head and the distinctive label "E" are aligned, CRF finds the chunk class more easily by using the information contained in the head. OE also produced more successful results than the schemas available in the literature. In addition to comparing tagging schemas, we performed four analyses. The examination of window size, which is a parameter of CRF, shows that a value of 3 is adequate.
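The three tagging schemas described above can be illustrated with a minimal sketch. The function names `encode` and `decode_oe`, the tag format `E-<class>`, the class labels (`SUBJ`, `ADV`, `OBJ`, `PRED`), and the example sentence are assumptions made for illustration; they are not taken from the paper. The decoder further assumes that the chunks tile the sentence, so every chunk starts right after the previous "E" token.

```python
from typing import List, Tuple

# A chunk is (start index, end index inclusive, class label).
Chunk = Tuple[int, int, str]


def encode(n_tokens: int, chunks: List[Chunk], schema: str) -> List[str]:
    """Return one tag per token under the IOB, OB, or OE schema."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        if schema == "IOB":
            # Mark the first token with B and the rest of the chunk with I.
            tags[start] = f"B-{label}"
            for i in range(start + 1, end + 1):
                tags[i] = f"I-{label}"
        elif schema == "OB":
            # Mark only the leftmost token of the chunk.
            tags[start] = f"B-{label}"
        elif schema == "OE":
            # Mark only the rightmost token, which is the head in a head-final phrase.
            tags[end] = f"E-{label}"
        else:
            raise ValueError(f"unknown schema: {schema}")
    return tags


def decode_oe(tags: List[str]) -> List[Chunk]:
    """Recover chunks from OE tags, assuming chunks tile the sentence."""
    chunks: List[Chunk] = []
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("E-"):
            chunks.append((start, i, tag[2:]))
            start = i + 1
    return chunks


# Hypothetical example: "Ali dün yeni kitabı okudu" (Ali read the new book yesterday).
tokens = ["Ali", "dün", "yeni", "kitabı", "okudu"]
gold = [(0, 0, "SUBJ"), (1, 1, "ADV"), (2, 3, "OBJ"), (4, 4, "PRED")]

for schema in ("IOB", "OB", "OE"):
    print(schema, encode(len(tokens), gold, schema))
```

In the two-token object chunk "yeni kitabı", OE places the class-bearing tag on the head "kitabı", whereas OB places it on the modifier "yeni".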
A comparison of the evaluation measures for chunking revealed that F-score is a more balanced measure than token accuracy and sentence accuracy. The feature analysis shows that syntactic features improve chunking performance significantly under all conditions. Yet when these features are withdrawn, a pronounced difference between OB and OE emerges. In addition, the flexibility analysis shows that OE is more successful on different data.
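The three evaluation measures named above can be sketched as follows for OE-tagged output. This is an illustration under stated assumptions, not the paper's evaluation code: the helper names `oe_chunks` and `evaluate` are hypothetical, a predicted chunk counts as correct only when both its span and its class match the gold chunk exactly, and chunks are assumed to tile the sentence.

```python
from typing import Dict, List, Set, Tuple


def oe_chunks(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end, label) chunks from OE tags, assuming chunks tile the sentence."""
    chunks, start = set(), 0
    for i, tag in enumerate(tags):
        if tag.startswith("E-"):
            chunks.add((start, i, tag[2:]))
            start = i + 1
    return chunks


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Dict[str, float]:
    """Compute token accuracy, sentence accuracy, and chunk-level F-score."""
    n_tokens = ok_tokens = ok_sents = 0
    n_gold = n_pred = n_hit = 0
    for g, p in zip(gold, pred):
        n_tokens += len(g)
        ok_tokens += sum(a == b for a, b in zip(g, p))
        ok_sents += int(g == p)  # a sentence counts only if every tag is right
        g_chunks, p_chunks = oe_chunks(g), oe_chunks(p)
        n_gold += len(g_chunks)
        n_pred += len(p_chunks)
        n_hit += len(g_chunks & p_chunks)  # exact span and label match
    precision = n_hit / n_pred if n_pred else 0.0
    recall = n_hit / n_gold if n_gold else 0.0
    denom = precision + recall
    return {
        "token_accuracy": ok_tokens / n_tokens if n_tokens else 0.0,
        "sentence_accuracy": ok_sents / len(gold) if gold else 0.0,
        "f_score": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical data: one wrong tag in the first sentence.
gold_tags = [["E-SUBJ", "O", "E-OBJ", "E-PRED"], ["E-SUBJ", "E-PRED"]]
pred_tags = [["E-SUBJ", "E-OBJ", "E-OBJ", "E-PRED"], ["E-SUBJ", "E-PRED"]]
scores = evaluate(gold_tags, pred_tags)
print(scores)
```

In this toy case a single wrong tag leaves token accuracy at 5/6, halves sentence accuracy to 1/2, and yields a chunk-level F-score of 8/11, which falls between the two.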

Source

Information Processing & Management

Volume

54

Issue

6

URI

https://doi.org/10.1016/j.ipm.2018.05.004
https://hdl.handle.net/20.500.12809/1306

Collections

  • Computer Engineering Department Collection [103]
  • Scopus-Indexed Publications Collection [6219]
  • WoS-Indexed Publications Collection [6466]



DSpace software copyright © 2002-2015  DuraSpace
Contact Us | Send Feedback
Theme by 
@mire NV
 

 




Muğla Sıtkı Koçman University, Muğla, Turkey
Creative Commons License
Muğla Sıtkı Koçman University Institutional Repository is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 Unported License.
