Weighted word embeddings and clustering‐based identification of question topics in MOOC discussion forum posts

Aytuğ Onan; Mansur Alp Toçoğlu

doi:10.1002/cae.22252

journal article May 04, 2020

Computer Applications in Engineering Education Vol. 29 No. 4 pp. 675-689 · Wiley

Abstract

Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race‐related constraints. MOOCs are generally enriched by discussion forums that support interaction among students, professors, and teaching assistants. MOOC discussion forum posts provide feedback regarding the students' learning processes, social interactions, and concerns. The purpose of our research is to present a document‐clustering model for MOOC discussion forum posts, based on weighted word embeddings and clustering, that identifies the question topics in those posts. In this study, four word‐embedding schemes (namely, word2vec, fastText, global vectors, and Doc2vec), four weighting functions (i.e., term frequency‐inverse document frequency [TF‐IDF], inverse document frequency [IDF], smoothed IDF, and a subsampling function), and four clustering algorithms (i.e., K‐means, K‐means++, self‐organizing maps, and the divisive analysis clustering algorithm) have been evaluated for document clustering and topic modeling on MOOC discussion forum posts. Twenty different feature representations have been obtained from the word‐embedding schemes and weighting functions. These feature representation schemes have been evaluated in conjunction with the four clustering methods. For comparison, the empirical results for latent Dirichlet allocation have also been included. The empirical results in terms of adjusted Rand index, normalized mutual information, and adjusted mutual information indicate that weighted word‐embedding schemes combined with clustering algorithms outperform the conventional schemes.
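The weighted word-embedding representation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the weighting formulas, so the IDF, smoothed IDF, and TF-IDF variants below are common textbook forms chosen as assumptions, and the toy posts, the 2-dimensional vectors, and all function names are hypothetical. The sketch covers only the representation step (a weighted average of word vectors per post); the resulting vectors would then be passed to a clustering algorithm such as K-means.

```python
import math
from collections import Counter

# Toy tokenized forum posts (hypothetical data, for illustration only).
POSTS = [
    ["when", "is", "the", "quiz", "deadline"],
    ["is", "the", "quiz", "graded"],
    ["video", "lecture", "will", "not", "load"],
]

# Toy 2-dimensional word vectors standing in for the output of a trained
# embedding model (hypothetical values, not real word2vec/fastText vectors).
VECTORS = {
    "quiz": [1.0, 0.0],
    "deadline": [0.8, 0.2],
    "graded": [0.9, 0.1],
    "video": [0.0, 1.0],
    "lecture": [0.1, 0.9],
    "load": [0.2, 0.8],
    "the": [0.5, 0.5],
    "is": [0.5, 0.5],
}


def document_frequency(posts):
    """Count in how many posts each word occurs at least once."""
    df = Counter()
    for post in posts:
        df.update(set(post))
    return df


def idf(word, df, n_docs):
    """Plain IDF (assumed form): log(N / df)."""
    return math.log(n_docs / df[word])


def smoothed_idf(word, df, n_docs):
    """Smoothed IDF (assumed form): log(1 + N / df)."""
    return math.log(1.0 + n_docs / df[word])


def tf_idf(word, post, df, n_docs):
    """Relative term frequency within the post times plain IDF."""
    tf = post.count(word) / len(post)
    return tf * idf(word, df, n_docs)


def weighted_doc_vector(post, weight_fn, vectors):
    """Weighted average of the vectors of the in-vocabulary words of a post."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in post:
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        w = weight_fn(word)
        weight_sum += w
        for i, x in enumerate(vectors[word]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # all-zero vector when nothing could be weighted
    return [x / weight_sum for x in total]


df = document_frequency(POSTS)
n_docs = len(POSTS)

# One document vector per post, here using smoothed IDF as the weight.
doc_vectors = [
    weighted_doc_vector(post, lambda w: smoothed_idf(w, df, n_docs), VECTORS)
    for post in POSTS
]
```

Swapping the lambda for `lambda w: idf(w, df, n_docs)` or for a closure over `tf_idf` and the current post gives the other weighting variants. In this toy data, rarer words such as "deadline" pull the post vector toward their own direction more strongly than words shared across posts such as "the".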

Details

Published: May 04, 2020
Vol/Issue: 29(4)
Pages: 675-689

Authors

Aytuğ Onan

Department of Computer Engineering, Faculty of Engineering and Architecture, İzmir Katip Çelebi University, İzmir, Turkey

Mansur Alp Toçoğlu

Department of Software Engineering, Faculty of Technology, Manisa Celal Bayar University, Manisa, Turkey

Cite This Article

Aytuğ Onan, Mansur Alp Toçoğlu (2020). Weighted word embeddings and clustering‐based identification of question topics in MOOC discussion forum posts. Computer Applications in Engineering Education, 29(4), 675-689. https://doi.org/10.1002/cae.22252
