A probabilistic topic modeling approach for event detection in social media
Social media services, such as Twitter, have become a prominent source of information for event detection and monitoring applications because they provide access to a massive volume of dynamic user content. Previous studies have focused on detecting a variety of events from Twitter feeds, including natural disasters, such as earthquakes and hurricanes, and entertainment events, such as sporting events and music festivals. A key challenge in event detection from Twitter is identifying the user posts, or tweets, that are relevant to the monitored event. Current approaches can be grouped into three categories: keyword filtering, supervised classification, and topic modeling. Keyword filtering is the simplest approach, but it tends to produce a high false positive rate. Supervised classification approaches apply generic classifiers, such as support vector machines (SVMs), to determine whether a tweet is related to the event of interest. Their performance depends on the quality of the features used to represent the data. Topic modeling approaches, such as latent Dirichlet allocation (LDA), can automatically infer the latent topics within the tweets. However, because these algorithms are unsupervised, they are not as effective as supervised learning approaches. The approach developed in this thesis combines probabilistic topic modeling with supervised classification to leverage the advantages of each approach. This supervised topic modeling approach, called subtopicLDA, uses label information to guide the topic model toward topics that best fit the labels. The model is evaluated for its effectiveness in detecting tweets related to foodborne illness.
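The keyword filtering baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the keyword list, the example tweets, their relevance labels, and the function names `keyword_match` and `false_positive_rate` are all hypothetical, chosen only to show how an ambiguous keyword such as "sick" flags tweets that have nothing to do with foodborne illness.

```python
import re

# Hypothetical keyword list; the abstract does not specify the one used in the thesis.
KEYWORDS = {"sick", "vomiting", "food poisoning"}


def keyword_match(tweet, keywords=KEYWORDS):
    """Return True if any keyword occurs in the tweet as a whole word or phrase."""
    text = tweet.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None
        for keyword in keywords
    )


# Invented tweets with hand-assigned relevance labels (illustration only).
TWEETS = [
    ("Got food poisoning from the taco place last night", True),
    ("Been vomiting all day after that buffet", True),
    ("That guitar solo was sick!", False),
    ("So sick of Monday meetings", False),
    ("Great dinner with friends tonight", False),
]


def false_positive_rate(tweets, matcher=keyword_match):
    """Fraction of irrelevant tweets that the filter flags anyway."""
    irrelevant = [text for text, relevant in tweets if not relevant]
    flagged = [text for text in irrelevant if matcher(text)]
    return len(flagged) / len(irrelevant)


if __name__ == "__main__":
    for text, relevant in TWEETS:
        print(f"match={keyword_match(text)!s:5} relevant={relevant!s:5} {text}")
    print("false positive rate:", false_positive_rate(TWEETS))
```

On this toy data the filter catches both relevant tweets but also flags two of the three irrelevant ones, which is the kind of error that supervised classification and topic modeling are meant to reduce.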
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: VanDam, Courtland
- Thesis Advisors: Tan, Pang Ning
- Committee Members: Chai, Joyce; Punch, William
- Date Published: 2012
- Subjects: Twitter; Data mining; Information storage and retrieval systems; Probabilistic databases; Temporal databases
- Program of Study: Computer Science
- Degree Level: Masters
- Language: English
- Pages: vi, 67 pages
- ISBN: 9781267847294; 1267847298
- Permalink: https://doi.org/doi:10.25335/4ter-wq70