Georgia Tech computing students create an investigative analysis with instantaneous question-response dataset
HQ Trivia is a gaming app that has become a daily tradition for a growing league of devotees across the United States. The accessibility of the app places the chance of winning a large monetary prize into the hands of millions of players, twice a day.
A recent analysis conducted by Georgia Tech computing students, which began as a project to break down the game’s questions by difficulty, has grown into something much larger.
“I think we have the first instantaneous question-response database that maintains millions of responses,” said Justin Melnick, online master of science in analytics student from Georgia Tech, about the analysis that he and partner David Milmont, a data scientist at a fintech company and part of Georgia Tech’s Analytic MicroMasters program, have created.
Melnick and Milmont are the two minds driving the HQ Trivia research group, HQ Insiders.
Becoming the HQ Insiders
The duo began by breaking down publicly available data on HQ questions and manually entering it for analysis. They then discussed their findings on a subreddit under the handle HQ Insiders.
Not long after the team shared their HQ question analysis, they were contacted by the Washington Post for an HQ feature showcasing their findings. However, the Washington Post encouraged the team to gather more data before moving forward, which would require the team to find an alternative to manual data entry.
“We knew through YouTube that a lot of fans had been screen recording the game and we could go back and archive the games available,” explained Melnick. “We used Amazon Mechanical Turk to create an HTML for transcribing videos. Then, using heuristics and other data cleaning tools, we ensured that the information logged was accurate.”
Letting the Data Speak for Itself
Thus far, the team has gathered data on 630,872,741 player responses to 2,486 questions across 205 games.
According to the team, a dataset of this size, language variance, and instantaneous response has not been collected before, and it can reveal much more than what the HQ Insiders set out to find. With requests coming in from several outlets to discuss the data from different angles, the team now just has to decide which direction to pursue first.
Even the CEO and founder of HQ, Rus Yusupov, was impressed by the team’s analysis. He messaged them with words of encouragement and reposted the story on his social media accounts.
What distinguishes the HQ Insiders’ work from other analyses is the way the team scrapes the data. Milmont said, “We’re able to log quickly because of the automation we have developed using Python and SQL around data collection and cleaning.”
Being Prepared to Answer
Melnick is a student in CSE Associate Professor Polo Chau’s CSE-6242-OAN course, Data and Visual Analytics, which introduces students to techniques and tools for analyzing and visualizing data at scale.
“Without that class we would not have had a clear direction on how to support the Washington Post with our data, our ideas going forward to assist other media outlets, and having a presence on the internet,” said Melnick.
“It is very empowering, and we feel like what we know isn’t in a textbook, but knowledge through real-world trial and error. It is a very tough class, but when the opportunity knocks from the real world, we are better prepared to answer.”