steam250

Review backfilling

Published: February 3rd 2020, 11:17:29 pm

steam250 main image

Review backfilling has begun and we're starting with the biggest games on Steam. Backfilling means downloading all the reviews for a particular game so we can chart its entire performance history. Here are the five biggest games on Steam, ranked purely by number of reviews, along with the date on which we backfilled each one.

1. Counter-Strike: Global Offensive: 3,797,560
2. PLAYERUNKNOWN'S BATTLEGROUNDS: 1,276,396 on 2020-02-03 at 18:45
3. Dota 2: 1,198,645 on 2020-02-02 at 21:22
4. Grand Theft Auto V: 689,532 on 2020-02-01 at 21:54
5. Team Fortress 2: 627,351 on 2020-02-02 at 23:41

I wish backfilling were a simple task, but it is fraught with technical challenges, most of which stem from working around the myriad bugs in the Steam site itself. We break down the reviews for large games like these into chunks, called partitions, but Steam sometimes misreports the number of reviews in a partition, so we end up chasing a review that does not exist.
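To make the idea of partitions concrete, here is a minimal Python sketch. It is not our actual code: it assumes partitions are time ranges found by bisection, and `count_reviews`, `fetch_page` and `max_per_partition` are hypothetical names invented for the example. The point it illustrates is treating the reported count as a hint, and stopping when the pages run out rather than when the count is reached.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end), e.g. Unix seconds


def partition(start: int, end: int,
              count_reviews: Callable[[int, int], int],
              max_per_partition: int) -> List[Range]:
    """Bisect [start, end) until each piece reports few enough reviews.

    count_reviews is a hypothetical callback returning the number of
    reviews the site *reports* for a range (which may be wrong).
    """
    if end - start <= 1 or count_reviews(start, end) <= max_per_partition:
        return [(start, end)]
    mid = (start + end) // 2
    return (partition(start, mid, count_reviews, max_per_partition)
            + partition(mid, end, count_reviews, max_per_partition))


def import_partition(reported: int,
                     fetch_page: Callable[[int], List[str]]
                     ) -> Tuple[List[str], int]:
    """Read pages until one comes back empty.

    Returns the reviews found and how many the reported count
    overstated, so a phantom review is logged instead of chased forever.
    """
    reviews: List[str] = []
    page = 0
    while True:
        batch = fetch_page(page)  # hypothetical page fetcher
        if not batch:
            break
        reviews.extend(batch)
        page += 1
    missing = max(0, reported - len(reviews))
    return reviews, missing
```

With this shape, a partition that claims four reviews but only ever yields three finishes normally and reports one missing review, rather than retrying indefinitely.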

Then there are the black holes. Certain date ranges are permanently inaccessible no matter how many times we try to access them. Here's one (it won't load. That's the whole point.) We try to cut as closely as possible around each black hole to get all the reviews on either side of it, but this is a long and tedious (automated) process.
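Cutting around a black hole can be pictured as a bisection. The sketch below is only an illustration under assumptions, not the real algorithm: `loads` is a hypothetical callback that says whether a date range can be fetched at all, and `min_width` is a made-up limit on how finely we are willing to cut.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def cut_around(start: int, end: int,
               loads: Callable[[int, int], bool],
               min_width: int) -> Tuple[List[Range], List[Range]]:
    """Split a range into loadable pieces and inaccessible pieces.

    A range that loads is kept whole. A range that fails is halved
    until the failing piece is no wider than min_width, at which
    point it is recorded as (part of) a black hole.
    """
    if loads(start, end):
        return [(start, end)], []
    if end - start <= min_width:
        return [], [(start, end)]
    mid = (start + end) // 2
    good_left, holes_left = cut_around(start, mid, loads, min_width)
    good_right, holes_right = cut_around(mid, end, loads, min_width)
    return good_left + good_right, holes_left + holes_right
```

Each halving costs extra requests, which is one way to see why getting close to the edge of a hole is slow: the tighter the cut, the more failed fetches it takes to find it.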

In the list of games above, you might have noticed CS: GO does not have a backfill date yet. We currently resolve all the partitions for a game up-front before beginning the review import, but it just so happens CS: GO has a black hole right on a partition boundary, so we can't even start the import using this strategy! One goal for February is to adapt our strategy to successfully backfill CS: GO, even though it is roughly three times the size of any game we've backfilled so far (and the largest game on Steam).

The main goal for February, however, will be to adapt the backfilling algorithm to import all games. Currently we import just one game at a time, which is fine for the larger games, where we need to monitor the algorithm's behaviour carefully to ensure everything goes smoothly. But we also want to import all the medium and smaller games in the most efficient manner possible, which requires further work. Once we succeed in this, we will have all the data we need to build the new site!

By the way, the image for this article shows the backfilling algorithm in action. Each line is an update on the status of the partitions. It shows the number of currently active (importing) partitions, the total number of partitions, and the percentage completion of each active partition next to its partition identifier. It's pretty nerdy stuff, but if you'd like to know more I'll happily tell you about it!
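For the curious, a status line like that could be assembled along these lines. The exact layout in the image differs; the format string, the `format_status` name and the partition identifiers here are all invented for the example.

```python
from typing import Dict


def format_status(active: Dict[str, float], total: int) -> str:
    """Build one status line.

    active maps a partition identifier to its completion percentage;
    total is the number of partitions for the game. The layout is a
    made-up illustration, not the real output format.
    """
    parts = [f"importing {len(active)}/{total}"]
    for ident, percent in active.items():
        parts.append(f"{ident}: {percent:.1f}%")
    return " | ".join(parts)
```

Calling `format_status({"p3": 41.0, "p5": 7.5}, 8)` gives `importing 2/8 | p3: 41.0% | p5: 7.5%`.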

See you next time~
Bilge