How Our Data Pulling Process Works (Part 1)

Part 1: Types of Pulls

 

Our supervisor process runs the following 3 types of pulls:

  • quick pull
  • long pull
  • verify pull

 

To start, we run several kinds of updates for listings, all based on the modification timestamp field. This field goes by a different name on each board, but 'modification timestamp' is the generic term.

 

Any time a listing gets updated, its modification timestamp changes. Every 5 minutes, we send a request to see which listings need to be pulled, based on the 'modification timestamp' of the most recently updated listing in our system.
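As a rough illustration of that check, here is a minimal sketch. It assumes each listing is a dict with a hypothetical `modified` key, and that "needs to be pulled" means "modified strictly after our newest timestamp". The real field name varies by board, and the real comparison happens on the board's side of the request.

```python
from datetime import datetime

def latest_modification(local_listings):
    """Return the newest modification timestamp we hold, or None if we hold nothing."""
    return max((l["modified"] for l in local_listings), default=None)

def listings_to_pull(board_listings, cutoff):
    """Select board listings modified after the cutoff (everything if cutoff is None)."""
    if cutoff is None:
        return list(board_listings)
    # Assumption: strictly newer than the cutoff counts as "needs to be pulled".
    return [l for l in board_listings if l["modified"] > cutoff]

# Hypothetical sample data: what we hold vs. what the board currently has.
local = [
    {"id": "A1", "modified": datetime(2024, 1, 1, 9, 0)},
    {"id": "B2", "modified": datetime(2024, 1, 1, 9, 30)},
]
board = [
    {"id": "A1", "modified": datetime(2024, 1, 1, 9, 0)},
    {"id": "B2", "modified": datetime(2024, 1, 1, 9, 45)},
    {"id": "C3", "modified": datetime(2024, 1, 1, 9, 50)},
]

cutoff = latest_modification(local)
changed = listings_to_pull(board, cutoff)
```

Here the cutoff is 9:30, so B2 (updated at 9:45) and C3 (new at 9:50) are selected, and A1 is left alone.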

 

This pull, which runs roughly every 5 minutes, is called a quick pull.

 

The trick here is that the quick pull isn't always quick. If a quick pull takes longer than 5 minutes to run, we don't start the next one until the current pull finishes. So if a pull takes 30 minutes, there will be a 30 minute delay before the next one starts.
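The timing rule can be sketched as a small simulation. This is only an illustration: the function name is made up, and it assumes the next quick pull starts immediately once an overrunning pull finishes, which may not match the supervisor exactly.

```python
def quick_pull_start_times(durations_min, interval_min=5):
    """Given how long each quick pull takes (in minutes), return when each one starts."""
    starts = []
    next_start = 0
    for duration in durations_min:
        starts.append(next_start)
        finished = next_start + duration
        scheduled = next_start + interval_min
        # Never overlap: wait for the current pull to finish if it overruns the interval.
        next_start = max(finished, scheduled)
    return starts

# A 2 minute pull, then a 30 minute pull, then a 3 minute pull.
starts = quick_pull_start_times([2, 30, 3])
```

With those durations the pulls start at minutes 0, 5, and 35: the third pull is held back until the 30 minute pull is done.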

 

On top of the quick pull, we also run something called a long pull. The long pull runs after 100 quick pulls finish running. That translates to roughly once every 8 hours on smaller boards (100 pulls at about 5 minutes each), or about once a day on larger boards, where each quick pull takes longer.
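The arithmetic behind those frequencies can be checked with a short sketch. The names are hypothetical, and the 15 minute average cycle for a larger board is an assumed figure chosen only to land near "once a day".

```python
QUICK_PULLS_PER_LONG_PULL = 100  # from the rule above

def long_pull_due(quick_pulls_completed):
    """True when a long pull should run: after every 100th completed quick pull."""
    return quick_pulls_completed > 0 and quick_pulls_completed % QUICK_PULLS_PER_LONG_PULL == 0

def hours_between_long_pulls(avg_cycle_min):
    """Hours between long pulls, given the average minutes per quick pull cycle."""
    return QUICK_PULLS_PER_LONG_PULL * avg_cycle_min / 60

small_board_hours = hours_between_long_pulls(5)   # quick pulls keep to the 5 minute cadence
large_board_hours = hours_between_long_pulls(15)  # assumed average cycle on a large board
```

At 5 minutes per cycle, 100 quick pulls take 500 minutes, or a little over 8 hours. At an assumed 15 minutes per cycle, the same 100 pulls take about 25 hours.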

 

The goal of the long pull is to pick up any listings that got dropped by the quick pulls.

 

In an ideal world nothing ever gets dropped, but we don't live in an ideal world. So, on top of long pulls, we also run something called a verify pull.

 

The verify pull confirms that all the listings we have are listings we should have, and removes listings we should no longer have. It runs at the same frequency as the long pull.
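Conceptually, the verify pull is a set comparison between the listing IDs we hold and the IDs the board says should exist. The sketch below assumes both sides can be reduced to plain ID collections; the function name and sample IDs are made up for illustration.

```python
def stale_listing_ids(local_ids, board_ids):
    """Return IDs we hold that the board no longer lists, i.e. listings to remove."""
    return set(local_ids) - set(board_ids)

# Hypothetical sample data.
ours = ["A1", "B2", "C3", "D4"]
boards_current = ["A1", "C3", "E5"]

to_remove = stale_listing_ids(ours, boards_current)
verified = set(ours) - to_remove  # listings confirmed as ones we should have
```

In this example B2 and D4 get removed, while A1 and C3 are confirmed. E5 is on the board but not in our system; picking that up is the job of the quick and long pulls, not the verify pull.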