fix: Filter replication key items for use_fake_since_parameter #379
cbrammer wants to merge 5 commits into MeltanoLabs:main
tap_github/client.py
Outdated
# save the context from the requests so it can be available to the parse_response method
self.context = context
AFAIK this is not necessary. The stream class already has a context attribute.
Unfortunately, I don't think it is available 😞 on the core RESTStream class. That is why all the other method signatures take it as a parameter. For some reason it was left out of this method.
Co-authored-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
Sadly, the GitHub API returns NUL values in some fields. This function recursively replaces them with empty strings. Otherwise postgres will raise an error when inserting the data.
I added on to this PR since it was pending and the changes were in the same spot. It looks like GitHub is returning NUL (\x00) values in some responses! I originally handled it only on the specific stream that was causing issues, but I figured if it happens once... so I pushed it to the client layer. This does add overhead to check for a NUL value and replace it, but it is better than dealing with bad data.
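For readers following along, here is a minimal sketch of what a recursive NUL replacement might look like. The function name `replace_nul` is hypothetical and not taken from the PR's diff; it assumes the parsed response consists of nested dicts, lists, and scalar values.

```python
from typing import Any


def replace_nul(value: Any) -> Any:
    """Recursively strip NUL (\\x00) characters from strings in a parsed JSON value.

    Hypothetical helper for illustration; the PR's actual implementation may differ.
    """
    if isinstance(value, str):
        # Replace each NUL character with an empty string.
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {key: replace_nul(item) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_nul(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Every string in every record is visited, which is where the overhead mentioned above comes from.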



This tap uses a custom `use_fake_since_parameter` for GitHub item APIs that don't have a `since` parameter. This change affects all streams that use `use_fake_since_parameter`: it filters out any returned items that are before the `start_date`.

An example is PullRequestStream. It would get an entire page of 100 items, and 99 of them could be older than the `since` date. But all 100 were then spinning up child streams to get comments, commits, etc. That was causing a huge amount of extra usage of the API request limit. This change filters those out.
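The filtering idea can be sketched roughly as follows. This is not the code from the PR: the names `parse_ts` and `filter_since` are hypothetical, and it assumes the replication key holds a GitHub-style ISO 8601 timestamp such as `2024-06-01T12:00:00Z`. Keeping records that lack the key is also an assumption made here for safety.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


def parse_ts(value: str) -> datetime:
    # Convert a trailing "Z" to an explicit UTC offset so fromisoformat accepts it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_since(
    records: Iterable[Dict[str, Any]],
    replication_key: str,
    since: datetime,
) -> Iterator[Dict[str, Any]]:
    """Yield only records whose replication key is at or after `since`.

    Hypothetical helper; `since` must be timezone-aware.
    """
    for record in records:
        value = record.get(replication_key)
        # Assumption: records without a replication key value are kept.
        if value is None or parse_ts(value) >= since:
            yield record


page = [
    {"id": 1, "updated_at": "2023-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-06-01T12:00:00Z"},
]
since = datetime(2024, 1, 1, tzinfo=timezone.utc)
kept = list(filter_since(page, "updated_at", since))
```

Because dropped records never reach the child streams, no comment or commit requests are issued for them.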