When looking at a GenBank entry, there is often interesting information in the "source" feature, which actually contains several subfields. For instance, below one can see the "host" information:
I end up having to wrap the gb_io.Record class with another class to access this information. Also I end up constantly having to replace newlines with spaces.
import functools


class GenBankRecord:
    def __init__(self, rec):
        """Get a record instance from gb_io"""
        self.rec = rec

    @functools.cached_property
    def source(self):
        """Parse the first feature which is always 'source'"""
        return next(iter(self.rec.features))

    @functools.cached_property
    def fields(self):
        """Get all subfields of source."""
        return self.source.qualifiers.to_dict()

    @functools.cached_property
    def host(self):
        """Get the 'host' subfield with newlines replaced by spaces."""
        if 'host' in self.fields:
            return self.fields['host'][0].replace('\n', ' ')
        return ''
It would be nice to just do rec.host, no?
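In the meantime, a plain helper function might be a lighter workaround than a wrapper class. This is only a sketch: `source_qualifier` is a hypothetical name, and it relies on the same assumptions as the wrapper above (the first feature is 'source', and its qualifiers expose `to_dict()`). The record at the bottom is a stand-in built for illustration, not a real gb_io object.

```python
from types import SimpleNamespace


def source_qualifier(rec, key, default=""):
    """Return the first value of a 'source' subfield, newlines replaced by spaces.

    Hypothetical helper: assumes the first feature is 'source' and that its
    qualifiers provide to_dict(), as in the wrapper class above.
    """
    source = next(iter(rec.features), None)
    if source is None:
        return default
    values = source.qualifiers.to_dict().get(key)
    if not values:
        return default
    return values[0].replace("\n", " ")


# Stand-in record for illustration only; a real one would come from gb_io.
quals = SimpleNamespace(to_dict=lambda: {"host": ["Homo\nsapiens"]})
rec = SimpleNamespace(features=[SimpleNamespace(qualifiers=quals)])
print(source_qualifier(rec, "host"))  # Homo sapiens
```

The same function works for any other subfield of "source" by changing the key, and it returns the default when the subfield is absent.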