-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VIVO Harvester #384
VIVO Harvester #384
Conversation
Update develop branch
70d1f2f
to
063edbc
Compare
Just to explain really quickly why the build failed, we have enforced style guidelines. If you click the
Obviously you don't really have to worry about this for now, but just making sure you know. |
Oh, looks like you fixed it already. Nevermind then! |
Also, you can run this locally with |
Thanks, there is still some issues to correct, I'll have this in mind to check my changes before pushing. |
('journalTitle', '/journalTitle'), | ||
('abstract', ('/abstract', lambda x: x if x else '')), | ||
('issue', '/issue'), | ||
('publisher', '/publisher'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
publisher
can actually fit inside the SHARE schema, under the publisher
field. It has to be either an organization or a person.
I left some comments about things that may be able to move to top level fields, I hope that is helpful! Also, our schema is also on github (https://github.com/CenterForOpenScience/SHARE-schema), so if you have any suggestions for improving it, feel free to bring them up there |
?URI bibo:pmid ?PMID . | ||
}} | ||
OPTIONAL {{ | ||
?autorship a vivo:Authorship . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is autorship
the correct name, or should it be authorship
? I am not familiar with the API, so I am not sure if this is a typo or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is a typo.
Also, I am fabianvf on IRC, you can usually find me in the #cos channel on freenode. |
fabianvf VIVO has Academic Article Alexander Garcia Castro [9:47 PM] Alexander Garcia Castro [9:47 PM] |
- Publisher in the right schema field - Put out of ‘OtherProperties’ the fields that can be resolved as URIs, the only fields that I haven’t been able to change is the ISSN and the ISBN ‘cause I didn’t find any ISSN or ISBN resolver ¿do you know any? - Put in Keywords in the tags field
Optimizing and splitting queries to avoid the use of OPTIONAL statements in SPARQL.
Looks like you need to add a requirement to the |
Also, sorry, looks like I missed your comment from earlier. I think nearly all of those are interesting. I would start just by harvesting them all, but provide a mechanism to limit it later if we decide something is irrelevant. |
from SPARQLWrapper import SPARQLWrapper, JSON | ||
|
||
from scrapi import settings | ||
from scrapi.settings.sparql_mapping import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to have import *
, could you at least namespace the commands (from scrapi.settings import sparql_mapping as mapping
or something like that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, having looked at the settings file now, why not just import that variable?
from scrapi.settings.sparql_mapping import SPARQL_MAPPING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
self.sparql_wrapper.setQuery(query_str) | ||
results = self.sparql_wrapper.query() | ||
results = results.convert() | ||
for result in results['results']['bindings']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just
return [ result['uri']['value'] for result in results['results']['bindings'] ]
if you want, but this is really minor stylistic quibble, so feel free to ignore.
Looks pretty good to me. Most comments are related to making the functions a little less dense/noisy, but didn't see any real logical issues. Once you get the build passing I can take another look and see what the data looks like when I run it locally. If you have any questions or disagreements, feel free to voice them! I am on the #cos IRC channel on freenode, but if there is another way you would like to chat, just let me know and I can get in there. Really appreciate this PR, I am very excited to get it in. |
Removing patterns from the functions and ‘pythonizing’ some statements.
Adding the properties required by travis and the yaml file for the vivo harvester.
Adding the icon for the vivo harvester.
The authors completion need to make some HTTP requests so the right thing to do is include it in the harvest phase.
…ure/vivo-harvester
def schema(self): | ||
return { | ||
'title': ('/title', lambda x: x if x else ''), | ||
'providerUpdatedDateTime': ('/date', lambda x: datetime.strptime(x, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%S%Z") + '+00:00'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just noticed this. We actually have a helper function (helpers.datetime_formatter
), which ensures that all datetimes we receive are processed into an identical format. could you use that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok :)
👍 |
Work in progress for the VIVO harvester.