VIVO Harvester #384

alexgarciac · 2015-10-01T13:25:14Z

Work in progress for the VIVO harvester.

fabianvf · 2015-10-01T13:41:06Z

Just to explain really quickly why the build failed: we have enforced style guidelines. If you click the Details button on the Travis PR check, you will see the following output:

./scrapi/settings/local-dist.py:49:17: E126 continuation line over-indented for hanging indent

./scrapi/harvesters/vivo.py:5:68: W291 trailing whitespace

./scrapi/harvesters/vivo.py:15:1: F401 'parse' imported but unused

./scrapi/harvesters/vivo.py:18:1: F401 'HumanName' imported but unused

./scrapi/harvesters/vivo.py:22:1: F401 'requests' imported but unused

./scrapi/harvesters/vivo.py:26:1: F401 'compose' imported but unused

./scrapi/harvesters/vivo.py:29:1: W293 blank line contains whitespace

./scrapi/harvesters/vivo.py:30:1: E302 expected 2 blank lines, found 1

./scrapi/harvesters/vivo.py:46:1: E302 expected 2 blank lines, found 1

./scrapi/harvesters/vivo.py:52:1: E302 expected 2 blank lines, found 1

./scrapi/harvesters/vivo.py:69:73: W291 trailing whitespace

./scrapi/harvesters/vivo.py:71:26: W291 trailing whitespace

./scrapi/harvesters/vivo.py:74:122: W291 trailing whitespace

./scrapi/harvesters/vivo.py:84:74: W291 trailing whitespace

./scrapi/harvesters/vivo.py:92:97: W291 trailing whitespace

./scrapi/harvesters/vivo.py:95:27: W291 trailing whitespace

./scrapi/harvesters/vivo.py:97:59: W291 trailing whitespace

./scrapi/harvesters/vivo.py:113:40: W291 trailing whitespace

./scrapi/harvesters/vivo.py:117:40: W291 trailing whitespace

./scrapi/harvesters/vivo.py:121:40: W291 trailing whitespace

./scrapi/harvesters/vivo.py:130:39: W291 trailing whitespace

./scrapi/harvesters/vivo.py:148:38: W291 trailing whitespace

./scrapi/harvesters/vivo.py:157:122: W291 trailing whitespace

./scrapi/harvesters/vivo.py:171:124: E226 missing whitespace around arithmetic operator

./scrapi/harvesters/vivo.py:176:51: F821 undefined name 'process_sponsorships'

./scrapi/settings/local-dist.py:53:15: E126 continuation line over-indented for hanging indent

Obviously you don't really have to worry about this for now, but just making sure you know.

fabianvf · 2015-10-01T13:41:43Z

Oh, looks like you fixed it already. Never mind then!

fabianvf · 2015-10-01T13:42:10Z

Also, you can run this locally with flake8 .
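Most of the warnings in the output above are W291 and W293 (trailing whitespace), which can be cleaned up mechanically before re-running the check. A minimal sketch, not part of the project; the helper name `strip_trailing_whitespace` is hypothetical:

```python
def strip_trailing_whitespace(source):
    """Remove trailing spaces and tabs from every line (addresses W291 and W293)."""
    lines = source.split('\n')
    return '\n'.join(line.rstrip(' \t') for line in lines)


# The first line ends in two spaces (W291); the second contains only a tab (W293).
sample = "import os  \n\t\nprint(os.getcwd())\n"
cleaned = strip_trailing_whitespace(sample)
```

Many editors can do the same on save, which keeps these warnings from reaching the build at all.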

alexgarciac · 2015-10-01T13:43:14Z

Thanks, there are still some issues to correct. I'll keep this in mind and check my changes before pushing.

fabianvf · 2015-10-01T13:56:55Z

scrapi/harvesters/vivo.py

+                ('journalTitle', '/journalTitle'),
+                ('abstract', ('/abstract', lambda x: x if x else '')),
+                ('issue', '/issue'),
+                ('publisher', '/publisher'),


publisher can actually fit inside the SHARE schema, under the publisher field. It has to be either an organization or a person.
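A rough sketch of what moving publisher to a top-level field could look like. It assumes an organization is represented as an object with a `name` key; that shape and the helper name `to_publisher` are assumptions for illustration and should be checked against the SHARE schema itself.

```python
def to_publisher(name):
    """Wrap a raw publisher string as an organization-like object.

    Assumption: an organization is a dict with a 'name' key.
    Returns None when the source provides no publisher.
    """
    name = (name or '').strip()
    return {'name': name} if name else None


# Hypothetical schema fragment in the (path, transform) style used in the diff above.
schema_fragment = {
    'publisher': ('/publisher', to_publisher),
}
```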

fabianvf · 2015-10-01T14:13:12Z

I left some comments about things that may be able to move to top level fields, I hope that is helpful!

Our schema is also on GitHub (https://github.com/CenterForOpenScience/SHARE-schema), so if you have any suggestions for improving it, feel free to bring them up there.

fabianvf · 2015-10-01T16:33:39Z

scrapi/harvesters/vivo.py

+                                        ?URI bibo:pmid ?PMID .
+                                   }} 
+                            OPTIONAL {{
+                                        ?autorship a vivo:Authorship .


Is autorship the correct name, or should it be authorship? I am not familiar with the API, so I am not sure whether this is a typo.

Yes, it is a typo.

fabianvf · 2015-10-01T19:14:35Z

Also, I am fabianvf on IRC, you can usually find me in the #cos channel on freenode.

alexgarciac · 2015-10-02T19:50:46Z

fabianvf, VIVO has the following types:
Academic Article
Article
Award or Honor
Blog Posting
Book
Case Study
Catalog
Chapter
Concept
Conference Paper
Conference Poster
Database
Edited Book
Editorial Article
Equipment
Extension Document
Human Study
Journal
Newsletter
News Release
Patent
Proceedings
Report
Research Proposal
Review
Series
Software
Thesis
Video
Webpage
Website

Do you want to harvest everything, or do you want to limit the harvesting to some specific types?

- Publisher is now in the right schema field.
- Moved the fields that can be resolved as URIs out of ‘OtherProperties’. The only fields that I haven’t been able to change are the ISSN and the ISBN, because I didn’t find any ISSN or ISBN resolver. Do you know of any?
- Keywords are now in the tags field.

Optimizing and splitting queries to avoid the use of OPTIONAL statements in SPARQL.
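One way to read "splitting queries" is to ask for each property in its own small query and merge the answers per URI in Python, instead of one large query with several OPTIONAL blocks. The sketch below is an illustration under that assumption and not the harvester's actual code; `property_query` and `merge_bindings` are hypothetical names, and the sample results only mimic the SPARQL JSON result layout.

```python
PREFIXES = "PREFIX bibo: <http://purl.org/ontology/bibo/>\n"


def property_query(uri, predicate, var):
    """Build a query for a single property of a single resource, with no OPTIONAL."""
    template = "SELECT ?{var} WHERE {{ <{uri}> {predicate} ?{var} . }}"
    return PREFIXES + template.format(var=var, uri=uri, predicate=predicate)


def merge_bindings(uri, results_by_field):
    """Combine per-property results; a property with no bindings becomes None."""
    record = {'uri': uri}
    for field, results in results_by_field.items():
        bindings = results['results']['bindings']
        record[field] = bindings[0][field]['value'] if bindings else None
    return record


uri = 'http://example.org/individual/pub1'
pmid_query = property_query(uri, 'bibo:pmid', 'pmid')

# Stand-ins for what each converted query result would look like.
fake_results = {
    'pmid': {'results': {'bindings': [{'pmid': {'type': 'literal', 'value': '12345'}}]}},
    'doi': {'results': {'bindings': []}},
}
record = merge_bindings(uri, fake_results)
```

A missing property then shows up as an empty result for one small query rather than as an unbound variable in a large one.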

fabianvf · 2015-10-14T15:08:28Z

Looks like you need to add a requirement to the requirements.txt

fabianvf · 2015-10-14T15:10:50Z

Also, sorry, looks like I missed your comment from earlier. I think nearly all of those are interesting. I would start just by harvesting them all, but provide a mechanism to limit it later if we decide something is irrelevant.
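A minimal sketch of such a mechanism, assuming each harvested record carries its VIVO type as a string. The function name `should_harvest` and the `allowed_types` parameter are hypothetical; `None` means "harvest everything", which matches the suggested starting point.

```python
def should_harvest(document_type, allowed_types=None):
    """Return True when a record of this type should be harvested.

    allowed_types=None places no restriction on the type.
    """
    if allowed_types is None:
        return True
    return document_type in allowed_types


records = [{'type': 'Book'}, {'type': 'Equipment'}, {'type': 'Thesis'}]

# Start by harvesting everything.
everything = [r for r in records if should_harvest(r['type'])]

# Later, restrict to a chosen subset without touching the harvesting code.
limited = [r for r in records if should_harvest(r['type'], {'Book', 'Thesis'})]
```

The allowed set could live in a settings file so that narrowing the harvest is a configuration change.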

fabianvf · 2015-10-14T15:12:56Z

scrapi/harvesters/vivo.py

+from SPARQLWrapper import SPARQLWrapper, JSON
+
+from scrapi import settings
+from scrapi.settings.sparql_mapping import *


I prefer not to have import *. Could you at least namespace the names (from scrapi.settings import sparql_mapping as mapping, or something like that)?

Actually, having looked at the settings file now, why not just import that variable?

from scrapi.settings.sparql_mapping import SPARQL_MAPPING

fabianvf · 2015-10-14T15:23:45Z

scrapi/harvesters/vivo.py

+        self.sparql_wrapper.setQuery(query_str)
+        results = self.sparql_wrapper.query()
+        results = results.convert()
+        for result in results['results']['bindings']:


You can just

return [result['uri']['value'] for result in results['results']['bindings']]

if you want, but this is a really minor stylistic quibble, so feel free to ignore it.
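For reference, here is the comprehension run against a small stand-in for a converted SPARQL JSON result. The sample data and the function name `extract_uris` are made up for illustration.

```python
def extract_uris(results):
    """Collect the value of the ?uri variable from every binding."""
    return [result['uri']['value'] for result in results['results']['bindings']]


# Stand-in for the dictionary produced by converting a JSON query result.
sample_results = {
    'head': {'vars': ['uri']},
    'results': {
        'bindings': [
            {'uri': {'type': 'uri', 'value': 'http://example.org/individual/n1'}},
            {'uri': {'type': 'uri', 'value': 'http://example.org/individual/n2'}},
        ]
    },
}

uris = extract_uris(sample_results)
```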

fabianvf · 2015-10-14T15:38:20Z

Looks pretty good to me. Most comments are related to making the functions a little less dense/noisy, but I didn't see any real logical issues. Once you get the build passing I can take another look and see what the data looks like when I run it locally. If you have any questions or disagreements, feel free to voice them! I am on the #cos IRC channel on freenode, but if there is another way you would like to chat, just let me know and I can get in there. Really appreciate this PR, I am very excited to get it in.

Removing patterns from the functions and ‘pythonizing’ some statements.

Adding the properties required by travis and the yaml file for the vivo harvester.

Adding the icon for the vivo harvester.

The authors completion needs to make some HTTP requests, so the right thing to do is to include it in the harvest phase.
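The idea can be sketched as follows, assuming a two-phase design in which the harvest phase gathers raw data and a later normalize step only reshapes it. The class and function names are hypothetical, and the HTTP call is replaced by an injected callable so the example runs without a network.

```python
class SketchHarvester(object):
    def __init__(self, fetch_author):
        # fetch_author: callable taking an author URI and returning a dict.
        # In a real harvester this is where the HTTP request would happen.
        self.fetch_author = fetch_author

    def harvest(self, raw_records):
        """Harvest phase: all network access, including author completion."""
        completed = []
        for record in raw_records:
            authors = [self.fetch_author(uri) for uri in record['author_uris']]
            completed.append(dict(record, authors=authors))
        return completed

    def normalize(self, record):
        """Normalize phase: a pure transformation with no requests."""
        return {
            'title': record['title'],
            'contributors': [author['name'] for author in record['authors']],
        }


calls = []


def fake_fetch_author(uri):
    """Stand-in for an HTTP lookup; records each call for inspection."""
    calls.append(uri)
    return {'uri': uri, 'name': 'Author ' + uri[-1]}


harvester = SketchHarvester(fake_fetch_author)
harvested = harvester.harvest([
    {'title': 'A title', 'author_uris': ['http://example.org/a1', 'http://example.org/a2']},
])
calls_after_harvest = len(calls)
document = harvester.normalize(harvested[0])
```

Keeping the lookups in one phase means normalization can be re-run on stored raw data without repeating any requests.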

fabianvf · 2015-10-16T17:26:08Z

scrapi/harvesters/vivo.py

+    def schema(self):
+        return {
+            'title': ('/title', lambda x: x if x else ''),
+            'providerUpdatedDateTime': ('/date', lambda x: datetime.strptime(x, "%Y-%m-%d").strftime("%Y-%m-%dT%H:%M:%S%Z") + '+00:00'),


Oh, just noticed this. We actually have a helper function (helpers.datetime_formatter), which ensures that all datetimes we receive are processed into an identical format. Could you use that instead?
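Part of the reason a shared formatter helps: the inline lambda formats a naive datetime with %Z, which yields an empty string, and then appends the offset by hand. The sketch below is a standard-library stand-in for the general idea only. It is not the project's helpers.datetime_formatter, whose exact behavior is not shown in this thread, and the name `to_utc_iso` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc_iso(date_string):
    """Parse a YYYY-MM-DD date and return an ISO 8601 timestamp in UTC."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    # Attach the timezone explicitly instead of concatenating '+00:00'.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


formatted = to_utc_iso("2015-10-01")
```

Routing every harvester through one function like this keeps the output format identical across sources.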

fabianvf · 2015-10-19T13:37:27Z

👍

VIVO Harvester

alexgarciac added 2 commits September 28, 2015 14:42

Merge pull request #1 from CenterForOpenScience/develop

af3cba9

Update develop branch

Merge branch 'develop' of https://github.com/CenterForOpenScience/scrapi

2f78c83

into develop

fabianvf changed the title from WIP to [WIP] VIVO Harvester on Oct 1, 2015

Work in progress VIVO harvester

063edbc

alexgarciac force-pushed the feature/vivo-harvester branch from 70d1f2f to 063edbc on October 1, 2015 13:39

fabianvf reviewed Oct 1, 2015

fabianvf added the enhancement label Oct 12, 2015

Changing query strategy

589dd64

fabianvf reviewed Oct 14, 2015

alexgarciac added 7 commits October 15, 2015 10:26

Style correction, adding new requirement

3c1c57a

Adding properties to travis-dist and vivo.yaml

82bc145

Adding vivo_favicon

6cec7c1

Change the authors processing to the harvest phase and vivo.yaml update

ffdff89

Updating travis-dist.py

ee923e8

Merge remote-tracking branch 'CenterForOpenScience/develop' into feature/vivo-harvester

9adcfef

Better description for the harvester

2c81dd1

fabianvf reviewed Oct 16, 2015

alexgarciac added 2 commits October 18, 2015 05:10

Merge remote-tracking branch 'CenterForOpenScience/develop' into feature/vivo-harvester

b4a23e2

Using datetime_formater

8ead6ff

fabianvf changed the title from [WIP] VIVO Harvester to VIVO Harvester on Oct 19, 2015

fabianvf added a commit that referenced this pull request Oct 19, 2015

Merge pull request #384 from alexgarciac/feature/vivo-harvester

9423c0d

VIVO Harvester

fabianvf merged commit 9423c0d into CenterForOpenScience:develop Oct 19, 2015
