Configuration files for MARVIN on the TIB servers, public for forking the architecture
MARVIN is the release bot that runs automated monthly DBpedia releases on three different servers for the generic, mappings, and Wikidata extractions.
The repository at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config can be used to fork the architecture for creating extensions, developing new extractors or debugging old ones.
Fixes and patches will be manually deployed via `git pull` from the `master` branch of the [DBpedia Extraction Framework](https://github.com/dbpedia/extraction-framework/).
The architecture and workflow can also be forked and adapted to completely different extractions and derive operations outside of the DBpedia framework.
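
The manual `git pull` deployment mentioned above could look roughly like the following. This is a minimal sketch, not the actual MARVIN procedure: the function name `update_framework` and the `FRAMEWORK_DIR` path are hypothetical, and it assumes the remote is called `origin`.

```shell
#!/bin/sh
# Sketch: update a local checkout of the extraction framework from master.
# FRAMEWORK_DIR is a hypothetical path, not taken from the MARVIN setup.
FRAMEWORK_DIR="${FRAMEWORK_DIR:-$HOME/extraction-framework}"

update_framework() {
    dir="$1"
    # Refuse to run outside a git checkout.
    if [ ! -d "$dir/.git" ]; then
        echo "not a git checkout: $dir" >&2
        return 1
    fi
    # Assumes the remote is named "origin"; --ff-only avoids accidental merge commits.
    git -C "$dir" checkout master && git -C "$dir" pull --ff-only origin master
}
```

It would be invoked as `update_framework "$FRAMEWORK_DIR"` on each server after a fix has been merged.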
# Acknowledgements
We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providing three servers to run:
...
...
@@ -9,20 +16,20 @@ We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providi
* community-provided extractors on Wikipedia, Wikidata or other sources
* enrichment, cleaning and parsing services, so-called [Databus mods](https://github.com/dbpedia/databus-mods/) for open data on the Databus
This contribution by TIB is a great push towards incentivizing Open Data and establishing a global and national research and innovation data infrastructure.
This contribution by TIB to DBpedia & its community is a great push towards incentivizing Open Data and establishing a global and national research and innovation data infrastructure.
# Workflow
## Downloading the Wikimedia dumps
TODO
## Running the extraction
## Update and Run the extraction
TODO
## Deploy on Databus
## Deploy MARVIN on Databus
TODO
## Run Databus-Derive (clone and parse)
## [Manual] Run Databus-Derive (clone and parse)
On the respective server, the user `marvin-fetch` has access to `/data/derive`, which contains the `pom.xml` of https://github.com/dbpedia/databus-maven-plugin/tree/master/dbpedia
```
...
...
@@ -37,15 +44,12 @@ SELECT distinct (?derive) WHERE {
BIND (CONCAT("<version>",?artifact,"/${databus.deriveversion}</version>") as ?derive)
}
order by asc(?derive)
```
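
To illustrate what the `BIND (CONCAT(...))` expression above produces, the following sketch builds the same `<version>` lines in shell and sorts them ascending. The function name `derive_versions` and the artifact IRIs are hypothetical placeholders, not values from the Databus.

```shell
#!/bin/sh
# Mirrors the CONCAT expression of the query: one <version> line per artifact,
# sorted ascending. The artifact IRIs passed below are hypothetical placeholders.
derive_versions() {
    for artifact in "$@"; do
        # Single quotes keep ${databus.deriveversion} literal, as in the query string.
        printf '<version>%s/${databus.deriveversion}</version>\n' "$artifact"
    done | LC_ALL=C sort
}

derive_versions \
    "https://databus.example.org/marvin/mappings/instance-types" \
    "https://databus.example.org/marvin/mappings/geo-coordinates"
```

Each output line keeps the `${databus.deriveversion}` placeholder unexpanded, so the version can be filled in later where the lines are used.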
```
#######
# This is still manual, will be a cronjob soon
#######
su marvin-fetch
tmux a -t derive
WHAT=mappings
NEWVERSION=2019.08.30
# prepare
...
...
@@ -53,12 +57,24 @@ cd /data/derive/databus-maven-plugin/dbpedia/$WHAT
RELEASECOMMENTPREFIX="(MARVIN is the DBpedia bot, that runs the DBpedia Information Extraction Framework (DIEF) and releases the data as is, i.e. unparsed, unsorted, not redirected for debugging the software. After its releases, data is cleaned and persisted under the dbpedia account.)"