Commit b186c6b4 authored by kurzum

pom migration

parent fadeb023
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>generic</groupId>
<artifactId>group-metadata</artifactId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<groupId>generic</groupId>
<artifactId>wikipedia-links</artifactId>
<packaging>jar</packaging>
</project>
# wikipedia-links dataset
Help with documentation for this dataset is needed.
# Geo-coordinates extracted with mappings
Contains geographic coordinates from the Wikipedia Infoboxes refined by the mapping-based extraction.
The dataset contains all triples extracted with the help of the [Geocoordinates Mappings](http://mappings.dbpedia.org/index.php/Template:GeocoordinatesMapping). Whereas the [generic geo-coordinates datasets](https://databus.dbpedia.org/dbpedia/generic/geo-coordinates) pick up any geocoordinate in an infobox without contextualizing it, the mappings make it possible to describe which kind of location the coordinates refer to. These can be the coordinates of the actual location of the resource itself
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "0.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-25.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.georss.org/georss/point> "0.0 -25.0" .
but also coordinates of locations associated with the resource (e.g. the resting place of Alfred Nobel)
<http://dbpedia.org/resource/Alfred_Nobel> <http://dbpedia.org/ontology/restingPlacePosition> <http://dbpedia.org/resource/Alfred_Nobel__restingPlacePosition__1> .
<http://dbpedia.org/resource/Alfred_Nobel__restingPlacePosition__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Alfred_Nobel__restingPlacePosition__1> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "59.356811111111114"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Alfred_Nobel__restingPlacePosition__1> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "18.01928611111111"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Alfred_Nobel__restingPlacePosition__1> <http://www.georss.org/georss/point> "59.356811111111114 18.01928611111111" .
You can have a look at the mappings used for [Alfred Nobel (Person)](http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_person) and [Atlantic Ocean (body of water)](http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_body_of_water).
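The example triples above can also be consumed programmatically. A minimal sketch using only Python's standard library, assuming the simple subject/predicate/typed-literal N-Triples shape shown above (a real application should use a proper N-Triples parser such as rdflib):

```python
import re

# The lat/long example triples from above (N-Triples)
ntriples = """\
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "0.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Atlantic_Ocean> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-25.0"^^<http://www.w3.org/2001/XMLSchema#float> .
"""

# Naive pattern for "<subject> <predicate> "literal"..." lines only;
# it intentionally skips lines whose object is an IRI (e.g. the rdf:type triple).
LINE = re.compile(r'<([^>]+)> <([^>]+)> "([^"]+)"')

coords = {}
for line in ntriples.splitlines():
    m = LINE.match(line)
    if m:
        subject, predicate, value = m.groups()
        key = predicate.rsplit('#', 1)[-1]  # 'lat' or 'long'
        coords.setdefault(subject, {})[key] = float(value)

print(coords['http://dbpedia.org/resource/Atlantic_Ocean'])
# {'lat': 0.0, 'long': -25.0}
```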
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>geo-coordinates-mappingbased</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
<properties>
<databus.codeReference>https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/GeoCoordinatesMapping.scala</databus.codeReference>
</properties>
</project>
# DBpedia Ontology instance types
Classification of instances with the DBpedia Ontology. Contains triples of the form `<$resource> rdf:type <$dbpedia_ontology_class>` generated by the mappings extraction.
## Most specific vs. transitive files
The dataset contains a file with just the types as classified with the mapping extractor. These types are normally the most specific class, they originate directly from the `map to class` template on [mappings.dbpedia.org](http://mappings.dbpedia.org). In addition, a file with the `_transitive` tag is generated by the [DIEF Transitive Closure Class](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/TransitiveClosure.scala) on release, containing the pre-calculated forward-inferences (transitive closure), i.e. with all superclasses. If your application/store does not support reasoning, additionally use the `_transitive` file.
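The idea behind the pre-computed `_transitive` file can be illustrated with a small sketch; the subclass chain below is a hypothetical fragment, not the actual DBpedia ontology:

```python
# Hypothetical subclass relations: child class -> parent class
subclass_of = {
    "dbo:City": "dbo:Settlement",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
}

def superclasses(cls):
    """Walk the subclass chain upwards (transitive closure for a chain-shaped hierarchy)."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

# Most specific type, as emitted by the mapping extractor:
most_specific = "dbo:City"
# The _transitive file additionally materializes all superclasses:
print(superclasses(most_specific))
# ['dbo:Settlement', 'dbo:PopulatedPlace', 'dbo:Place']
```

A store without reasoning support can then answer "all cities and settlements" queries by unioning the plain file with the `_transitive` file instead of inferring superclasses at query time.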
## Entities with double underscore
In addition to the `map to class` types, entities whose IRIs end in `__[number]` are not duplicates; they reflect deliberate knowledge-graph modelling choices and are generated by the [DIEF Node class](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/Node.scala). Most of them are instances of `dbo:CareerStation` (see [examples](https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+%3Fs+%7B+%3Fs+a+dbo%3ACareerStation+%7D+limit+100&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+)), which represent time periods and therefore carry statements that hold true only within a determined time span. In the case of [Ada Lovelace](http://dbpedia.org/resource/Ada_Lovelace), additional metadata can be attached to the person function "Countess of Lovelace" instead of adding predicates directly to the entity dbr:Ada_Lovelace (see [Duplicate resource names with underscore](https://forum.dbpedia.org/t/duplicate-resource-names-with-underscore/104)).
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>instance-types</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
</project>
# Literals extracted with mappings
High-quality literal (datatyped) properties (numeric data and text) refined by the mappings extraction.
Contains strings and typed literal values (dates, numbers, currency, etc.) extracted from infoboxes refined by community-written mappings that help the parser to unify extracted values. The extracted triples are based on the parts of the mappings making use of [datatype properties](http://mappings.dbpedia.org/index.php/Template:DatatypeProperty) from the DBpedia ontology. Therefore values across different languages are comparable since they use the same property identifiers and units of measurement are normalized to standard units (e.g. inches to meters).
The `rdfs:range` of the datatype property defines the datatype expected as the outcome of the mapped value. As a consequence, the appropriate [data parser](https://github.com/dbpedia/extraction-framework/tree/master/core/src/main/scala/org/dbpedia/extraction/dataparser) is picked to extract the data from the infobox entry text and tries its best to extract a value of this type (e.g. a date for [dbo:birthDate](http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate)), or nothing. For datatype properties with (convertible) units of measurement, the [unit parser](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/dataparser/UnitValueParser.scala) checks whether one of the known units for the unit dimension (see e.g. [units for Length](https://github.com/dbpedia/extraction-framework/blob/68e95bcab1d859d47690cc0c1536eaace7b01d3b/core/src/main/scala/org/dbpedia/extraction/ontology/OntologyDatatypes.scala#L371)) matches and converts the value into the standard/base unit of the dimension. Moreover, the parser tries to interpret quantity modifiers (e.g. Mio, millions); however, these need to be defined and maintained by the community for every language individually in a [config file](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/config/dataparser/ParserUtilsConfig.scala). Available unit dimensions with their standard units, as well as all units and their abbreviations, can be found in the [code](https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/ontology/OntologyDatatypes.scala).
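The normalization described above can be sketched as follows; the conversion factors, unit names, and scale modifiers here are illustrative stand-ins, since the authoritative tables live in `OntologyDatatypes.scala` and the per-language `ParserUtilsConfig.scala`:

```python
# Illustrative conversion factors to the base unit of each dimension
# (the real tables are maintained in OntologyDatatypes.scala).
TO_BASE = {
    "Length": {"metre": 1.0, "kilometre": 1000.0, "inch": 0.0254, "foot": 0.3048},
}

# Illustrative quantity modifiers (per-language config in ParserUtilsConfig.scala)
SCALES = {"million": 1e6, "billion": 1e9}

def normalize(value, unit, dimension, scale=None):
    """Convert a parsed infobox value to the dimension's base unit."""
    if scale:
        value *= SCALES[scale]          # apply quantity modifier first
    return value * TO_BASE[dimension][unit]

print(normalize(10, "inch", "Length"))                          # ~0.254 metres
print(normalize(1.5, "kilometre", "Length", scale="million"))   # 1500000000.0 metres
```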
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>mappingbased-literals</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
</project>
# (Uncleaned) Object properties extracted with mappings
Uncleaned high-quality statements with IRI object values, extracted from Wikipedia infoboxes by the mappings extraction.
Offers complementary statements (Entity-to-Entity relations) from Wikipedia Infoboxes to [mappingbased-literals](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-literals/${project.version})
NOTE: There is also a [cleaned version](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-objects/${project.version}) of this dataset available.
Uncleaned means that two post-processing steps are *not* performed on this dataset:
* type consistency check, i.e. checking that the type of the object matches the range of the property
* redirect resolution for objects, e.g. replacing http://dbpedia.org/resource/Billl_Clinton with the Bill Clinton resource
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>mappingbased-objects-uncleaned</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
</project>
# Cleaned object properties extracted with mappings
Cleaned version of high quality statements with IRI object values extracted by the mappings extraction from Wikipedia Infoboxes.
Statements are based on input from [mappingbased-objects-uncleaned](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-objects-uncleaned/${project.version}) after applying **post processing steps**:
1. Canonicalization of all object values replacing them by their (transitive) redirects, i.e. `http://dbpedia.org/resource/Barack_Obama_Jr` will be replaced by `http://dbpedia.org/resource/Barack_Obama` . The `_transitive` file of the corresponding language chapter from [redirects dataset](https://databus.dbpedia.org/dbpedia/generic/redirects/${project.version}) will be used to resolve the transitive redirects. See [code](https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/MapObjectUris.scala) for more details.
2. Type consistency filtering: extracted `rdf:type` statements from [instance-types](https://databus.dbpedia.org/dbpedia/${project.groupId}/instance-types/${project.version}) are used to check domain and range according to the definition of the properties in the [DBpedia ontology](https://databus.dbpedia.org/dbpedia/ontology/dbo-snapshots). Statements with predicate *p* whose subject resource has a type different from the one specified in the `rdfs:domain` of *p* are passed to `_disjointDomain` files, whereas statements whose object resource is disjoint from the `rdfs:range` are passed to `_disjointRange` files. Statements where the types match or are subtypes of the expected ones are passed to the regular dataset files (without content variant). See [code](https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/TypeConsistencyCheck.scala) for more details. We keep the `disjoint*` files since they can also contain false positives due to incomplete type information (e.g. no infobox exists for a specific resource, or the infobox class mapping is incomplete). If you union all three files, the result is the same as applying only step 1.
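Both post-processing steps can be sketched together. The redirect map, type assignments, and property ranges below are hypothetical stand-ins for the real datasets, and this simplified check only compares the object type against an exact `rdfs:range` match, whereas the real `TypeConsistencyCheck` also considers the subclass hierarchy and `rdfs:domain`:

```python
# Step 1 input: (transitive) redirects -- hypothetical map
redirects = {"dbr:Barack_Obama_Jr": "dbr:Barack_Obama"}

# Step 2 input: instance types and property ranges -- hypothetical fragments
types = {"dbr:Barack_Obama": "dbo:Person", "dbr:Hawaii": "dbo:Place"}
prop_range = {"dbo:birthPlace": "dbo:Place", "dbo:spouse": "dbo:Person"}

def clean(subject, predicate, obj):
    """Route one statement to the regular file or a _disjointRange file."""
    obj = redirects.get(obj, obj)              # step 1: canonicalize the object
    expected = prop_range.get(predicate)
    actual = types.get(obj)
    if expected and actual and actual != expected:
        return ("_disjointRange", (subject, predicate, obj))
    return ("regular", (subject, predicate, obj))

# Person as a birthPlace violates the range -> routed to _disjointRange
print(clean("dbr:Some_Article", "dbo:birthPlace", "dbr:Barack_Obama_Jr"))
# Place as a birthPlace is consistent -> kept in the regular file
print(clean("dbr:Barack_Obama", "dbo:birthPlace", "dbr:Hawaii"))
```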
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>mappingbased-objects</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
</project>
<?xml version="1.0" encoding="UTF-8"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.dbpedia.databus</groupId>
<artifactId>super-pom</artifactId>
<version>1.3-SNAPSHOT</version>
</parent>
<groupId>mappings</groupId>
<artifactId>group-metadata</artifactId>
<version>2020.02.01</version>
<packaging>pom</packaging>
<modules>
<module>mappingbased-literals</module>
<module>specific-mappingbased-properties</module>
<module>geo-coordinates-mappingbased</module>
<module>instance-types</module>
<module>mappingbased-objects</module>
<module>mappingbased-objects-uncleaned</module>
</modules>
<properties>
<databus.documentation>## Attribution fulfilled by
* (when deriving another dataset and releasing to the Databus) adding the Databus link to the provenance https://databus.dbpedia.org/dbpedia/${project.groupId}/${project.artifactId}/${project.version}
* on your website:
* include the DBpedia logo and mention the usage of DBpedia with this link: https://databus.dbpedia.org/dbpedia
* include backlinks from your pages to the individual entities, e.g. http://dbpedia.org/resource/Berlin
* in academic publications cite: DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia, J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer. Semantic Web Journal 6 (2): 167--195 (2015)
## How to contribute
DBpedia is a community project; help us by:
* editing the mappings at http://mappings.dbpedia.org
* improving this documentation at https://github.com/dbpedia/databus-maven-plugin/tree/master/dbpedia/mappings/${project.artifactId}/${project.artifactId}.md
* helping with the software relevant for extraction:
  * https://github.com/dbpedia/extraction-framework/tree/master/core/src/main/scala/org/dbpedia/extraction/mappings
  * in particular https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxMappingsExtractor.scala
## Debug
Parselogs are currently kept here: http://downloads.dbpedia.org/temporary/parselogs/
## Origin
This dataset was extracted using the wikipedia-dumps available on https://dumps.wikimedia.org/
using the DBpedia Extraction-Framework available at https://github.com/dbpedia/extraction-framework
For more technical information on how these datasets were generated, please visit http://dev.dbpedia.org
## Changelog
* 2019.10.16
* fixed encoding issue
* fixed https://github.com/dbpedia/extraction-framework/issues/595 where a nullpointer caused some instance extractions to crash
* 2018.09.12
* were created as new modular releases; some issues remain
* we used rapper 2.0.14 to parse, `LC_ALL=C sort` to sort, and `ascii2uni -a U` to unescape unicode characters
* parsing removed 250k triples in total; debugging pending
* objects-uncleaned was not transformed into objects-cleaned and is missing
* link to the Wikimedia dump version is missing
* 2016.10.01
* was taken from the previous BIG DBpedia releases under http://downloads.dbpedia.org/2016-10/ and included for completeness</databus.documentation>
<databus.license>http://purl.oclc.org/NET/rdflicense/cc-by3.0</databus.license>
<databus.codeReference>https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/MappingExtractor.scala</databus.codeReference>
<databus.issueTracker>https://github.com/dbpedia/extraction-framework/issues</databus.issueTracker>
<databus.documentationLocation>https://github.com/dbpedia/databus-maven-plugin/blob/master/dbpedia/${project.groupId}/${project.artifactId}</databus.documentationLocation>
<databus.downloadUrlPath>https://downloads.dbpedia.org/repo/lts/${project.groupId}/${project.artifactId}/${project.version}/</databus.downloadUrlPath>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<databus.packageDirectory>/media/bigone/25TB/www/downloads.dbpedia.org/repo/lts/${project.groupId}/${project.artifactId}</databus.packageDirectory>
<databus.tryVersionAsIssuedDate>true</databus.tryVersionAsIssuedDate>
<databus.publisher>https://webid.dbpedia.org/webid.ttl#this</databus.publisher>
<databus.feedbackChannel>https://forum.dbpedia.org/c/databus-dbpedia/mappings</databus.feedbackChannel>
<!-- used for derive plugin -->
<databus.deriveversion>2019.09.01</databus.deriveversion>
</properties>
<repositories>
<repository>
<id>archiva.internal</id>
<name>Internal Release Repository</name>
<url>http://databus.dbpedia.org:8081/repository/internal</url>
</repository>
<repository>
<snapshots>
<updatePolicy>always</updatePolicy>
</snapshots>
<id>archiva.snapshots</id>
<name>Internal Snapshot Repository</name>
<url>http://databus.dbpedia.org:8081/repository/snapshots</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.dbpedia.databus</groupId>
<artifactId>databus-derive-maven-plugin</artifactId>
<version>1.0-SNAPSHOT</version>
<executions>
<execution>
<id>DeriveFromMarvin</id>
<goals>
<goal>clone</goal>
</goals>
</execution>
</executions>
<configuration>
<skipParsing>false</skipParsing>
<skipCloning>false</skipCloning>
<versions>
<version>https://databus.dbpedia.org/marvin/mappings/geo-coordinates-mappingbased/${databus.deriveversion}</version>
<version>https://databus.dbpedia.org/marvin/mappings/instance-types/${databus.deriveversion}</version>
<version>https://databus.dbpedia.org/marvin/mappings/mappingbased-literals/${databus.deriveversion}</version>
<version>https://databus.dbpedia.org/marvin/mappings/mappingbased-objects-uncleaned/${databus.deriveversion}</version>
<version>https://databus.dbpedia.org/marvin/mappings/mappingbased-objects/${databus.deriveversion}</version>
<version>https://databus.dbpedia.org/marvin/mappings/specific-mappingbased-properties/${databus.deriveversion}</version>
</versions>
</configuration>
</plugin>
</plugins>
<extensions>
<extension>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-webdav-jackrabbit</artifactId>
<version>3.0.0</version>
</extension>
</extensions>
</build>
<profiles>
<profile>
<id>webdav</id>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>wagon-maven-plugin</artifactId>
<version>2.0.0</version>
<executions>
<execution>
<id>upload-databus</id>
<phase>install</phase>
<goals>
<goal>upload</goal>
</goals>
<configuration>
<fromDir>${project.build.directory}/databus/repo/${project.groupId}/${project.artifactId}</fromDir>
<url>dav:https://downloads.dbpedia.org/repo/</url>
<toDir>dbpedia/${project.groupId}/${project.artifactId}</toDir>
<serverId>downloads-dbpedia-org</serverId>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>group-metadata</artifactId>
<groupId>mappings</groupId>
<version>2020.02.01</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>specific-mappingbased-properties</artifactId>
<groupId>mappings</groupId>
<packaging>jar</packaging>
</project>
# Numeric Literals converted to designated units with class-specific property mappings
Infobox numerical data from the mappings extraction using units of measurement more convenient for the resource class/type.
The triples in [mappingbased-literals](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-literals) use normalized values according to the base unit of the property (see the [docu](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-literals/${project.version}) for more details). This dataset, however, contains triples where the values are converted to a specific unit of measurement more convenient for the resource class (e.g. square kilometres instead of square metres for the area of a city, or the runtime of a movie in minutes instead of seconds). To distinguish these values from those in [mappingbased-literals](https://databus.dbpedia.org/dbpedia/${project.groupId}/mappingbased-literals), which are normalized to base units, specific properties use IRIs of the form `http://dbpedia.org/ontology/$className/$propertyName`. The target conversion unit can be defined via a [SpecificProperty mapping](http://mappings.dbpedia.org/index.php/Template:SpecificProperty) in the corresponding class (see e.g. the [Work mapping](http://mappings.dbpedia.org/index.php/OntologyClass:Work)) and can also be retrieved via the `rdfs:range` of the specific property (see e.g. the [runtime specific property](http://dbpedia.org/ontology/Work/runtime)) from the [DBpedia Ontology](https://databus.dbpedia.org/dbpedia/ontology/dbo-snapshots). Moreover, the datatype IRI of the typed literal also denotes the converted unit via [DBpedia Datatypes](http://mappings.dbpedia.org/index.php?title=Special:AllPages&namespace=206).
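The specific-property scheme can be sketched as follows; the class/property names mirror the runtime example above, while the concrete seconds value is illustrative:

```python
# Base-unit value as it would appear in mappingbased-literals
# (runtime normalized to seconds; the concrete number is illustrative)
runtime_seconds = 5220.0

# Specific property for dbo:Work converts to a more convenient unit (minutes);
# the property IRI follows the pattern http://dbpedia.org/ontology/$className/$propertyName
class_name, prop_name = "Work", "runtime"
specific_iri = f"http://dbpedia.org/ontology/{class_name}/{prop_name}"
runtime_minutes = runtime_seconds / 60.0

print(specific_iri)     # http://dbpedia.org/ontology/Work/runtime
print(runtime_minutes)  # 87.0
```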