Commit 0e871e69 authored by Martin Franke

Update README.md

parent 74416af0
@@ -4,13 +4,46 @@
PRIMAT is an open source (ALv2) toolbox for the definition and execution of PPRL workflows.
It offers several components for data owners and the central linkage unit that provide state-of-the-art PPRL methods,
including Bloom-filter-based encoding and hardening techniques, LSH-based blocking, metric space filtering,
post-processing and more.
It offers modules for data owners and the linkage unit that provide state-of-the-art PPRL methods,
including Bloom-filter-based encoding and hardening techniques, LSH-based blocking, post-processing (clustering) and more.
[PRIMAT](https://dl.acm.org/citation.cfm?doid=3352063.3360392) is developed by the [Database Group](https://dbs.uni-leipzig.de/research/projects/pper_big_data) of the University of Leipzig, Germany.
## Using PRIMAT
To use PRIMAT in your project, add the following dependency to your build configuration (Maven syntax shown):
```xml
<dependency>
<groupId>de.uni-leipzig.dbs.pprl</groupId>
<artifactId>primat-data-owner</artifactId>
<version>1.0.1</version>
</dependency>
```
for data owner components, including pre-processing and encoding methods, or
```xml
<dependency>
<groupId>de.uni-leipzig.dbs.pprl</groupId>
<artifactId>primat-linkage-unit</artifactId>
<version>1.0.1</version>
</dependency>
```
for linkage unit components, including linkage and post-processing (clustering) methods.
## PRIMAT Modules
- `primat-common` - Contains the shared data model and various utility functions, e.g., for input file handling, hashing, and feature extraction.
- `primat-data-owner` - Contains typical pre-processing functions as well as techniques to encode/mask records for PPRL.
- `primat-linkage-unit` - Provides functionality for batch and incremental linkage workflows, including blocking, similarity calculation, classification, post-processing (clustering), and evaluation.
- `primat-examples` - Contains example workflows showing use cases for PRIMAT.
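
To give a rough idea of what Bloom-filter-based encoding on the data owner side involves, here is a minimal, self-contained sketch. It does not use PRIMAT's API: the class name `BloomFilterSketch`, the filter size, the number of hash functions, and the choice of SHA-1/MD5 double hashing over padded character bigrams are all assumptions made for illustration, and no hardening technique is applied.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;
import java.util.Locale;

public class BloomFilterSketch {

    // Hypothetical parameters for illustration only; not PRIMAT defaults.
    static final int FILTER_SIZE = 1024;
    static final int HASH_FUNCTIONS = 10;

    /** Encodes a value by hashing each padded character bigram into a bit vector. */
    static BitSet encode(String value) {
        BitSet filter = new BitSet(FILTER_SIZE);
        String padded = "_" + value.toLowerCase(Locale.ROOT) + "_";
        for (int i = 0; i + 2 <= padded.length(); i++) {
            String bigram = padded.substring(i, i + 2);
            long h1 = hash("SHA-1", bigram);
            long h2 = hash("MD5", bigram);
            for (int k = 0; k < HASH_FUNCTIONS; k++) {
                // Double hashing: position_k = (h1 + k * h2) mod FILTER_SIZE
                int position = (int) Math.floorMod(h1 + k * h2, (long) FILTER_SIZE);
                filter.set(position);
            }
        }
        return filter;
    }

    /** Returns the first four digest bytes as an int (widened to long by callers). */
    static int hash(String algorithm, String input) {
        try {
            byte[] d = MessageDigest.getInstance(algorithm)
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Similar names share most bigrams and therefore many set bits.
        System.out.println(encode("smith").cardinality());
        System.out.println(encode("smyth").cardinality());
    }
}
```

In a real workflow the data owner would send only the resulting bit vectors to the linkage unit, never the plaintext attribute values.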
## Privacy-preserving Record Linkage
- Task of identifying records in different databases referring to the same person
@@ -25,7 +58,7 @@ post-processing and more.
- Scalability to millions of records
- High linkage quality
## PRIMAT
## PRIMAT: Overview
- PPRL tool covering the entire PPRL life-cycle
- Flexible definition and execution of PPRL workflows
@@ -53,11 +86,11 @@ post-processing and more.
|Component/Module | Function/Feature | Status |
|-----------------|------------------|--------|
| Data generator & corruptor | - Data generation<br> - Data corruption | Implemented<br>Planned |
| Data generator & corruptor | - Data generation<br> - Data corruption | Integration outstanding<br>Planned |
| Data cleaning | - Split/merge/remove attributes<br>- Replace/remove unwanted values<br>- OCR transformation | Implemented<br>Implemented<br>Implemented |
| Encoding | - Bloom filter encoding & hardening<br>- Support of alternative encoding schemes| Implemented<br>Planned |
| Matching | - Standard & LSH-based blocking, Metric Space filtering<br>- Threshold-based classification<br>- Post-processing<br>- Multi-threaded execution<br>- Distributed matching<br>- Multi-Party support, match cluster management<br>- Incremental Matching | Implemented<br>Implemented<br>Implemented<br>Implemented<br>Integration outstanding<br>In development<br>In development |
| Evaluation | - Measure for assessing quality & scalability<br>- Masked match result visualization | Implemented<br>Integration outstanding |
| Encoding | - Bloom filter encoding & hardening<br>- Support of alternative encoding schemes| Implemented<br>Partially implemented |
| Matching | - Standard & LSH-based blocking <br>- Threshold-based classification<br>- Post-processing<br>- Multi-threaded execution<br>- Distributed matching<br>- Multi-Party support, match cluster management<br>- Incremental Matching | Implemented<br>Implemented<br>Implemented<br>Partially implemented<br>Integration outstanding<br>Implemented<br>Implemented |
| Evaluation | - Measures for assessing quality & scalability<br>- Masked match result visualization | Implemented<br>Integration outstanding |
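
The "Threshold-based classification" entry in the Matching row can be illustrated with a small sketch. It assumes records are encoded as bit vectors and compared with the Dice coefficient, a common choice for Bloom filters in PPRL. The class `DiceThresholdSketch` and its methods are hypothetical and are not taken from PRIMAT.

```java
import java.util.BitSet;

public class DiceThresholdSketch {

    /** Dice coefficient of two bit vectors: 2 * |a AND b| / (|a| + |b|). */
    static double dice(BitSet a, BitSet b) {
        int total = a.cardinality() + b.cardinality();
        if (total == 0) {
            return 0.0; // two empty filters carry no evidence of a match
        }
        BitSet common = (BitSet) a.clone(); // clone so the inputs stay unchanged
        common.and(b);
        return 2.0 * common.cardinality() / total;
    }

    /** Classifies a record pair as a match if its similarity reaches the threshold. */
    static boolean isMatch(BitSet a, BitSet b, double threshold) {
        return dice(a, b) >= threshold;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(2); a.set(3);
        BitSet b = new BitSet();
        b.set(2); b.set(3); b.set(4);
        // Two of three bits overlap: dice = 2 * 2 / (3 + 3)
        System.out.println(dice(a, b));
        System.out.println(isMatch(a, b, 0.8));
    }
}
```

The threshold trades precision against recall: a higher value yields fewer but more reliable matches, which is why post-processing (clustering) is often applied afterwards.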
### Requirements
......