Commit 43d11c65, authored Feb 28, 2022 by Jerome Wuerf
Improve readme
parent 9b0ca61c
Changes: 6
README.md
# Code
## Basic setup
This is the argument retrieval system of team hitgirl. Source code is located under `./python/src`.
### Create python env for dev tools.

## CLI
The CLI of our system has two sub commands: `indexing` and `retrieval`.
The entry point to the runtime can be found in `./docker/docker-compose.dev.yaml`.
### Indexing
```bash
usage: app.py indexing [-h] [--elastic-host ELASTIC_HOST] [--create]
                       sentences_path embeddings_path

positional arguments:
  sentences_path        The file path to the csv file containing the
                        sentences.
  embeddings_path       The file path to the embeddings of the argument units.

optional arguments:
  -h, --help            show this help message and exit
  --elastic-host ELASTIC_HOST
                        The hostname of the server/docker container that runs
                        elastic search.
  --create              If flag is present two new indices are created,
                        overriding existing ones.
```
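The usage text above could be produced by an `argparse` setup along the following lines. This is a minimal sketch reconstructed from the help output, not the project's actual `parse_cli_args.py`: the function name `build_parser`, the `dest="command"` attribute, and the `--elastic-host` default of `"elastic"` are assumptions, as are the file names in the usage note.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the `indexing` sub command."""
    parser = argparse.ArgumentParser(prog="app.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    indexing = subparsers.add_parser("indexing")
    # The default host is an assumption; the README does not state one.
    indexing.add_argument(
        "--elastic-host", type=str, default="elastic",
        help="The hostname of the server/docker container that runs elastic search.")
    indexing.add_argument(
        "--create", action="store_true",
        help="If flag is present two new indices are created, overriding existing ones.")
    indexing.add_argument(
        "sentences_path", type=str,
        help="The file path to the csv file containing the sentences.")
    indexing.add_argument(
        "embeddings_path", type=str,
        help="The file path to the embeddings of the argument units.")
    return parser
```

Parsing an explicit argument list, for example `build_parser().parse_args(["indexing", "--create", "sentences.csv", "embeddings.npy"])`, yields a namespace with `create=True` and both paths set.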
### Retrieval
```bash
usage: app.py retrieval [-h] [--topic-nrb TOPIC_NRB]
                        [--nrb-conclusions-per-topic NRB_CONCLUSIONS_PER_TOPIC]
                        [--nrb-premises-per-conclusion NRB_PREMISES_PER_CONCLUSION]
                        [--min-length-factor MIN_LENGTH_FACTOR]
                        [--reranking {maximal-marginal-relevance,structural-distance,argument-rank,word-mover-distance}]
                        [--reuse-unranked REUSE_UNRANKED]
                        [--lambda-conclusions LAMBDA_CONCLUSIONS]
                        [--lambda-premises LAMBDA_PREMISES]
                        [--mu-conclusions MU_CONCLUSIONS]
                        [--mu-premises MU_PREMISES] [--wait-for-es]
                        run_name input_path output_path

positional arguments:
  run_name              The run name that will be included in the last column
                        of the trec file.
  input_path            The file path to the directory containing the input
                        files.
  output_path           The file path to the directory containing the output
                        files.

optional arguments:
  -h, --help            show this help message and exit
  --topic-nrb TOPIC_NRB
                        Restrict the current indexing and/or reranking to a
                        given topic number.
  --nrb-conclusions-per-topic NRB_CONCLUSIONS_PER_TOPIC
                        The number of conclusions that should be retrieved
                        from the index per topic.
  --nrb-premises-per-conclusion NRB_PREMISES_PER_CONCLUSION
                        The number of premises that should be retrieved from
                        the index per conclusion.
  --min-length-factor MIN_LENGTH_FACTOR
  --reranking {maximal-marginal-relevance,structural-distance,argument-rank,word-mover-distance}
  --reuse-unranked REUSE_UNRANKED
  --lambda-conclusions LAMBDA_CONCLUSIONS
  --lambda-premises LAMBDA_PREMISES
  --mu-conclusions MU_CONCLUSIONS
  --mu-premises MU_PREMISES
  --wait-for-es
```
## Development setup
The setup is optimized for Microsoft's VS Code.
### Create python env for dev tools
Dev tools live in their own virtual env, which works better on macOS systems.
...
...
@@ -13,27 +88,21 @@ $ pip install -r requirements.devtools.txt
$ pre-commit install
```
### Enable linter and formatter in VScode
1. Open settings `.vscode/settings.dist.json`
2. Change `INSERT_USERNAME_HERE` to your current uname
3. Rename the file to `settings.json`
### Debugging
1. Right click on the docker-compose.dev.yaml
2. Wait until container setup is ready (watch magic happening in the terminal)

Debugger is attached to the running containers via port `5678`. Look at `.vscode/launch.json` for information.
1. Right click on the `./docker/docker-compose.dev.yaml`
2. Wait until container is ready
3. Open `python/src/prototype/app.py`
4. Set a Break Point
5. Open debugger Window
6. Execute Debugger (click green play button)
6. Execute Debugger

Hope it helps see you soon!
### Data Set: processed Args.me
### Data Set
The provided dataset is a cluster fuck.
We have a csv with the following schema.
csv with the following schema.
`id, conclusion, premises, context, sentences`
...
...
@@ -60,11 +129,9 @@ We have an csv with the following schema.
- `sourceTitle`
- `sourceUrl`
- `sentences` is a string of a list of json objects
  - one json object in the list corresponds to one sentence
  - the last object is the conclusion
  - all preceding objects are premises
  - one object has the following keys
    - `sent_id`
    - `sent_text`
  - in the `sentences` attribute, 692 conclusions occur word for word in one of the premises
\ No newline at end of file
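The layout of the `sentences` column described above can be unpacked as in the following sketch. It assumes the cell is valid JSON, which the README does not confirm; if the string turns out to be a Python literal with single quotes, `ast.literal_eval` would be needed instead. The function names `split_sentences` and `conclusion_repeats_premise` are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Sentence = Dict[str, str]


def split_sentences(raw: str) -> Tuple[List[Sentence], Sentence]:
    """Split one `sentences` cell into (premises, conclusion).

    Assumes `raw` is a JSON list of objects with `sent_id` and `sent_text`.
    """
    objects = json.loads(raw)
    if not objects:
        raise ValueError("expected at least one sentence object")
    # The last object is the conclusion, all preceding objects are premises.
    return objects[:-1], objects[-1]


def conclusion_repeats_premise(raw: str) -> bool:
    """True if the conclusion text occurs word for word in one of the premises."""
    premises, conclusion = split_sentences(raw)
    return any(conclusion["sent_text"] in p["sent_text"] for p in premises)
```

Counting the rows for which `conclusion_repeats_premise` returns `True` is one way to check the 692 figure against the csv, under the same JSON assumption.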
docker/docker-compose.dev.yaml
...
...
@@ -19,28 +19,28 @@ services:
      - elastic
  elastic:
    image: "docker.elastic.co/elasticsearch/elasticsearch:7.15.2"
    restart: always
    networks:
      - tira
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - /mnt/data/elastic:/usr/share/elasticsearch/data
      - ./conifg:/conifg
    environment:
      - discovery.type=single-node
      - logger.level=DEBUG
    healthcheck:
      test:
        [
          "CMD",
          "curl",
          "-s",
          "-f",
          "http://localhost:9200/_cat/health"
        ]
  # elastic:
  #   image: "docker.elastic.co/elasticsearch/elasticsearch:7.15.2"
  #   restart: always
  #   networks:
  #     - tira
  #   ports:
  #     - "9200:9200"
  #     - "9300:9300"
  #   volumes:
  #     - /mnt/data/elastic:/usr/share/elasticsearch/data
  #     - ./conifg:/conifg
  #   environment:
  #     - discovery.type=single-node
  #     - logger.level=DEBUG
  #   healthcheck:
  #     test:
  #       [
  #         "CMD",
  #         "curl",
  #         "-s",
  #         "-f",
  #         "http://localhost:9200/_cat/health"
  #       ]
networks:
  tira: null
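The healthcheck in this hunk runs `curl -s -f http://localhost:9200/_cat/health`. A rough Python equivalent, of the kind a `--wait-for-es` style flag might use, is sketched below. How the project actually waits for Elasticsearch is not shown in this diff; the names `http_probe` and `wait_for_elastic`, the retry count, and the delay are all assumptions.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if `url` answers with a 2xx status, roughly like `curl -s -f`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses.
        return False


def wait_for_elastic(url: str = "http://localhost:9200/_cat/health",
                     retries: int = 30,
                     delay_seconds: float = 2.0,
                     probe: Callable[[str], bool] = http_probe) -> bool:
    """Poll the health endpoint until it responds or the retries run out."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            time.sleep(delay_seconds)
    return False
```

The `probe` parameter exists only so the polling loop can be exercised without a running cluster.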
python/src/app.py
...
...
@@ -10,6 +10,7 @@ import time
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s', level='INFO')
logging.Logger.manager.loggerDict["elastic_transport.transport"].disabled = True


class App:
    def __init__(self, configuration: Configuration):
...
...
@@ -74,13 +75,17 @@ class App:
                self.config['LAMBDA_PREMISES'])
        elif self.config['RERANKING'] == RerankingOptions.STRUCTURAL_DISTANCE.value:
            reranker = StructuralDistanceReranking(retrieved_results, self.config['RUN_NAME'], topics)
            reranker = StructuralDistanceReranking(retrieved_results, self.config['RUN_NAME'], topics,
                                                   self.config['MU_PREMISES'],
                                                   self.config['MU_CONCLUSIONS'])
        elif self.config['RERANKING'] == RerankingOptions.ARGUMENT_RANK.value:
            reranker = ArgumentRankReranking(retrieved_results, self.config['RUN_NAME'], topics)
        elif self.config['RERANKING'] == RerankingOptions.WORD_MOVER_DISTANCE.value:
            reranker = WordMoverDistanceReranking(retrieved_results, self.config['RUN_NAME'], topics)
            reranker = WordMoverDistanceReranking(retrieved_results, self.config['RUN_NAME'], topics,
                                                  self.config['MU_PREMISES'],
                                                  self.config['MU_CONCLUSIONS'])
        else:
            reranker = NoReranking(retrieved_results, self.config['RUN_NAME'])
...
...
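As a reading aid for the hunk above, the following sketch tabulates which extra configuration values each reranker receives after `retrieved_results`, `RUN_NAME`, and `topics`. It is not code from the repository: the table `EXTRA_KEYS` and the function `extra_arguments` are hypothetical, the sketch assumes that the `RerankingOptions` values equal the CLI choice strings from the README, and the maximal-marginal-relevance branch is left out because it is only partly visible in the diff.

```python
from typing import Any, Dict, List

# Extra config keys passed to each reranker, in call order.
# Keys are assumed to match the `--reranking` choices in the README.
EXTRA_KEYS: Dict[str, List[str]] = {
    "structural-distance": ["MU_PREMISES", "MU_CONCLUSIONS"],
    "argument-rank": [],
    "word-mover-distance": ["MU_PREMISES", "MU_CONCLUSIONS"],
}


def extra_arguments(config: Dict[str, Any]) -> List[Any]:
    """Return the extra positional values for the configured reranker.

    Unknown options fall through to an empty list, mirroring the `else`
    branch that builds `NoReranking` without extra values.
    """
    return [config[key] for key in EXTRA_KEYS.get(config["RERANKING"], [])]
```

With the CLI defaults, a `structural-distance` run would pass `0.75` and `0.9` in that order, since `MU_PREMISES` comes before `MU_CONCLUSIONS` in both calls.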
python/src/utils/configuration.py
...
...
@@ -27,6 +27,8 @@ class Configuration():
                'REUSE_UNRANKED',
                'LAMBDA_CONCLUSIONS',
                'LAMBDA_PREMISES',
                'MU_CONCLUSIONS',
                'MU_PREMISES',
                'WAIT_FOR_ES'
            ],
        }
...
...
@@ -62,6 +64,8 @@ class Configuration():
                args.reuse_unranked,
                args.lambda_conclusions,
                args.lambda_premises,
                args.mu_conclusions,
                args.mu_premises,
                args.wait_for_es
            ]
...
...
python/src/utils/parse_cli_args.py
...
...
@@ -10,7 +10,7 @@ class Text:
    """
    TODO
    """
    description = '$$$$ Graph based Argument Mining on Sentence Embeddings $$$$'
    description = "Hit-Girl'Argument Retrieval System"
    indexing = 'Sub command to create a semantic index with elastic search.'
    elastic_host = 'The hostname of the server/docker container that runs elastic search.'
    create = 'If flag is present two new indices are created, overriding existing ones.'
...
...
@@ -85,8 +85,16 @@ def parse_cli_args() -> argparse.Namespace:
                                  type=float,
                                  required=False,
                                  default=0.5)
    parser_retrieval.add_argument('--mu-conclusions',
                                  type=float,
                                  required=False,
                                  default=0.9)
    parser_retrieval.add_argument('--mu-premises',
                                  type=float,
                                  required=False,
                                  default=0.75)
    parser_retrieval.add_argument('--wait-for-es',
                                  action='store_true')
    parser_retrieval.add_argument('run_name', type=str, help=Text.run_name)
    parser_retrieval.add_argument('input_path', type=str, help=Text.input_path)
...
...
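The two options added in this hunk can be tried in isolation with the sketch below. The flags, types, and defaults are taken from the diff; using a bare `ArgumentParser` named `parser_retrieval` instead of the project's sub parser is an assumption made to keep the example self-contained.

```python
import argparse

# Stand-in for the project's retrieval sub parser (assumption).
parser_retrieval = argparse.ArgumentParser(prog="app.py retrieval")
parser_retrieval.add_argument("--mu-conclusions", type=float, required=False, default=0.9)
parser_retrieval.add_argument("--mu-premises", type=float, required=False, default=0.75)
parser_retrieval.add_argument("--wait-for-es", action="store_true")

# Without flags the defaults from the diff apply.
defaults = parser_retrieval.parse_args([])
# Explicit flags override them; argparse maps dashes to underscores.
overridden = parser_retrieval.parse_args(["--mu-premises", "0.5", "--wait-for-es"])

print(defaults.mu_conclusions, defaults.mu_premises, defaults.wait_for_es)
print(overridden.mu_conclusions, overridden.mu_premises, overridden.wait_for_es)
```

The attribute names `mu_conclusions` and `mu_premises` are the ones `configuration.py` reads from `args` in the hunks above.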
system_architecture.png (new file, mode 0 → 100644, 23 KB)