Commit 7364ca82 authored by Jerome Wuerf

Apply feedback from Theresa

parent cf9de91c
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{tex,cls,lua}]
indent_style = space
indent_size = 4
\begin{abstract}
Web search engines are fact-driven. They try to serve the user's information needs as fast as possible by focusing only on topic-specific relevance. This strategy could hinder a deep engagement with controversial topics; thus, there is a need for approaches that present a more diverse set of information to support opinion formation. This work examines different reranking approaches on the preprocessed args.me corpus in order to contribute to the Touché 2022 Argument Retrieval Task. The proposed retrieval system relies on an initial retrieval using a semantic search on a sentence level and takes advantage of simple heuristics. Our reranking approaches incorporate \textit{maximal marginal relevance}, \textit{word mover's distance}, and a novel approach, based on a fuzzy matching on part-of-speech tags, that we call \textit{structural distance}. Further, we explore the applicability of a graph-based reranking approach. The results indicate that the reranking approaches improve argument quality to varying degrees at the cost of relevance. \textit{Structural distance} performs best, with minimal loss in relevance and the most significant gain in terms of quality.
%Furthermore, limitations of the applicability of a graph-based approach for our retrieval system were explored.
%The experiments are evaluated regarding the relevance and quality of the retrieved results and show the benefits of the approaches maximum marginal relevance, word mover's distance and structural distance and furthermore describe the problems that arise with the graph based approach.
\end{abstract}
\section{Introduction}
The waves of protest in response to the pandemic restrictions seem to uncover a pressing problem in the current discussion culture. Despite increased exposure to facts on controversial topics in our daily lives, we fail to present the gained knowledge in a way that enables debates and supports individuals' opinion formation. Regarding COVID-19, it has been shown that people exposed to misinformation, biased media, and conspiracy theories have lower trust in democratic institutions \cite{pummerer2022conspiracy}. This situation makes it urgent for societies to confront misinformed individuals with reasonable arguments. Besides COVID-19, web resources, like blogs and news sites, address many other topics with a similar, potentially harmful impact. This development motivates our research on the automatic retrieval of reasonable arguments.
This work describes the submission of team Hit-Girl\footnote{\url{https://en.wikipedia.org/wiki/Hit-Girl}} for Task 1 of Touché 2022 \cite{bondarenko:2022c}. The task asks participants to create an argument retrieval system for a given corpus to support opinion formation on controversial societal topics. In this year's version of the first task, the requirements for the final systems differ from previous years, as participants are asked to retrieve argumentative sentence pairs instead of whole arguments for a given topic. A sentence pair is reasonable if the retrieved sentences are topic-relevant and of high quality. The quality of arguments is defined by (1) the argumentativeness of each sentence, (2) the coherence between the sentences, and (3) whether the sentences of the pair together form a summary of their originating arguments \cite{bondarenko:2022c}.
Our proposed system consists of three main components: indexing, initial retrieval, and reranking. The system's source code is publicly available\footnote{\url{https://git.informatik.uni-leipzig.de/hit-girl/code}}. Before indexing, sentences of the provided preprocessed args.me corpus \cite{argsme2} are transformed into vector embeddings. Sentences and vector embeddings are stored in two indices: one holds only premises, and the other holds only conclusions. We conduct a nearest neighbor search in the embedding space at retrieval time. Initially, we rank according to the cosine similarity between the query embedding and the embeddings in the respective index. This approach should maximize the semantic similarity between sentences, resulting in topic-relevant sentences. In the following, we refer to this as \textit{semantic search}. Finally, we compare multiple reranking approaches that aim to balance relevance and diversification of query results by assessing differences between a query and the retrieved sentences. Having outlined our initial motivation and a rough overview of how our system approaches the given task, we pose the following research question:
\vspace{8pt}
\textit{Do simple, argument-quality-agnostic reranking approaches improve argument quality compared to an initial semantic search?}
\vspace{8pt}
To answer our research question, we conducted experiments with three different reranking approaches utilizing \textit{maximal marginal relevance} (MMR), \textit{structural distance} (SD), and \textit{word mover's distance} (WMD). All three reranking approaches increase argument quality while sacrificing argument relevance. Further, we analyze the challenges of implementing a graph-based argument reranking approach. Section 2 introduces the related work. Section 3 describes our system and reranking approaches. Section 4 presents the evaluation of our experiments, which are discussed in Section 5.
This section introduces the challenge of argument retrieval and describes existing reranking approaches. We pick up on the shortcomings of previous studies to justify the design of our system.
\subsection{Challenges in argument retrieval}
The args.me corpus \cite{argsme2} used in Touché 2022 Task 1 consists of arguments crawled from the debate portals debatepedia.org, idebate.org, debate.org, and debatewise.org. Arguments in the corpus are composed of a conclusion paired with a set of premises, following the argumentation scheme proposed by Walton \cite{Walton} and others. Each premise has a supportive or opposing stance towards its conclusion.
Search engines for argument retrieval on controversial topics aim to quickly and comprehensively provide users with supportive and opposing arguments. Argument search denotes a relatively new field of research. It unites challenges of natural language processing and information retrieval while opening up a broad range of research opportunities for computational argumentation \cite{building_arg_2017}. In contrast to relevance-oriented search engines, systems for argument retrieval additionally need to focus on:
\newpage
\begin{itemize}
\item incorporating the quality of the arguments to check for their validity
\item providing an overview of arguments with different stances instead of a single best answer
\item assessing and reflecting the connections between arguments in the final ranking
\end{itemize}
\subsection{Existing methods in argument retrieval}
ArgumenText \cite{argtext} and \emph{args} \cite{building_arg_2017} are important pioneers offering diverse technical approaches to the outlined challenges of argument retrieval. ArgumenText \cite{argtext} was one of the first systems ingesting heterogeneous Web documents, identifying arguments in topic-relevant documents, and labeling the identified arguments with a ``pro'' or ``con'' stance. The identification of arguments relies on an attention-based neural network, and stance recognition utilizes a BiLSTM model. Both models were trained on a dataset containing 49 topics with 600 sentences each, labeled as ``pro'', ``con'', or not an argument. The authors compared their system's performance to an expert-curated list of arguments from a specific online debate portal\footnote{\url{https://ProCon.org}} and reported that, on three selected topics, 89\% of the retrieved arguments matched the ones in the expert-curated list. Further, they pointed out that 12\% of the arguments identified by their approach were not contained in the expert-curated list. ArgumenText \cite{argtext} differs from our system, as we use a preprocessed dataset that already contains arguments split into their constituent sentences. Further, these sentences are already labeled with a stance. Therefore, our system only relies on initial retrieval and reranking approaches.
\emph{Args} \cite{building_arg_2017} is a prototype argument retrieval system using a novel argument search framework and a newly crawled Web-based corpus \cite{argsme2}. The framework incorporates a common argument model. In this model, one argument consists of a claim/conclusion, zero or more premises, and an argument's context, which provides the full text in which a specific argument occurred. In general, the framework splits into an indexing process and a retrieval process. The indexing process comprises document acquisition, argument mining, assessment, and indexing. For the initial acquisition, the authors crawled the args.me corpus \cite{argsme2}. The crawl focuses on five different debate portals and includes 34,784 debates containing 291,440 arguments that were finally parsed into 329,791 argument units. Argument mining and parsing into the common argument model rely on Apache UIMA\footnote{\url{https://uima.apache.org}}. The final indexing is realized with Apache Lucene. In the retrieval process, the \emph{args} prototype performs an initial retrieval for a given query, relying on an exact string match between query terms and terms in an indexed argument, and ranks the relevant arguments using a BM25 model; more specifically, a BM25F model is used to weigh the individual components of the common argument model. The authors performed a quantitative analysis using controversial topics from Wikipedia as queries. The scores were reported on the system's coverage for logical combinations of query terms and phrase queries, and on the three components of the proposed common argument model: conclusions, arguments, and an argument's context. Finally, the system achieved a good initial coverage ranging from 41.6\% to 84.6\% for all query types on the conclusions and a coverage of 77.6\% on phrase queries for whole arguments. The results indicate that a retrieval model with a higher weight on conclusions yields arguments of higher relevance. Our system uses a preprocessed version of the args.me corpus \cite{argsme2}. To be specific, our system indexes sentences obtained from the argument mining and assessment steps of the \emph{args} search engine. Like \emph{args}, our system's initial retrieval and reranking approaches do not rely on identifying argumentative structures within the indexed argument units. In contrast to \emph{args}, we use two indices, one for conclusions and one for premises, instead of indexing whole arguments at once. Motivated by the finding of the \emph{args} search engine that conclusions should have a higher weight, our system queries the conclusion index first and uses the retrieved conclusions to query the premise index. Furthermore, our system enforces a minimum number of tokens in a retrieved conclusion compared to a query. This constraint is motivated by the \emph{args} authors' expectation ``that the most relevant arguments need some space to lay out their reasoning'' \cite{building_arg_2017}.
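To make the token constraint concrete, it can be read as a simple length filter. The multiplicative form below is our reading of the heuristic, and all names are placeholders, not an excerpt of the actual system:
\begin{lstlisting}[language=Python]
def passes_token_constraint(candidate_tokens, query_tokens, token_factor=1.75):
    # Assumption: the minimum length is a multiple of the query length;
    # the default of 1.75 mirrors the token factor used in our runs.
    return len(candidate_tokens) >= token_factor * len(query_tokens)
\end{lstlisting}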
Previous years of Touché showed substantial improvements in retrieval performance. In the first year, multiple submissions indicated that the DirichletLM \cite{zhai2017study} retrieval model is a strong baseline for the initial retrieval of argumentative text \cite{bondarenko:2020b}. Additionally, query expansion mechanisms were deployed to increase recall. Submissions for the second round of Touché indicated that argument-aware reranking approaches using fine-tuned language models improved the previous year's results. Moreover, approaches focused on parameter tuning of pipelines proposed in the previous year, using existing relevance judgments \cite{bondarenko:2021d}. Up to now, only a minority of Touché's submissions \cite{agarwal2021exploring, ros2021team} leveraged semantic embeddings for an initial retrieval, which motivates us to gain a deeper understanding of this approach. Motivated by the promising results of query expansion in last year's submissions \cite{akiki2021learning, RaimondiEtAl:CLEF-2021, MailachEtAl:CLEF-2021}, our system mimics query expansion by first retrieving conclusions for an initial controversial topic and then using these conclusions to query an index holding the premises. Finally, our reranking approaches distinguish our system from existing ones, as we do not rely on argument-specific domain features or machine learning methods.
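The following sketch illustrates the initial semantic search and the conclusion-to-premise querying under stated assumptions: the embedding model is an illustrative choice, the in-memory index stands in for our actual indices, and the token-factor constraint and reranking are omitted. It is a schematic of the idea, not our production code:
\begin{lstlisting}[language=Python]
import numpy as np
# Assumption: any sentence-embedding model works here; this one is illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(sentences):
    # Embed and L2-normalize, so a dot product equals cosine similarity.
    emb = model.encode(sentences)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def semantic_search(query, sentences, index, k=10):
    # Rank sentences by cosine similarity to the query embedding.
    q = model.encode([query])[0]
    scores = index @ (q / np.linalg.norm(q))
    return [sentences[i] for i in np.argsort(-scores)[:k]]

def retrieve_pairs(topic, conclusions, c_index, premises, p_index, k=10):
    # Mimic query expansion: the topic retrieves conclusions, and each
    # retrieved conclusion in turn queries the premise index.
    pairs = []
    for conclusion in semantic_search(topic, conclusions, c_index, k):
        for premise in semantic_search(conclusion, premises, p_index, k):
            pairs.append((conclusion, premise))
    return pairs
\end{lstlisting}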
\section{Evaluation} \label{sec:4}
We performed a manual evaluation on a subset of topics to evaluate the effectiveness of the presented retrieval and reranking approaches. For each of the first five topics, 100 sentence pairs were retrieved and ranked. After that, the top 20 sentence pairs for each topic were assessed by hand according to relevance and quality criteria. The relevance criterion captures how well the content of the sentence pair fits the topic. Quality measures how good a sentence pair is regarding argumentativeness and comprises two subcriteria:
\begin{enumerate}
\item Whether both sentences contain argumentative judgments, premises, or listings of advantages or disadvantages regarding the topic question, or contain a well-founded conclusion
\item Whether the two sentences are coherent and argue for the same subject.
\end{enumerate}
The scales of the relevance and quality criteria are explained in \hyperref[tab:evaluationcriterions]{Table \ref{tab:evaluationcriterions}}.
\begin{table}[h]
\centering
\begin{tabular}{c|cl}\toprule
\textbf{Criterion} & \textbf{Scale} & \textbf{Explanation}\\\midrule
\textbf{Relevance} & -2 & Spam, no relation to topic \\
& 0 & Thematic reference in broadest sense, not relevant \\
& 1 & Related to topic\\
& 2 & Content matches exactly the debate of the topic\\ \midrule
\textbf{Quality} & 0 & No subcriterion fulfilled \\
 & 1 & One subcriterion fulfilled \\
 & 2 & Both subcriteria fulfilled \\
\bottomrule
\end{tabular}
\caption{Evaluation criteria with explanations}
\label{tab:evaluationcriterions}
\end{table}
We calculate the nDCG@10 for the relevance values and the average quality over all topics of an approach or parameter combination as evaluation metrics. The outcomes are shown in \hyperref[tab:evaluationoutcomes]{Table \ref{tab:evaluationoutcomes}}.
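For clarity, we state the metric in its standard log-discounted form; using the raw relevance grades of our scale as linear gains is our choice:
\[
\mathrm{DCG@10} = \sum_{i=1}^{10} \frac{rel_i}{\log_2(i+1)}, \qquad
\mathrm{nDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}},
\]
where \(rel_i\) is the relevance grade of the sentence pair at rank \(i\) and \(\mathrm{IDCG@10}\) is the \(\mathrm{DCG@10}\) of the ideal reordering of the assessed pairs.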
\begin{table}[ht]
\centering
\begin{tabular}{lcc}\toprule
\textbf{Type of reranking} & \textbf{Average nDCG@10} & \textbf{Average quality} \\ \midrule
Baseline (no \textit{token factor}) &\textbf{0.80} & 0.47 \\
Baseline & 0.63 & 0.78 \\ \midrule
MMR \(\lambda = 0\) & 0.31 & 0.94 \\
MMR \(\lambda = 0.25\) & 0.01 & 0.84 \\
MMR \(\lambda = 0.50\) & 0.11 & 0.87 \\
MMR \(\lambda = 0.75\) & 0.52 & 0.88 \\ \midrule
SD & 0.56 & \textbf{0.97} \\ \midrule
WMD & 0.52 & 0.83 \\ \bottomrule
\end{tabular}
\centering
\caption{Evaluation scores. Unless stated otherwise, all runs use a \textit{token factor} of 1.75.}
\label{tab:evaluationoutcomes}
\end{table}
Regarding the results, it is notable that the baseline retrieval approach without any reranking delivers sentence pairs that are highly relevant to the requested topic but have poor argumentative quality. While none of the other retrieval modifications and reranking approaches reaches this relevance score, all of them substantially increase the argumentative quality of the top 20 ranked sentence pairs. Notably, using the \textit{token factor} on both indices, for conclusions and premises, clearly outperforms using it only for the conclusion index. Interestingly, structural distance improves the quality of the top 20 ranked sentence pairs, but at the cost of relevance to the requested topic. Unsurprisingly, the relevance scores for reranking using MMR with \(\lambda\) below one are lower than the baseline.
On the other hand, the quality appears to be slightly improved by reranking with MMR. Applying MMR did improve the subjectively perceived diversity of results for topics that had many similar results to begin with, while still maintaining relatively high relevance. However, other topics had rather poor relevance scores from the start; in these cases, diversification posed a risk of substantially worsening the relevance. In contrast to the findings of \citeauthor{mmr} \cite{mmr}, a significant relevance difference for each chosen \(\lambda\) value was observed.
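For reference, MMR \cite{mmr} greedily selects the next result \(d_i\) from the remaining candidates \(R \setminus S\), trading off similarity to the query \(q\) against similarity to the already selected set \(S\):
\[
\mathrm{MMR} = \mathop{\mathrm{arg\,max}}_{d_i \in R \setminus S}
\Big[ \lambda\, \mathrm{Sim}_1(d_i, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{Sim}_2(d_i, d_j) \Big],
\]
so \(\lambda = 1\) reproduces the pure relevance ranking, while smaller values increasingly reward diversity.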
%An important question might also be why some topics have such high relevance while others have almost none; does the dataset perhaps not provide material evenly? One could mention this, but perhaps rather in the discussion. During the evaluation, some general issues also surfaced, for example that sentences referencing other sentences are problematic, because without the other sentence they lose their referent and are therefore less usable than others. How well a sentence can stand alone is important here. And other such remarks; improve quality.
\textcolor{red}{In the camera-ready version, this section will contain the evaluation of relevance and quality for the four runs that we uploaded to TIRA. The four runs include a baseline run (initial retrieval only) and three runs combining the initial retrieval and our reranking approaches (MMR, SD, and WMD). Further, we will discuss our parameter settings for the reranking approaches (\( \mu, \lambda\)) and the \emph{token factor}.}
% The organizers of the task provided relevance and quality judgments on an argument level created by experts to evaluate the submitted system. However, our system generates sentence pairs originating from different arguments. Thus, we can not use the provided judgments and perform a manual evaluation. We assess the top 20 sentence pairs of our generated rankings on five topics\footnote{Topics:
% Do we need sex education in schools?,
% Should stem cell research be expanded?,
% Should blood donations be financially compensated?,
% Should suicide be a criminal offense?,
% Should agricultural subsidies be reduced?} according to the two criteria in Table \ref{tab:evaluationcriterions}. The first criterion captures the relevance of a sentence pair to a topic and the second one reflects the argumentative quality between the premise and conclusion of a sentence pair. The criterion of argumentative quality consists of two subcriteria:
% \begin{itemize}
% \item[\textbf{S1}]
% both sentences contain argumentative judgements or listings of advantages or disadvantages regarding the topic question
% \item[\textbf{S2}] the sentence pair is coherent and both sentences argue for the same topic
% \end{itemize}
% Using the relevance criterion, we measure the nDCG@10 to assess the topic relevance of sentence pairs. Our reported results average the nDCG@10 over the five assessed topics. To quantify the argumentative quality, we first average the scores of the quality criteria within a single topic and then build an average of averages over the five topics. We submitted four of our runs to TIRA \cite{potthast2019tira}; the other experiments were executed in our local environments. The results in Table \ref{tab:results} show that the baseline (initial retrieval only) generates the highest nDCG@10 scores but also has the poorest argumentative quality. While none of the other retrieval modifications and reranking approaches reach this relevance score, all of them substantially increase the argumentative quality of the top 20 sentence pairs. Notably, using the token factor without any reranking approach improves the argumentative quality, confirming the assumption of \citeauthor{building_arg_2017} \cite{building_arg_2017} that a qualitative argument needs some space to fully develop. Interestingly, structural distance improves the quality of the top 20 sentence pairs, but at the cost of relevance to the requested topic. The quality appears to be slightly improved using MMR. Applying MMR did improve the diversity of results for specific topics with many similar results, while still maintaining relatively high relevance. However, other topics had rather poor relevance scores from the start. In these cases, applied diversification posed a risk of substantially worsening the relevance. In contrast to the findings of \citeauthor{mmr} \cite{mmr}, a significant relevance difference for each chosen \(\lambda\) value was observed.
% \begin{table}
% \centering
% \begin{tabular}{c|cl}\toprule
% \textbf{Criterion} & \textbf{Scale} & \textbf{Explanation}\\ \midrule
% \textbf{Relevance} & -2 & Spam, no relation to topic \\
% & 0 & Thematic reference in broadest sense, not relevant \\
% & 1 & Related to topic\\
% & 2 & Content matches exactly the debate of the topic\\ \midrule
% \textbf{Quality} & 0 & No subcriterion fulfilled \\
% & 1 & Either \textbf{S1} or \textbf{S2} is fulfilled \\
% & 2 & Both \textbf{S1} and \textbf{S2} are fulfilled \\
% \hline
% \end{tabular}
% \centering
% \caption{Evaluation criteria to asses the sentence pairs in the final rankings. To evaluate our system we assess top 20 sentence pairs on five topics.}
% \label{tab:evaluationcriterions}
% \end{table}
% \begin{table}
% \centering
% \begin{tabular}{lccc}\toprule
% \textbf{Type of reranking}&\textbf{nDCG@10} & \textbf{Quality} \\ \midrule
% Baseline & \textbf{0.63} & 0.78 \\
% MMR & 0.52 & 0.88 \\
% SD & 0.56 & \textbf{0.97} \\
% WMD & 0.52 & 0.83 \\ \bottomrule
% \end{tabular}
% \centering
% \caption{Resulting nDCG@10 and quality of our approaches. nDCG@10 and quality are averages over five topics. For each topic we assessed the top 20 sentence pairs. All runs use a \emph{token factor} of 1.75 for the initial retrieval. The final scores of both SD and WMD use \(\mu=0.9\) for the reranking of conclusions and \(\mu=0.9\) for the reranking of premises. MMR uses \(\lambda=0.75\).}
% \label{tab:results}
% \end{table}
\section{Conclusion}
\label{sec:5}
This work examines whether reranking approaches that do not make inferences about argument quality can improve rankings generated by an initial semantic search. Our theory is that the initial search maximizes topic relevance, and the argument-quality-agnostic rerankings increase variety, potentially ranking more qualitative sentence pairs of premise and conclusion higher. To answer our research question, we implemented an argument retrieval system using word embeddings for the initial ranking and three argument-quality-agnostic reranking approaches. The reranking approaches build on \textit{maximal marginal relevance}, \textit{word mover's distance}, and a novel distance measure based on a fuzzy matching on part-of-speech tags, which we call \textit{structural distance}. Based on our findings...\textcolor{red}{this sentence will be available in the camera-ready version.} Our system introduces several parameters: the initial ranking uses a \textit{token factor}, \textit{maximal marginal relevance} imposes \(\lambda\), and \textit{structural distance} and \textit{word mover's distance} use \(\mu\). For the next iteration of Touché, when relevance and quality judgments on a sentence-pair level are available, we will perform parameter fine-tuning to improve our outlined approaches in future research.
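To make the idea of \textit{structural distance} tangible, the following sketch shows one possible reading of a fuzzy match over part-of-speech tag sequences; the tagger and the sequence matcher are illustrative assumptions, not necessarily our exact implementation:
\begin{lstlisting}[language=Python]
import difflib
import nltk  # assumes the punkt and averaged_perceptron_tagger data are installed

def structural_distance(sentence_a, sentence_b):
    # One possible reading of SD: compare the part-of-speech tag sequences
    # of the two sentences with a fuzzy sequence matcher and turn the
    # similarity ratio into a distance.
    tags_a = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence_a))]
    tags_b = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence_b))]
    return 1.0 - difflib.SequenceMatcher(None, tags_a, tags_b).ratio()
\end{lstlisting}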
With a \textit{token factor} of 1.75 for conclusions and premises, the relevance score suffered notably. Applying the \textit{token factor} only to the conclusion index already worsened the relevance score significantly. In contrast, using the \textit{token factor} for both the conclusion and premise index did not result in even lower relevance scores. Therefore, we suspect that the \textit{token factor} for the conclusions, in particular, may have been problematic for the relevance, since in most cases the conclusions were relatively short, and adding the \textit{token factor} as a constraint limited the possible options. Hence, with a smaller \textit{token factor} for the conclusions, say 1.0, and a \textit{token factor} of 1.75 for the premises, the relevance score might have suffered less. However, due to the time- and resource-consuming nature of the indexing process, we refrained from testing this hypothesis by experimenting further with the \textit{token factor} and leave this for future research. On the other hand, we showed that a simple heuristic like this can improve the perceived quality of the arguments. Our underlying assumption is that premises with substantially more tokens than are present in the respective query are likely to provide an argument that goes beyond mere claims and is stronger on a content level.
While the application of MMR in its original work did not cause significant discrepancies in relevance depending on the tuning parameter \(\lambda\) \cite{mmr}, in our case the adjustment of \(\lambda\) resulted in strongly varying relevance scores. One explanation could be that the application of MMR is indicated only in the case of an abundance of relevant results. In our case, this condition was not met equally across all topics. Some topics had highly similar results, while others had quite poor relevance initially. In these cases, the conditions for applying MMR were not met in the first place, which may have further worsened the validity of the results.
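A minimal greedy MMR loop, assuming L2-normalized embeddings, illustrates how \(\lambda\) directly trades relevance against redundancy; the parameter names and defaults are illustrative:
\begin{lstlisting}[language=Python]
import numpy as np

def mmr_rerank(query_vec, doc_vecs, lam=0.75, k=20):
    # Greedy MMR selection: each step picks the candidate with the best
    # trade-off between query similarity and similarity to the already
    # selected documents. Vectors are assumed to be L2-normalized.
    candidates = list(range(len(doc_vecs)))
    selected = []
    relevance = doc_vecs @ query_vec  # cosine relevance to the query
    while candidates and len(selected) < k:
        if selected:
            redundancy = (doc_vecs[candidates] @ doc_vecs[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected
\end{lstlisting}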
%opposite would have been necessary.
%Thus, a topic-specific application of the MMR would be better --eh--.
%explanation of why structural distance possibly had the best results? word mover's distance
%Interestingly, the structural distance method improved argument quality most, with the lowest impact on the relevance scores compared to the other methods. The word mover distance on the other hand did not increase argument quality
%Daniel: add here an explanation of what you believe causes this?
All reranking techniques slightly reduced the relevance compared to the respective baseline condition with the \textit{token factor} on both indices, while the quality generally seemed to improve slightly. Whether that difference is significant is hard to judge, though, as the evaluation is by nature slightly subjective and has a certain range of variation. Of all tested reranking approaches, the SD approach achieved the best results, with the lowest impact on relevance and the most remarkable improvements in terms of quality.
In the end, more experimentation with the different parameters may result in improvements. Whether simple, argument-quality-agnostic reranking approaches can improve argument quality compared to an initial semantic search cannot be answered conclusively. The results hint at a possible improvement of argument quality at the cost of relevance, with potential relevance gains through adjustments of the \textit{token factor} of the conclusion index.
%Possibly add a small bit on ethics here. Oops, I escalated. Otherwise this can also be dropped, folks.
In the age of fake news, information has become a political instrument that should not be underestimated. Therefore, the basis on which people form their opinions is a question of great importance, not just for the regulars' table discussion in the bar: it can reach political proportions. With this in mind, we briefly outline ethical concerns regarding argument search as a whole. Forming opinions, especially on sensitive topics, requires intensive research, even for experts. Debate portals have a very heterogeneous user base. Everyone can share their views on debate portals, and we were more than aware of this when evaluating the sentence pairs. One user, for instance, advocated child pornography, to give a rather extreme example. Filtering this kind of opinion garbage would be censorship, yet making such opinions available through a retrieval system comes with a bitter aftertaste. Therefore, the content quality of the system's arguments depends directly on the user community of the debate portals. Even if this is not the ambition of the approach, it should still be made transparent, as a warning, that the arguments are based on people's opinions and not on facts. In this respect, developing an argument search system comes with great responsibility.
%In this sense, the challenge that the Task is dealing with this year, or that argument search systems are dealing with, goes hand in hand with great responsibility.
%The correct assessment of sensitive debateable issues, for which no yes-no answer is adequate, is often highly complex and cannot be answered even by experts without intensive research.
To conclude, we built and presented a retrieval system for arguments consisting of indexing, initial retrieval, and the application of various reranking approaches, including MMR, WMD, and an original SD approach. This work demonstrates the effect of those reranking techniques on the quality and relevance of argumentative premise-conclusion pairs. All reranking approaches resulted in higher argument quality at the cost of relevance, to varying extents. Parameter optimization of the \textit{token factor} is left for future research.
%A responsible way-it is essential to handle it responsibly-Argument search is supposed to help form opinions.
%- we build a retrieval system for sentence pairs of conclusions and premises
%- we use the retrieval system for the research question of whether reranking techniques improve quality, and answer it in broad strokes
%-
%How controversial are the topics really? Isn't there always a direction in which the truth is more likely to lie? Vaccinations are also controversial, but the data speaks more clearly for one side than for the other; such an imbalance should also be portrayed. Aren't there already studies on some controversial topics that make them less controversial than a layperson would believe? Political ramifications. % It becomes problematic when controversial topics are not controversial at all but have real answers, and only because of the differing opinions in the population...
% -make sure that you have a stated research question
% -restate the rq in the first sentence
% -restate the particular theory
% -based on the findings we were able to resolve (these were the findings)
% - give a big picture where future researchers can pickup the strings (which direction)
% - State a big Idea (can be broad and loose, this findings show that there are many loose things)
\begin{figure}[th!]
\centering
\subfloat[\label{app:cosineDegreeHistogramm}Node degree histogram over all topics. A majority of the conclusions and premises are connected by a single edge. Some nodes showed a very high degree.]{\includegraphics[scale=0.45]{figures/cosine_degree_histogramm.pdf}}
\centering
\subfloat[\label{app:cosineArgumentEdges}Total count of edges between arguments per topic.]{\includegraphics[scale=0.5]{figures/cosine_argument_edges.pdf}}
\centering
\subfloat[\label{app:wmdArgumentGraphs}Example graphs for five topics.]{\includegraphics[scale=0.5]{figures/wmd_argument_graphs.pdf}}
\end{figure}
<mxfile host="65bd71144e">
<diagram id="p9Hnm9kZcsXbAVImyNd9" name="Page-1">
<mxGraphModel dx="832" dy="454" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<mxCell id="5" value="" style="rounded=0;whiteSpace=wrap;html=1;strokeColor=#000000;fillColor=none;dashed=1;" parent="1" vertex="1">
<mxGeometry x="630" y="60" width="180" height="300" as="geometry"/>
</mxCell>
<mxCell id="4" value="" style="rounded=0;whiteSpace=wrap;html=1;strokeColor=#000000;fillColor=none;dashed=1;" parent="1" vertex="1">
<mxGeometry x="340" y="60" width="270" height="300" as="geometry"/>
</mxCell>
<mxCell id="8" value="" style="rounded=0;whiteSpace=wrap;html=1;strokeColor=#000000;fillColor=none;dashed=1;" parent="1" vertex="1">
<mxGeometry x="50" y="60" width="270" height="300" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-22" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-1" target="hp4KDhJicbJTmFi_2CdE-10" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="410" y="180"/>
<mxPoint x="410" y="235"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-1" value="conclusion index" style="shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;" parent="1" vertex="1">
<mxGeometry x="300" y="150" width="60" height="80" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-21" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.75;entryDx=0;entryDy=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-2" target="hp4KDhJicbJTmFi_2CdE-10" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="410" y="320"/>
<mxPoint x="410" y="265"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-2" value="premise index" style="shape=cylinder3;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;size=15;" parent="1" vertex="1">
<mxGeometry x="300" y="270" width="60" height="80" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-13" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-4" target="hp4KDhJicbJTmFi_2CdE-1" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-15" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-4" target="hp4KDhJicbJTmFi_2CdE-2" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-4" value="indexing module" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="160" y="220" width="80" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-26" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=none;endFill=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-6" target="hp4KDhJicbJTmFi_2CdE-4" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-6" value="sentences" style="shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="60" y="170" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-27" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;endArrow=block;endFill=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-7" target="hp4KDhJicbJTmFi_2CdE-4" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-7" value="embeddings" style="shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="60" y="270" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-25" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.375;entryY=0.017;entryDx=0;entryDy=0;entryPerimeter=0;endArrow=none;endFill=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-8" target="hp4KDhJicbJTmFi_2CdE-10" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="430" y="180"/>
<mxPoint x="475" y="180"/>
<mxPoint x="475" y="221"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-8" value="topics" style="shape=note;backgroundOutline=1;darkOpacity=0.05;overflow=visible;whiteSpace=wrap;html=1;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="400" y="80" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-23" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-9" target="hp4KDhJicbJTmFi_2CdE-10" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="519" y="180"/>
<mxPoint x="475" y="180"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-9" value="&lt;div&gt;config&lt;/div&gt;" style="shape=note;backgroundOutline=1;darkOpacity=0.05;overflow=visible;whiteSpace=wrap;html=1;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="489" y="80" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-18" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-10" target="hp4KDhJicbJTmFi_2CdE-17" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-19" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-10" target="hp4KDhJicbJTmFi_2CdE-1" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="390" y="250"/>
<mxPoint x="390" y="190"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-20" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1;entryY=0.5;entryDx=0;entryDy=0;entryPerimeter=0;" parent="1" source="hp4KDhJicbJTmFi_2CdE-10" target="hp4KDhJicbJTmFi_2CdE-2" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="390" y="250"/>
<mxPoint x="390" y="310"/>
</Array>
</mxGeometry>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-10" value="retrieval module" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="430" y="220" width="80" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-30" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;endArrow=block;endFill=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-17" target="hp4KDhJicbJTmFi_2CdE-29" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-17" value="premises per conclusion per topic" style="shape=note;backgroundOutline=1;darkOpacity=0.05;overflow=visible;whiteSpace=wrap;html=1;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="541" y="220" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-32" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=block;endFill=1;" parent="1" source="hp4KDhJicbJTmFi_2CdE-29" target="hp4KDhJicbJTmFi_2CdE-31" edge="1">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-29" value="reranking" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="640" y="230" width="80" height="40" as="geometry"/>
</mxCell>
<mxCell id="hp4KDhJicbJTmFi_2CdE-31" value="&lt;br&gt;ranked sentence pairs" style="shape=note;backgroundOutline=1;darkOpacity=0.05;overflow=visible;whiteSpace=wrap;html=1;labelPosition=center;verticalLabelPosition=top;align=center;verticalAlign=bottom;" parent="1" vertex="1">
<mxGeometry x="740" y="220" width="60" height="60" as="geometry"/>
</mxCell>
<mxCell id="6" value="Initial Retrieval" style="text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;dashed=1;" parent="1" vertex="1">
<mxGeometry x="340" y="30" width="90" height="30" as="geometry"/>
</mxCell>
<mxCell id="7" value="Reranking" style="text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;dashed=1;" parent="1" vertex="1">
<mxGeometry x="630" y="30" width="90" height="30" as="geometry"/>
</mxCell>
<mxCell id="9" value="Indexing" style="text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;dashed=1;" parent="1" vertex="1">
<mxGeometry x="50" y="30" width="90" height="30" as="geometry"/>
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
\sloppy
%%
%% Minted listings support
%% Need pygment <http://pygments.org/> <http://pypi.python.org/pypi/Pygments>
\usepackage{listings}
\usepackage[english]{babel}
\usepackage[center]{caption}
\usepackage{subcaption}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{xcolor}
% tikz configuration
\usepackage{tikz}
\usetikzlibrary{shapes.geometric, arrows, positioning}
%% CC-BY is default license.
\copyrightyear{2022}
\copyrightclause{Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).}
%%
%% This command is for the conference information
\conference{CLEF'22: Conference and Labs of the Evaluation Forum,
September 5--8, 2022, Bologna, Italy}
%%
%% the authors and their affiliations.
\author{Jerome Würf}
% The following two persons were part of the seminar group, but decided not to be included
% in the published paper.
% \author{Daniel Kinzel}
% \author{Maryam Khodaei}
\address[1]{Leipzig University, Augustusplatz 10, 04109 Leipzig, Germany}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
\begin{keywords}
information retrieval \sep
argument retrieval \sep
semantic search \sep
reranking \sep
Touché 2022
\end{keywords}
%%