Zero-shot model based on the cosine distance between embedding vectors.
This changes the default activation to the identity function (lambda x: x).
Args:
mode: One of ("vanilla", "max", "mean", "max_mean", "attention", "attention_max_mean"). Determines how the sequence is weighted to build the input representation.
entailment_output: The format of the entailment output if NLI pretraining is used (experimental).
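
Example:
A minimal sketch of cosine-based zero-shot scoring, not this model's actual implementation. The helpers mean_pool, max_pool, cosine_similarity and zero_shot_predict are hypothetical names, and the pooling definitions are only an assumption about what the "mean" and "max" modes might compute. Scores are returned as raw cosine values, in keeping with the identity activation described above.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors element-wise (assumed meaning of "mean")."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def max_pool(token_vectors):
    """Take the element-wise maximum over tokens (assumed meaning of "max")."""
    return [max(col) for col in zip(*token_vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(text_vector, label_vectors):
    """Return the label whose embedding is closest to the text embedding.

    Scores are left as raw cosine similarities (identity activation).
    """
    scores = {
        label: cosine_similarity(text_vector, vector)
        for label, vector in label_vectors.items()
    }
    return max(scores, key=scores.get), scores


# Toy 2-dimensional embeddings, made up purely for illustration.
tokens = [[1.0, 0.0], [0.8, 0.2]]
labels = {"sports": [1.0, 0.1], "politics": [0.0, 1.0]}

best, scores = zero_shot_predict(mean_pool(tokens), labels)
print(best, scores)
```

Cosine distance is 1 minus the similarity computed here, so ranking labels by highest similarity is equivalent to ranking them by lowest distance.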