Version: 2.0

Cluster Class

Class for coreference cluster.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| cluster_idx | Integer | Index of the cluster in the document. |
| spans_idx | List | Indexes of the spans of the cluster. |
| ref_document | Document | Reference to the Document object. |
| head | Span | Returns the head of a cluster, which is the span that best represents the cluster. |
| children | List | Returns the children of the cluster, i.e. all spans except head. |

cluster_idx

Index of the cluster in the document.

self.cluster_idx = cluster_idx

spans_idx

Indexes of the spans of the cluster.

self.spans_idx = spans_idx

ref_document

Reference to the Document object.

self.ref_document = ref_document
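
As a rough illustration of how the three stored attributes fit together, the sketch below builds a simplified cluster. `SketchDocument`, `SketchCluster`, and the `spans` property are hypothetical names, and resolving `spans_idx` against a flat list of spans on the document is an assumption, not the library's confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchDocument:
    """Hypothetical document holding a flat list of spans (plain strings here)."""
    spans: List[str] = field(default_factory=list)


class SketchCluster:
    """Hypothetical cluster storing the three attributes described above."""

    def __init__(self, cluster_idx: int, spans_idx: List[int],
                 ref_document: SketchDocument):
        self.cluster_idx = cluster_idx    # position of the cluster in the document
        self.spans_idx = spans_idx        # indexes of the spans in this cluster
        self.ref_document = ref_document  # back-reference to the owning document

    @property
    def spans(self) -> List[str]:
        # Assumption: each index points into the document's span list.
        return [self.ref_document.spans[i] for i in self.spans_idx]


doc = SketchDocument(spans=["Marie", "a dit", "elle", "viendra"])
cluster = SketchCluster(cluster_idx=0, spans_idx=[0, 2], ref_document=doc)
print(cluster.spans)  # ['Marie', 'elle']
```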

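head

Returns the head of a cluster, which is the span that best represents the cluster. The head is chosen according to a hierarchy of POS tags: entries are tried in priority order, and the first span matching the current entry is returned. An entry that is a list matches only spans containing every tag in that list. If no span matches any entry, the first span of the cluster is returned.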

Return:

| Type | Description |
| --- | --- |
| Span | Span of the head of the cluster |

hierarchy_pos_cluster = ['NP', ['PD', 'N'], ['D', 'N'], 'N', 'CLS', 'CLO', 'PRON', 'D', 'PROREL', 'PD', 'P', 'ENTITY']

head()

for h in hierarchy_pos_cluster:
    if isinstance(h, list):
        match = [s for s in self.spans if set(h) & set(s.get_attributes('pos')) == set(h)]
    else:
        match = [s for s in self.spans if h in s.get_attributes('pos')]
    if match:
        return match[0]
return self.spans[0]
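
The selection logic above can be tried in isolation with a minimal sketch. `FakeSpan` and `pick_head` are hypothetical stand-ins introduced only for illustration; the sketch assumes that `get_attributes('pos')` returns a list of tag strings, which the source does not state explicitly. The subset test `set(entry) <= ...` is an equivalent rewrite of the intersection check used in the original code.

```python
from dataclasses import dataclass
from typing import List

# Hierarchy copied from the documentation above.
HIERARCHY_POS_CLUSTER = ['NP', ['PD', 'N'], ['D', 'N'], 'N', 'CLS', 'CLO',
                         'PRON', 'D', 'PROREL', 'PD', 'P', 'ENTITY']


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a Span; it only carries text and POS tags."""
    text: str
    pos: List[str]

    def get_attributes(self, name: str) -> List[str]:
        # Mimics the call used in head(); only 'pos' is supported here.
        if name != 'pos':
            raise KeyError(name)
        return self.pos


def pick_head(spans: List[FakeSpan]) -> FakeSpan:
    """Return the first span matching the highest-priority hierarchy entry."""
    for entry in HIERARCHY_POS_CLUSTER:
        if isinstance(entry, list):
            # A list entry matches only if the span contains every tag in it.
            match = [s for s in spans if set(entry) <= set(s.get_attributes('pos'))]
        else:
            match = [s for s in spans if entry in s.get_attributes('pos')]
        if match:
            return match[0]
    # No entry matched: fall back to the first span.
    return spans[0]


spans = [
    FakeSpan("elle", ["CLS"]),
    FakeSpan("la voisine", ["D", "N"]),
    FakeSpan("Marie", ["NP"]),
]
print(pick_head(spans).text)  # Marie
```

Because 'NP' comes first in the hierarchy, the proper-noun span wins even though it is last in the cluster. Without it, the `['D', 'N']` entry would select "la voisine" ahead of the clitic "elle".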

children

Returns the children of the cluster, i.e. all spans except head.

Return:

| Type | Description |
| --- | --- |
| List | List of children spans. |

children()

head = self.head
return [span for span in self.spans if span != head]
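
A small sketch of the same filtering, using a hypothetical `MiniCluster` whose spans are plain strings and whose head is fixed by index rather than chosen by POS tags. It assumes `head` and `children` are exposed as properties, as the `self.head` access above suggests.

```python
from typing import List


class MiniCluster:
    """Hypothetical, simplified cluster: spans are plain strings."""

    def __init__(self, spans: List[str], head_index: int = 0):
        self.spans = spans
        self._head_index = head_index

    @property
    def head(self) -> str:
        # Stand-in for the POS-based head selection; the head is fixed here.
        return self.spans[self._head_index]

    @property
    def children(self) -> List[str]:
        head = self.head
        # Filtering uses !=, so any span equal to the head is dropped as well.
        return [span for span in self.spans if span != head]


cluster = MiniCluster(["Marie", "elle", "la voisine"])
print(cluster.children)  # ['elle', 'la voisine']
```

Since the filter compares with `!=`, the result depends on how spans define equality: if two distinct spans compare equal to the head, both are excluded from the children.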