Cluster Class
Class representing a coreference cluster.
Attributes
Name | Type | Description |
---|---|---|
cluster_idx | Integer | Index of the cluster in the document. |
spans_idx | List | Indexes of the spans of the cluster. |
ref_document | Document | Reference to the Document object. |
head | Span | Returns the head of the cluster, i.e. the span that best represents it. |
children | List | Returns the children of the cluster, i.e. all spans except the head. |
cluster_idx
Index of the cluster in the document.
self.cluster_idx = cluster_idx
spans_idx
Indexes of the spans of the cluster.
self.spans_idx = spans_idx
ref_document
Reference to the Document object.
self.ref_document = ref_document
head
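Returns the head of the cluster, i.e. the span that best represents it. The head is chosen by walking a priority hierarchy of POS tags: the first hierarchy entry matched by at least one span determines the head, and the first matching span is returned. An entry that is itself a list matches only spans that contain all of its tags. If no entry matches, the first span of the cluster is returned.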
Return:
Type | Description |
---|---|
Span | Span of the head of the cluster |
hierarchy_pos_cluster = ['NP', ['PD','N'], ['D', 'N'], 'N', 'CLS', 'CLO', 'PRON', 'D', 'PROREL', 'PD', 'P', 'ENTITY']
head()
    for h in hierarchy_pos_cluster:
        if isinstance(h, list):
            match = [s for s in self.spans if set(h) & set(s.get_attributes('pos')) == set(h)]
        else:
            match = [s for s in self.spans if h in s.get_attributes('pos')]
        if match:
            return match[0]
    return self.spans[0]
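The selection logic above can be tried outside the library with a small standalone sketch. `ToySpan` and `pick_head` are hypothetical names introduced only for illustration; they are not part of the documented API. The sketch assumes that `get_attributes('pos')` returns a list of POS tags, and it writes the "all tags present" check as a subset test, which is equivalent to the intersection comparison in the listing.

```python
from dataclasses import dataclass, field

# Same priority order as hierarchy_pos_cluster in the listing above.
HIERARCHY = ['NP', ['PD', 'N'], ['D', 'N'], 'N', 'CLS', 'CLO',
             'PRON', 'D', 'PROREL', 'PD', 'P', 'ENTITY']


@dataclass
class ToySpan:
    """Hypothetical stand-in for the library's Span class."""
    text: str
    pos: list = field(default_factory=list)

    def get_attributes(self, name):
        # Only the 'pos' attribute is modelled in this sketch.
        return self.pos if name == 'pos' else []


def pick_head(spans):
    """Return the first span matching the highest-priority hierarchy entry."""
    for h in HIERARCHY:
        if isinstance(h, list):
            # A list entry matches only if the span carries every tag in it.
            match = [s for s in spans if set(h) <= set(s.get_attributes('pos'))]
        else:
            match = [s for s in spans if h in s.get_attributes('pos')]
        if match:
            return match[0]
    # Fallback: no hierarchy entry matched any span.
    return spans[0]


spans = [
    ToySpan("elle", ["CLS"]),
    ToySpan("la ministre", ["D", "N"]),
    ToySpan("Marie Dupont", ["NP", "NP"]),
]
print(pick_head(spans).text)      # Marie Dupont
print(pick_head(spans[:2]).text)  # la ministre
```

The proper noun wins because `'NP'` comes first in the hierarchy. Without it, the determiner plus noun span is preferred over the clitic pronoun.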
children
Returns the children of the cluster, i.e. all spans except the head.
Return:
Type | Description |
---|---|
List | List of children spans. |
children()
    head = self.head
    return [span for span in self.spans if span != head]
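As a rough illustration of the filter above, the sketch below uses a hypothetical `ToyCluster` whose spans are plain strings and whose head is simply the first span. Both are simplifying assumptions and not the library's behaviour. It shows one consequence of comparing with `!=`: if spans compare equal by value, every span equal to the head is dropped, not only the head instance itself.

```python
class ToyCluster:
    """Hypothetical stand-in for Cluster; spans are plain strings here."""

    def __init__(self, spans):
        self.spans = spans

    @property
    def head(self):
        # Simplification: the documented head uses the POS hierarchy instead.
        return self.spans[0]

    @property
    def children(self):
        head = self.head
        # Keep every span that does not compare equal to the head.
        return [span for span in self.spans if span != head]


cluster = ToyCluster(["Marie Dupont", "elle", "la ministre", "Marie Dupont"])
print(cluster.children)  # ['elle', 'la ministre']
```

Whether repeated mentions are filtered in the real class depends on how `Span` defines equality, which the source does not state.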