OpenHowNet package

Submodules

OpenHowNet.BabelNetSynset module

BabelNetSynset Class

class OpenHowNet.BabelNetSynset.BabelNetSynset(babelnet_synset)

Bases: object

BabelNet synset class. Holds the information that BabelNet provides for a synset.

Attributes:

id (str): The unique identity of the BabelSynset in BabelNet.

cat (str): The category of the BabelSynset.

en_synonyms (list): The English synonyms in the BabelSynset.

zh_synonyms (list): The Chinese synonyms in the BabelSynset.

en_glosses (list): The English glosses in the BabelSynset.

zh_glosses (list): The Chinese glosses in the BabelSynset.

related_synsets (dict): The related BabelSynsets and the corresponding relations.

sememes (list): The sememes labeled to the BabelSynset.
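The attributes listed above can be read directly from a synset object. The following is a minimal sketch, not part of OpenHowNet: summarize_synset is a hypothetical helper, and the demo object is a stand-in with made-up values so that the snippet runs without the HowNet data files.

```python
from types import SimpleNamespace


def summarize_synset(synset, max_items=3):
    """Build a compact dict from the documented BabelNetSynset attributes."""
    return {
        "id": synset.id,
        "cat": synset.cat,
        "en": list(synset.en_synonyms)[:max_items],
        "zh": list(synset.zh_synonyms)[:max_items],
        "n_sememes": len(synset.sememes),
    }


# Stand-in object with made-up values (NOT real BabelNet data).
demo = SimpleNamespace(
    id="demo-id",
    cat="demo-category",
    en_synonyms=["apple", "apple_tree", "orchard_apple", "eating_apple"],
    zh_synonyms=["苹果"],
    en_glosses=[],
    zh_glosses=[],
    related_synsets={},
    sememes=["placeholder-sememe"],
)
summary = summarize_synset(demo)
```

With a real dictionary, the same helper could presumably be applied to each item returned by HowNetDict.get_synset().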

get_image_url_list()

Get the image url list of the synset.

Get the synsets related to the synset. You can set the relation to get only the synsets that have exactly that relation with the synset.

Args:

relation (str): the relation between the target synset and the retrieved synsets.

return_triples (bool): whether to return the triples or the synsets.

get_sememe_list()

Get the sememe list labeled to the synset.

OpenHowNet.Download module

OpenHowNet.Download.download()

Download the HowNet resource file. The HowNet resource file is openhownet_data.zip.

OpenHowNet.Download.download_file(url, dest_file=None)

Download a resource file from url to the destination path.

Args:
url (str):

download url of resource file.

dest_file (str):

target download path.

OpenHowNet.Download.get_resource(path, mode='r', encoding='utf-8')

Open the resource file.

OpenHowNet.HowNetDict module

HowNetDict Class

class OpenHowNet.HowNetDict.HowNetDict(init_sim=False, init_babel=False)

Bases: object

Class for running the OpenHowNet API. Provides a convenient way to search information in HowNet, display sememe trees, calculate word similarity via sememes, etc.

Example:

>>> # Initialize the OpenHowNet
>>> import OpenHowNet
>>> hownet_dict = OpenHowNet.HowNetDict()

>>> # Search a word in HowNet and get a list of senses containing the word
>>> result_list = hownet_dict.get_sense("苹果")

>>> # Visualize the sememe tree of the sense
>>> hownet_dict.get_sememes_by_word('苹果', display='visual')

calculate_word_similarity(word0, word1, strict=True)

Calculate the word similarity between two words via sememes.

Args:

word0 (str): target word #0.

word1 (str): target word #1.

strict (bool): whether to search for the senses strictly.

Returns:

(float) the word similarity calculated via sememes. If word0 or word1 does not exist in HowNet annotation, it returns -1. If the initialization method of word similarity calculation has not been called yet, it returns 0.0 and prints the corresponding error message.
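Because -1 is a sentinel rather than a real score, callers may want to convert it before doing arithmetic on the result. The following is a minimal sketch: similarity_or_none is a hypothetical helper, and _StubDict is a stand-in with made-up scores so that the snippet runs without the HowNet data files.

```python
def similarity_or_none(hownet_dict, word0, word1, strict=True):
    """Return the similarity as a float, or None when a word is not annotated."""
    score = hownet_dict.calculate_word_similarity(word0, word1, strict=strict)
    if score == -1:
        # Documented sentinel: word0 or word1 does not exist in HowNet.
        return None
    return float(score)


class _StubDict:
    """Stand-in for HowNetDict with made-up scores (NOT real HowNet output)."""

    _scores = {("苹果", "梨"): 0.8}

    def calculate_word_similarity(self, word0, word1, strict=True):
        return self._scores.get((word0, word1), -1)


stub = _StubDict()
known = similarity_or_none(stub, "苹果", "梨")
missing = similarity_or_none(stub, "苹果", "not-a-word")
```

Note that a result of 0.0 is ambiguous from the return value alone: according to the description above, it is also what comes back when the similarity calculation has not been initialized, so the printed error message is the only signal in that case.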

get_all_babel_synsets()

Get the complete BabelNet synsets.

Returns:

(list[BabelNetSynset]) a list of all BabelNet synsets.

get_all_sememe_relations()

Get all the relations between sememes in HowNet.

Returns:

(list[str]) all the relations between sememes in HowNet.

get_all_sememes()

Get the complete sememes in HowNet.

Returns:

(list[Sememe]) a list of all sememes

get_all_sense_pos()

Get all the parts of speech (POS) of the words in the senses in HowNet.

Returns:

(list[str]) the pos of the words in HowNet.

get_all_senses()

Get the complete senses in HowNet

Returns:

(list[Sense]) a list of all senses

get_all_synset_pos()

get_all_synset_relations()

Return all the relations between synsets in BabelNet.

get_en_words()

Get all English words annotated in HowNet

Returns:

(list[str]) All annotated English words in HowNet.

get_nearest_words(word, language=None, score=False, pos=None, merge=False, K=10, strict=True)

Get the top-K nearest words of the given word. The word similarity is calculated based on HowNet annotations. If the given word does not exist in HowNet annotations, this function returns an empty list.

Args:

word (str): target word.

language (str): the language of the word and of the search result. Choose from en/zh.

score (bool): whether to return the similarity score between the words.

pos (str): the part of speech of the word.

merge (bool): whether to merge the words of all the result senses into one list.

K (int): the number of nearest words to retrieve.

strict (bool): whether to search for the word strictly.

Returns:

(list) a list of the nearest K words. If merge==False, returns a list of the senses retrieved by the word and their synonyms separately. If the given word does not exist in HowNet annotations, returns an empty list.

Show all sememes that x has any relation with. By setting the relation you can get the sememes that have the exact relation with the target sememe.

Args:

x (str): the word to search the sememe.

return_triples (bool): whether to return the list of triples or just the list of the sememes.

strict (bool): whether to search the sememe relation strictly by the word. Set to False if you are not sure about x.

Returns:

(list) a list of sememe triples or a list of sememes.

Show all BabelNet synsets that x has any relation with. By setting the relation you can get only the synsets that have exactly that relation with the target synset.

Args:

x (str): the word to search the synset.

return_triples (bool): whether to return the list of triples or just the list of the synsets.

strict (bool): whether to search the synset relation strictly by the word. Set to False if you are not sure about x.

Returns:

(list) a list of synset triples or a list of synsets.

get_sememe(word, language=None, strict=True)

The common sememe search API. You can specify the language of the target word to boost the search performance. In addition, if you are not sure about the word, you can set strict to False to fuzzy-match the sememe.

Args:
word (str):

target word.

language (str):

target language, default: None (the function searches in both English and Chinese, which takes considerably more time). Set to en or zh to search only in English or only in Chinese.

strict (bool):

whether to search for the sememe strictly.

Returns:

(list[Sememe]) the candidate HowNet sememes. If the target word does not exist, returns an empty list.

get_sememe_relation(x, y, return_triples=False, strict=True)

Show the relationship between two sememes. The function searches for the sememes by the words and retrieves the relation between the two sememes.

Args:

x (str): the word #0 to search the sememe.

y (str): the word #1 to search the sememe.

return_triples (bool): whether to return the list of triples or just the list of the relations.

strict (bool): whether to search the sememe relation strictly by the word. Set to False if you are not sure about x and y.

Returns:

(list) a list of sememe triples or a list of relations. Note that x is the head sememe and y is the tail sememe in the triples.
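When return_triples=True, the result is a list of triples that is often easier to work with once grouped by relation. The following is a minimal sketch under a stated assumption: each triple is taken to be a 3-item (head, relation, tail) sequence. This page only says that x is the head and y is the tail, so the exact element layout is an assumption, and group_tails_by_relation is a hypothetical helper. The demo triples are made-up placeholders, not real HowNet relations.

```python
from collections import defaultdict


def group_tails_by_relation(triples):
    """Map each relation to the list of tail items it points to.

    Assumes every triple unpacks as (head, relation, tail).
    """
    grouped = defaultdict(list)
    for _head, relation, tail in triples:
        grouped[relation].append(tail)
    return dict(grouped)


# Made-up placeholder triples (NOT real HowNet data).
demo_triples = [
    ("head_a", "relation_1", "tail_x"),
    ("head_a", "relation_1", "tail_y"),
    ("head_a", "relation_2", "tail_z"),
]
grouped = group_tails_by_relation(demo_triples)
```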

get_sememes_by_word(word, display='list', merge=False, expanded_layer=-1, K=None)

The common sememe search API. Given a specific word, you can get the corresponding HowNet annotations. The result can be displayed in various forms.

Args:
word (str):

Specific word(en/zh/id) you want to search in HowNet. You can use “*” to specify that you need annotations of all words.

display (str):

How to display the retrieved sememes. Choose from tree/dict/list/visual.

merge (bool):

Only works when display == ‘list’. Decides whether to merge the query results of a multi-sense word into one.

expanded_layer (int):

Only works when display == ‘list’. Continuously expand k layers. By default, it is set to -1 (expand all layers).

K (int):

Only works when display == ‘visual’. The maximum number of visualized words, ordered by id (ascending). An illegal number is ignored automatically, and the function displays all retrieved results.
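Since merge and expanded_layer only take effect with display='list' and K only with display='visual', it can help to build the keyword arguments in one place. The following is a minimal sketch: build_query_kwargs and VALID_DISPLAYS are hypothetical names introduced here, not part of OpenHowNet.

```python
VALID_DISPLAYS = ("tree", "dict", "list", "visual")


def build_query_kwargs(display="list", merge=False, expanded_layer=-1, K=None):
    """Keep only the options that the chosen display mode actually uses."""
    if display not in VALID_DISPLAYS:
        raise ValueError(f"display must be one of {VALID_DISPLAYS}, got {display!r}")
    kwargs = {"display": display}
    if display == "list":
        # merge and expanded_layer are documented as list-only options.
        kwargs["merge"] = merge
        kwargs["expanded_layer"] = expanded_layer
    elif display == "visual" and K is not None:
        # K is documented as a visual-only option.
        kwargs["K"] = K
    return kwargs


list_kwargs = build_query_kwargs("list", merge=True)
visual_kwargs = build_query_kwargs("visual", K=2)
tree_kwargs = build_query_kwargs("tree", merge=True, K=5)
```

The resulting dict would then be passed as hownet_dict.get_sememes_by_word(word, **kwargs).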

Examples:

>>> # Returns the sememe tree of the retrieved senses in the form of dict
>>> hownet_dict.get_sememes_by_word('苹果', display='dict')
>>> # Returns the root node of sememe tree of the retrieved senses in the form of anytree
>>> hownet_dict.get_sememes_by_word('苹果', display='tree')
>>> # Returns the sememe list of the retrieved senses separately
>>> hownet_dict.get_sememes_by_word('苹果', display='list')
>>> # Returns the sememe list of the retrieved senses merged into one
>>> hownet_dict.get_sememes_by_word('苹果', display='list', merge=True)
>>> # Visualize the sememe tree
>>> hownet_dict.get_sememes_by_word('苹果', display='visual')

get_sememes_by_word_in_BabelNet(x, merge=False)

The sememe search API based on BabelNet synsets. Given a specific word, you can get the corresponding sememe annotations.

Args:

x (str): the target word to search for the sememes.

merge (bool): whether to merge the results into one.

get_sense(word, language=None, pos=None, strict=True)

The common sense search API. You can specify the language of the target word to boost the search performance. In addition, if you are not sure about the word, you can set strict to False to fuzzy-match the sense.

Args:

word (str): target word.

language (str): target language, default: None (the function searches in both English and Chinese, which takes considerably more time). Set to en or zh to search only in English or only in Chinese.

pos (str): limit the part of speech of the result.

strict (bool): whether to search for the sense strictly.

Returns:

(list[Sense]) the candidate HowNet senses. If the target word does not exist, returns an empty list.
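Because leaving language=None makes the function search both vocabularies, one option is to guess the language from the word itself before calling get_sense. The following is a minimal sketch: guess_language and get_sense_fast are hypothetical helpers, the check covers only the basic CJK Unified Ideographs block, and _StubDict is a stand-in that simply records the arguments it receives.

```python
def guess_language(word):
    """Return 'zh' if the word contains a basic CJK ideograph, else 'en'."""
    for ch in word:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def get_sense_fast(hownet_dict, word, pos=None, strict=True):
    """Call get_sense with the guessed language to avoid the two-language search."""
    return hownet_dict.get_sense(
        word, language=guess_language(word), pos=pos, strict=strict
    )


class _StubDict:
    """Stand-in for HowNetDict; echoes the arguments instead of searching."""

    def get_sense(self, word, language=None, pos=None, strict=True):
        return [(word, language)]


stub = _StubDict()
zh_result = get_sense_fast(stub, "苹果")
en_result = get_sense_fast(stub, "apple")
```

This heuristic is deliberately crude: anything without a CJK character is treated as English, so mixed or unusual inputs may still need language=None.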

get_sense_synonyms(sense)

Get the senses that have the same sememe annotation as the given sense.

Args:

sense (Sense): the target sense to search the synonyms for.

Returns:

(list[Sense]) the list of senses that have the same sememe annotation as the sense.

get_senses_by_sememe(x, strict=True)

Get the senses labeled by sememe x.

Args:

x (str) : the word to search the sememe.

Returns:

(list[Sense]) The list of senses which contains the sememe x.

get_synset(word, language=None, pos=None, strict=True)

Get the synsets by the word. You can optionally restrict the language of the word.

Args:

word (str): target word to search for the synset.

language (str): the language of the retrieved word.

strict (bool): whether to search for the synset by word strictly.

pos (str): limitation on the result. Can be set to a/v/n/r.

Returns:

(list[BabelNetSynset]) the list of retrieved synsets.

get_synset_relation(x, y, return_triples=False, strict=True)

Get the relation between two synsets. The function will search for the candidate synsets by x and y.

Args:

x (str): the word #0 to search the synset.

y (str): the word #1 to search the synset.

return_triples (bool): whether to return the triples.

strict (bool): whether to search for the synsets strictly.

Returns:

(list) a list of the relations or of the triples.

get_zh_words()

Get all Chinese words annotated in HowNet

Returns:

(list[str]) All annotated Chinese words in HowNet.

has(item, language=None)

Check whether a certain word (English word/Chinese word/ID) exists in HowNet. Only an exact match is performed because HowNet is case-sensitive. By default, the target word is searched in both the English vocabulary and the Chinese vocabulary.

Args:

item (str): target word to be searched in HowNet.

language (str): specify the language of the target search word.

Returns:

(bool) whether the word exists in HowNet annotation
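Since has() performs an exact, case-sensitive match, a caller who is unsure of the capitalization can probe a few variants. The following is a minimal sketch: has_any_case is a hypothetical helper, and _StubDict is a stand-in with a made-up two-word vocabulary so that the snippet runs without the HowNet data files.

```python
def has_any_case(hownet_dict, word, language=None):
    """Return the first case variant of word that has() accepts, or None."""
    variants = []
    for candidate in (word, word.lower(), word.capitalize(), word.upper()):
        if candidate not in variants:  # skip duplicates, keep order
            variants.append(candidate)
    for candidate in variants:
        if hownet_dict.has(candidate, language=language):
            return candidate
    return None


class _StubDict:
    """Stand-in for HowNetDict with a made-up vocabulary."""

    _vocab = {"apple", "NATO"}

    def has(self, item, language=None):
        return item in self._vocab


stub = _StubDict()
found_lower = has_any_case(stub, "Apple")
found_upper = has_any_case(stub, "nato")
not_found = has_any_case(stub, "pear")
```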

initialize_babelnet_dict()

Initialize the BabelNet Synset dict.

initialize_similarity_calculation()

Initialize the similarity calculation via sememes. The implementation was contributed by Jun Yan and is based on the paper: “Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP”.
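One way to defer this initialization until a similarity is actually requested is a small wrapper that calls it once on first use. The following is a minimal sketch: LazySimilarity is a hypothetical class, not part of OpenHowNet, and _StubDict is a stand-in that counts initialization calls and returns made-up scores.

```python
class LazySimilarity:
    """Hypothetical wrapper: runs the one-time initialization on first use."""

    def __init__(self, hownet_dict):
        self._dict = hownet_dict
        self._ready = False

    def similarity(self, word0, word1, strict=True):
        if not self._ready:
            self._dict.initialize_similarity_calculation()
            self._ready = True
        return self._dict.calculate_word_similarity(word0, word1, strict=strict)


class _StubDict:
    """Stand-in for HowNetDict; counts init calls, returns made-up scores."""

    def __init__(self):
        self.init_calls = 0

    def initialize_similarity_calculation(self):
        self.init_calls += 1

    def calculate_word_similarity(self, word0, word1, strict=True):
        return 1.0 if word0 == word1 else 0.5


stub = _StubDict()
lazy = LazySimilarity(stub)
first = lazy.similarity("a", "a")
second = lazy.similarity("a", "b")
```

The wrapper only tracks its own flag, so it is meant for a dictionary created with init_sim=False; if the dictionary was already initialized elsewhere, the first call would run the initialization once more.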

OpenHowNet.Sememe module

Sememe Class

class OpenHowNet.Sememe.Sememe(hownet_sememe, freq)

Bases: object

Sememe class. The smallest semantic unit. Described in English and Chinese.

Attributes:

en (str): English word to describe the sememe.

zh (str): Chinese word to describe the sememe.

freq (int): the sememe occurrence frequency in HowNet.

related_sememes (dict): the sememes related to the sememe in HowNet.

Get the sememes related to the sememe.

Args:

relation (str): limit the relation between the target sememe and the retrieved sememes.

return_triples (bool): whether to return the list of triples or the list of related sememes.

Returns:

(list) the list of triples or the list of related sememes.

get_senses()

Get the senses annotated with the sememe. Initialized by HowNetDict.__init__()

Returns:

(list[Sense]) the list of the senses annotated with the sememe.

OpenHowNet.Sense module

Sense Class

class OpenHowNet.Sense.Sense(hownet_sense)

Bases: object

Contains variables of a sense. Initialized by an item in HowNet. Contains numbering, word, POS of word, sememe tree, etc.

Args:
hownet_sense (dict):

Dict contains the annotation of the sense in HowNet.

get_sememe_list(layer=-1)

Expand the sememe tree by iteration. Return the sememe set of the tree.

Args:
layer(int):

the number of layers to expand in the tree.

Returns:

(list[Sememe]) the sememe set of the sememe tree.
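The returned sememes can be aggregated across several senses, for example to see which sememes a group of senses shares. The following is a minimal sketch: count_sememes is a hypothetical helper that relies only on the documented get_sememe_list(layer) method and the documented en attribute of Sememe. The stub classes and their values are made up so that the snippet runs without the HowNet data files.

```python
from collections import Counter


def count_sememes(senses, layer=-1):
    """Count how many of the given senses carry each sememe (keyed by English name)."""
    counts = Counter()
    for sense in senses:
        for sememe in sense.get_sememe_list(layer=layer):
            counts[sememe.en] += 1
    return counts


class _StubSememe:
    """Stand-in for Sememe; only the documented en attribute is modeled."""

    def __init__(self, en):
        self.en = en


class _StubSense:
    """Stand-in for Sense with a fixed, made-up sememe list."""

    def __init__(self, names):
        self._sememes = [_StubSememe(name) for name in names]

    def get_sememe_list(self, layer=-1):
        return self._sememes


demo_senses = [_StubSense(["fruit"]), _StubSense(["fruit", "tool"])]
counts = count_sememes(demo_senses)
```

With a real dictionary, the senses would presumably come from HowNetDict.get_sense().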

get_sememe_tree(return_node=False)

Generate sememe tree for the sense by the Def.

Args:
return_node(bool):

whether to return as anytree root node.

Returns:

(dict or anytree.Node) the sememe tree of the sense in the form of a dict, or the root node of the sememe tree.

visualize_sememe_tree()

Visualize the sememe tree by sense Def.

Returns:

(str) the visualized sememe tree.

OpenHowNet.version module

Module contents

Welcome to the API references for OpenHowNet!

OpenHowNet <https://github.com/thunlp/OpenHowNet> provides a convenient way to search information in HowNet, display sememe trees, calculate word similarity via sememes, etc.