projects:gui4swe:start
[[http://
This is a visualization tool to examine and improve the quality of popular word embedding models, accessible
[[http://
For a long time, people have been researching how to build a machine (or computer) that is able to understand human languages. This is normally called natural language processing (NLP), a very important research and application area in computer science. NLP is normally regarded as a flagship task in artificial intelligence (AI). However, this task seems effortless to us, but it turns out to be extremely difficult for machines.
As the first step towards NLP, we need to teach computers to understand all words used in a natural language. The popular approach
In the past few years, some methods have been proposed to learn the so-called word embedding model, which projects each discrete word onto a point in a continuous space [1]. These methods are typically designed to be efficient so that they can quickly process the large text corpora available on the Internet, such as Wikipedia. However, the word embedding models learned in this way are not perfect, and they normally suffer from severe deficiencies. This visualization tool is designed for two purposes:
  - Providing a graphical interface to visually examine the quality of an existing word embedding model, which is learned from English Wikipedia text of 5 billion words as in [2].
  - Using a user-friendly graphical interface to teach computers to correct mistakes and thereby improve the quality of the above model.
The visualization tool shows the near neighbourhood of one particular word in the embedding space. For a word (randomly sampled or entered from the search bar), it shows a certain number of words located nearest to it in the embedding space.
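The neighbourhood view described above can be sketched in a few lines of code. The tiny 3-dimensional vectors below are made-up values purely for illustration; real embedding models such as the one in [1] use hundreds of dimensions learned from large corpora, and the actual tool may rank neighbours differently.

```python
import numpy as np

# Toy embedding table: word -> vector (hypothetical values for illustration).
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.75, 0.2]),
    "man":   np.array([0.7, 0.3, 0.1]),
    "woman": np.array([0.65, 0.25, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def nearest_neighbours(word, k=3):
    """Rank all other words by cosine similarity to `word`."""
    v = embeddings[word]
    sims = []
    for w, u in embeddings.items():
        if w == word:
            continue
        cos = float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)))
        sims.append((w, cos))
    return sorted(sims, key=lambda p: p[1], reverse=True)[:k]

# With these toy vectors, "queen" comes out as the closest neighbour of "king".
print(nearest_neighbours("king"))
```

This is the computation behind any nearest-neighbour display: the words shown around the central word are simply those with the highest cosine similarity to its vector.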
As you may see, the model is far from perfect: many displayed neighbouring words do not make sense at all, or their ranking order is not satisfactory. If you see these problems, **you may use the mouse to drag and re-rank all neighbouring words to best reflect your own linguistic knowledge. Ideally, all synonyms should be ranked closest to the central word, followed by similar words, antonyms, relevant words and so on. All irrelevant words may be dragged into the garbage bin to be discarded.**
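The page does not specify how the re-ranking feedback is applied to the model. Purely as an illustrative assumption (not the authors' actual method), one simple possibility is a small update step that pulls a word the user ranked closer toward the central word's vector:

```python
import numpy as np

def pull_closer(center_vec, word_vec, step=0.1):
    """Hypothetical update rule: move `word_vec` a small step toward
    `center_vec`, so the word ranks higher among the neighbours."""
    return word_vec + step * (center_vec - word_vec)

center = np.array([1.0, 0.0])
word = np.array([0.0, 1.0])
updated = pull_closer(center, word)
# After the update, `updated` lies closer to `center` than `word` did.
```

A symmetric push-away step could likewise model dragging an irrelevant word into the garbage bin; again, this is only a sketch of how such corrections might be used.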
Your corrections will be used to improve the quality of the word embedding model.
Other hints for how to use this tool:
  * You may use any ID to log in. The ID is used ONLY for saving all corrections you have made.
  * You may log in as anonymous by clicking the symbol '
  * You are asked to work on 20 random words in each session.
  * You may check the near neighbourhood of any word by entering it in the search bar.
  * You may change the number of neighbouring words to display (limited to a maximum of 50 for good viewing).
  * Use the left mouse button to drag and drop words to re-rank them.
  * Use the right mouse button to look up an unknown word in an online dictionary.
  * Make changes only to the words you are certain about, and leave the rest as is if you are unsure.
  * Simply skip the word if the central word is a typo or a proper noun that you find hard to rank.
  * Drag all typo-like neighbouring words into the garbage bin to discard them.
projects/gui4swe/start.1446453421.txt.gz · Last modified: 2015/11/02 08:37 by hj