(new) The usability of e-LiSe was thoroughly tested. We created an automated test to assess the value of the information that e-LiSe generates about hereditary diseases.
As a reference we chose OMIM, which we regard as the best online source of information about hereditary diseases. We selected OMIM entries describing well-known phenotypes (the '#' OMIM entry type) whose names contain the word 'disease' or 'syndrome'. We then clustered the data in such a way that all types of a given phenotype formed one record (e.g. Alagille syndrome 1 and Alagille syndrome 2 became "Alagille syndrome"). This was done so that e-LiSe queries could be created for which more than 20 abstracts exist in the MEDLINE database.
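The clustering step can be sketched as follows. This is only an illustration, not the code used in the test: the function names and the suffix rule (a trailing number, optionally preceded by "type" and followed by one letter) are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed rule: a type designator is a trailing number such as "1", "2A" or "type 3".
_TYPE_SUFFIX = re.compile(r"[\s,]+(?:type\s+)?\d+[a-z]?$", re.IGNORECASE)


def base_name(title):
    """Return the lower-cased phenotype name without its trailing type designator."""
    return _TYPE_SUFFIX.sub("", title.strip()).lower()


def cluster_entries(titles):
    """Group entry titles that share the same base phenotype name."""
    clusters = defaultdict(list)
    for title in titles:
        clusters[base_name(title)].append(title)
    return dict(clusters)


if __name__ == "__main__":
    titles = ["Alagille syndrome 1", "Alagille syndrome 2", "Alzheimer disease"]
    for name, members in cluster_entries(titles).items():
        print(name, "<-", members)
```

Each cluster name can then serve as a single query, which makes it more likely that enough abstracts are available for the analysis.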
Two types of queries were generated. The first group comprised queries built with the AND logical operator (e.g. alzheimer AND disease). In this case, correlated words were searched for in abstracts containing all query words, in any order. The second group consisted of multiword queries (e.g. "alzheimer disease"). The latter should be more accurate, because they restrict the data analysed to abstracts that contain the query words in the given order.
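The difference between the two query types can be illustrated with a small sketch. The helper names and the simple tokenisation below are assumptions made for the example; they are not meant to describe how e-LiSe matches abstracts internally.

```python
import re


def tokens(text):
    """Split text into lower-cased alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def and_query(phenotype):
    """Build an AND query, e.g. 'alzheimer AND disease'."""
    return " AND ".join(phenotype.split())


def phrase_query(phenotype):
    """Build a multiword query, e.g. '"alzheimer disease"'."""
    return '"' + phenotype + '"'


def matches_and(abstract, phenotype):
    """True if every query word occurs in the abstract, in any order."""
    present = set(tokens(abstract))
    return all(word in present for word in tokens(phenotype))


def matches_phrase(abstract, phenotype):
    """True if the query words occur next to each other, in the given order."""
    words, text = tokens(phenotype), tokens(abstract)
    n = len(words)
    return any(text[i:i + n] == words for i in range(len(text) - n + 1))


if __name__ == "__main__":
    abstract = "The disease first described by Alzheimer affects memory."
    print(and_query("alzheimer disease"), matches_and(abstract, "alzheimer disease"))
    print(phrase_query("alzheimer disease"), matches_phrase(abstract, "alzheimer disease"))
```

In this example the abstract satisfies the AND query but not the multiword query, which shows why the second type selects a smaller and more specific set of abstracts.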
We evaluated how many of the best-correlated words in the e-LiSe results for a given phenotype are found in OMIM. The results for both types of queries are similar (e.g. for the 50 best-correlated words, 74% and 76% respectively). This is key information for us, because multiword queries sometimes return too few abstracts, and in such cases only queries based on logical operators are possible. Moreover, such queries are less computationally demanding, so users get their results faster. Furthermore, this result suggests that the statistics used in our algorithm are appropriate and favour genuinely correlated words.
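The measure described above, the share of the top-ranked words that also occur in the reference text, can be sketched as follows. The function name, the tokenisation and the exact-match comparison are assumptions made for the example; the actual test may compare words differently (for instance after stemming).

```python
import re


def fraction_found(ranked_words, reference_text, top_n=50):
    """Return the fraction of the top_n ranked words that occur in reference_text."""
    vocabulary = set(re.findall(r"[a-z0-9]+", reference_text.lower()))
    top = [word.lower() for word in ranked_words[:top_n]]
    if not top:
        return 0.0
    return sum(word in vocabulary for word in top) / len(top)


if __name__ == "__main__":
    # Made-up toy data, used only to show the call.
    ranked = ["amyloid", "tau", "banana", "dementia"]
    reference = "Amyloid plaques and tau tangles cause dementia."
    print(fraction_found(ranked, reference, top_n=4))
```

Averaging this value over all tested phenotypes, separately for each query type, gives percentages of the kind quoted above.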
The results of the comparison are available here. One can choose the number of words to compare (1-100) as well as the type of query (AND-based or multiword). Additionally, a summary spreadsheet can be downloaded.
We also tested the value of the correlated words not present in OMIM, and showed that they are closely correlated words as well. To do so, we randomly chose (using a Fisher-Yates shuffle) twenty phenotypes from the result set. Then, for each of them, we randomly chose one word from the e-LiSe results that was not present in OMIM. The spreadsheet file with the results and notes about them is available here.
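The random selection can be sketched with a textbook Fisher-Yates shuffle. The data layout (dictionaries keyed by phenotype) and the function names are assumptions made for the example, not the structures used in the actual test.

```python
import random


def fisher_yates_shuffle(items, rng=None):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    if rng is None:
        rng = random.Random()
    shuffled = list(items)
    # Walk from the last position down, swapping each element with a
    # uniformly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def pick_unmatched_words(results, reference_words, n_phenotypes=20, rng=None):
    """For up to n_phenotypes random phenotypes, pick one result word absent from the reference."""
    if rng is None:
        rng = random.Random()
    chosen = fisher_yates_shuffle(sorted(results), rng)[:n_phenotypes]
    picks = {}
    for phenotype in chosen:
        known = reference_words.get(phenotype, set())
        candidates = [w for w in results[phenotype] if w not in known]
        if candidates:  # skip phenotypes whose words all occur in the reference
            picks[phenotype] = fisher_yates_shuffle(candidates, rng)[0]
    return picks
```

Passing a seeded generator, for example random.Random(0), makes the selection reproducible.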