Medical Domain Search Ranking: Tutorial and Course - The Ultimate Guide to Medical Domain Search Ranking.
Medical Domain Search Ranking Tutorial and Course is one of the SEO Tutorials and Courses created by SEO University to help you learn and understand Medical Domain Search Ranking and the related SEO technologies, including facts and information about Medical Domain Search Ranking.
With the exponential growth of data stored in electronic health records (EHRs), it is imperative to identify effective means to help clinicians, as well as administrators and researchers, make full use of them. Recent research advances in natural language processing (NLP) have provided improved capabilities for automatically extracting concepts from narrative clinical documents. However, until these NLP-based tools become widely available and versatile enough to handle the vaguely defined information retrieval needs of EHR users, a convenient and cost-effective solution continues to be in great demand. In this Tutorial and Course for Medical Domain Search Ranking, we introduce the concept of medical information retrieval, which provides medical professionals with a handy tool for searching unstructured clinical narratives via an interface similar to that of general-purpose Web search engines, e.g., Google. We also introduce several advanced features, such as intelligent, ontology-driven medical search query recommendation services and a collaborative search feature that encourages sharing of medical search knowledge among end users of EHR search tools.
Each patient visit to an outpatient care facility or hospital stay in an inpatient ward generates a great volume of data, ranging from physiological measurements to clinician judgments and decisions. Such data are not only important for reuse in current and later care episodes, they are also critical in supporting numerous secondary-use scenarios, including epidemic surveillance, population health management, and clinical and translational research. It is widely believed that effective and comprehensive use of patient care data created in day-to-day clinical settings has the potential to transform the healthcare system into a self-learning vehicle to achieve better care quality, lower costs, and faster and greater scientific discoveries.
The launch of the health IT incentive program established through the American Recovery and Reinvestment Act of 2009 has stimulated widespread adoption of health IT systems in the United States, electronic health records (EHRs) in particular. Medical professionals' everyday interactions with such systems have in turn led to the exponential growth of rich, electronically captured data at the patient level, providing great promise for large-scale computational reuse. However, the increasing availability of electronic data does not automatically guarantee the increasing availability of information. A majority of clinical documents continue to exist in an unstructured, narrative format in the EHR era; these documents are extremely difficult to process due to many characteristics unique to narrative medical data, such as frequent use of nonstandard terminologies and acronyms. Recent studies have also shown that the quality of data stored in EHRs, compared to the quality of data recorded on paper forms, has deteriorated considerably due to the inappropriate use of electronic documentation features such as automated fill-in and copy-and-paste. As a result, onerous, costly, and error-prone manual chart reviews are often needed in order to reuse the data in direct patient care or to prepare it for secondary-use purposes.
It is therefore imperative to identify effective means to help clinicians, as well as administrators and researchers, retrieve information from EHRs. As noted above, NLP-based tools are not yet widely available or versatile enough to handle the vaguely defined information retrieval needs of EHR users, so a convenient and cost-effective solution remains in great demand. The remainder of this Tutorial and Course therefore develops the concept of medical information retrieval, along with intelligent, ontology-driven query recommendation services and a collaborative search feature for sharing medical search knowledge among end users of EHR search tools.
This Tutorial and Course for Medical Domain Search Ranking focuses on information retrieval systems for electronic health records in a clinical setting, as distinguished from information retrieval systems for biomedical literature, such as PubMed, and those for consumer-oriented health information, such as MedlinePlus. Interested readers can refer to Hersh, 2009, for information retrieval systems for biomedical literature and consumer-oriented information.
Clinicians and researchers routinely search medical records, but today they do so in a highly inefficient manner. Many simply go through each document manually and read through each clinical note to find the information they are looking for, a procedure known as chart review or chart abstraction. This tedious manual effort sometimes returns no results, either because the information was never there or because it was overlooked. An automated search engine is therefore essential in the modern setting of retrieving electronic health records.
Although there are multiple search engines to assist with searching medical literature or health-related Web pages, adoption of search engines for EHRs remains limited. This may, in part, be due to a lack of understanding of the information needs of clinicians and researchers compared to those of the general population. Additional factors limiting the widespread adoption of EHR search engines include the complex medical information contained in clinical documents and the inadequacy of standard search engines to meet users' needs. As a result, customized solutions are required. Even with such solutions, obtaining the proper approvals from a medical information technology department to integrate and support such a system and meeting all regulatory and privacy requirements are ongoing challenges that also limit the number of investigators who are able to work with such protected data in the clinical environment.
Only a few medical record search engines have been reported, and even among those it is difficult to know what level of adoption or usefulness has been achieved. The StarTracker system at Vanderbilt University was discussed in a brief report from 2003. At the time, it was available to 150 pilot users. It was reported to have been used successfully for cohort identification of clinical studies and to help with physician evaluation of clinical outcomes.
Columbia University also has a search engine, CISearch, that has been integrated with its locally developed EHR, WebCIS, since 2008. Supporting the notion that the information needs of an EHR search engine differ from those of standard search engines, the CISearch tool limits searches to a single patient at a time, and the system does not rank documents but rather displays the results in reverse chronological order, which is a common data view in medical record systems. The system was reported to have incorporated a limited number of document types, including discharge summaries, radiology reports, and pathology reports.
Other research systems have also been reported to support searching clinical documents, although these are not explicitly labeled as search engines. An example is the STRIDE system at Stanford University, which has been shown to be of benefit for clinical decision making. Additionally, the Informatics for Integrating Biology and the Bedside (i2b2) Workbench tool has been modified to handle searching free text notes.
The University of Michigan has also had a search engine in its production environment since 2005. The Electronic Medical Record Search Engine (EMERSE) was developed to provide users with an efficient and accurate means of querying the repository of clinical documents. As of April 2013, the repository contained over 60 million clinical documents and reports from approximately 2 million patients. EMERSE is used for a wide variety of information retrieval tasks, including research (e.g., cohort identification, eligibility determination, and data abstraction), quality improvement and quality assurance initiatives, risk management, and infection control monitoring.
From a technical perspective, EMERSE utilizes Apache Lucene to index and retrieve documents, but the Web application itself has been modified substantially to meet the needs of medical search. For example, stop words are not removed from the index, since many are themselves important acronyms. Examples include AND (axillary node dissection), ARE (active resistance exercise), and IS (incentive spirometry). Additionally, two indices are maintained of all documents: a standard lowercased index for the default case-insensitive searches and a case-sensitive index so users can distinguish medical terms from potential false positives. An example is the need to distinguish ALL (acute lymphoblastic leukemia) from the common word all.
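The dual-index idea can be illustrated with a minimal pure-Python sketch. This is an illustration of the concept only, not EMERSE's actual implementation (which relies on Apache Lucene); the documents and the `build_indices` helper are hypothetical:

```python
# Minimal sketch of dual indexing for medical search: one case-insensitive
# index for default searches, one case-sensitive index so that acronyms
# such as ALL (acute lymphoblastic leukemia) can be told apart from the
# common word "all". Note that no stop words are removed.
from collections import defaultdict

def build_indices(docs):
    """Build a case-insensitive and a case-sensitive inverted index."""
    ci_index, cs_index = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            cs_index[token].add(doc_id)          # preserves case: "ALL" != "all"
            ci_index[token.lower()].add(doc_id)  # default search path
    return ci_index, cs_index

docs = {
    1: "Patient diagnosed with ALL in 2010",
    2: "Reviewed all prior discharge summaries",
}
ci, cs = build_indices(docs)
print(sorted(ci["all"]))  # [1, 2] -- case-insensitive matches both documents
print(sorted(cs["ALL"]))  # [1]    -- case-sensitive isolates the acronym
```

A production system would, of course, add tokenization rules, positional data, and scoring; the point here is only that maintaining both indices lets the user choose whether case should disambiguate a term.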
EMERSE contains a large vocabulary of synonyms and related concepts that are presented to users to expand their search queries. For example, searching for "ibuprofen" would bring up suggestions that include brand names such as "Advil" and "Motrin" as well as common misspellings derived from its search logs, including "ibuprofin" and "ibuprophen." As of April 2013, the synonym list contained 45,000 terms for about 11,000 concepts.
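A minimal sketch of this kind of synonym-based query expansion follows; the `SYNONYMS` table and `expand_query` helper are illustrative stand-ins, not EMERSE's actual vocabulary or API:

```python
# Sketch of query expansion from a synonym table: a generic drug name is
# expanded with brand names and common misspellings before retrieval.
SYNONYMS = {
    "ibuprofen": ["Advil", "Motrin", "ibuprofin", "ibuprophen"],
}

def expand_query(term, synonyms=SYNONYMS):
    """Return the original term plus any known synonyms for it."""
    return [term] + synonyms.get(term.lower(), [])

print(expand_query("ibuprofen"))
# ['ibuprofen', 'Advil', 'Motrin', 'ibuprofin', 'ibuprophen']
```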
There are many aspects of clinical documents that make information retrieval challenging. These include the use of ambiguous terminology, known as hedge phrases (e.g., "the patient possibly has appendicitis"), as well as negation (e.g., "there is no evidence of appendicitis in this patient"). EMERSE provides a mechanism for handling negation that is easy for users to implement but that deviates from standard practices for "typical" search engines. Negation can be achieved by adding exclude phrases, which are phrases that contain a concept of interest but in the wrong context. These phrases are notated with a minus sign in front of them, which tells the system to ignore those specific phrases but not the overall document itself. Thus, one can look for "complications" but ignore "no complications occurred." Indeed, this was recently done for the Department of Ophthalmology as part of a quality assurance initiative to identify post-operative complications. The terms "complication" and "complications" were searched, and a collection of approximately 30 negated phrases was excluded. This greatly reduced the false-positive rate for the searches and allowed the department to conduct an efficient and accurate search for potential complications.
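The exclude-phrase mechanism can be sketched as follows. This is a simplified model of the behavior described above, under the assumption that exclude phrases are simply removed from consideration before matching; the `phrase_hits` helper is hypothetical:

```python
# Sketch of exclude phrases for negation handling: hits for a term are
# counted after dropping the negated phrases, while the rest of the
# document remains searchable.
def phrase_hits(text, term, exclude_phrases):
    """Count occurrences of term, ignoring the listed exclude phrases."""
    t = text.lower()
    for phrase in exclude_phrases:
        t = t.replace(phrase.lower(), "")  # drop only the negated phrase
    return t.count(term.lower())

note = "No complications occurred. Wound complications were later noted."
print(phrase_hits(note, "complications", ["no complications occurred"]))  # 1
```

The document still matches because the second, genuine mention of "complications" survives; only the negated phrase is suppressed.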
The EMERSE system also provides a framework to encourage the collaborative sharing and use of queries developed by others. Saved searches, called bundles, can be created by any user and either shared or kept private. Many users have shared their bundles, and many bundles have been used by a wide variety of users. This paradigm allows users with specific medical expertise to share their knowledge with other users of the system, allowing those with less domain expertise to benefit from those with more. Bundles also provide a means for research teams to standardize their searches across a large group to ensure that the same processes are applied to each medical record. We believe social search features are an important functionality of the next generation of medical search engines.
Another study looked into the feasibility of using a tool such as EMERSE to collect data for the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP). EMERSE was used to identify cases of post-operative myocardial infarction and pulmonary embolus, a process that is traditionally performed manually. Utilizing the negation components of EMERSE was essential in these tasks to rule out false-positive results.
According to the survey, EMERSE users were using the search engine for various kinds of tasks. Two-thirds used the search engine to determine medication use for patients. Nearly as many users reported using EMERSE for assisting with clinical trials. Other uses included detection of adverse events, determining eligibility for clinical studies, infection surveillance, internal quality assurance projects, study feasibility determination, billing/claims abstraction, and risk management review. As a comparison, research on Columbia University's EHR search engine revealed that searches for laboratory or test results and diseases or syndromes constituted the majority of search purposes.
The availability of a longitudinal collection of search logs from EHR search engines made it possible to quantitatively analyze EMERSE users' search behaviors. The results suggest that information needs in medical searches are substantially more complicated than those in general Web searches. Specifically, the frequency of medical search queries does not follow a power-law distribution, as that of Web search queries does. A medical search query contains five terms on average, roughly twice the average length of a Web search query. Users of the EHR search engine typically start with a very short query (1.7 terms on average) and end up with a much longer query by the end of a search session. A session of EHR search is also considerably longer than a session of Web search, in terms of both duration (14.8 minutes on average) and the number of queries issued (5.64 on average). All of these points suggest that it is substantially more difficult for users to compose an effective medical search query than a general Web search query.
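The session statistics above can be computed directly from a query log. The sketch below uses a fabricated three-query session purely to illustrate the computation; the log format (minutes elapsed, query string) is an assumption, not the actual EMERSE log schema:

```python
# Sketch of session-level statistics from an EHR search query log.
# Each entry is (minutes since session start, query string) -- a toy example.
log = [
    (0.0, "asthma"),
    (3.5, "asthma wheezing"),
    (9.0, "asthma wheezing albuterol response"),
]

first_len = len(log[0][1].split())                       # length of the opening query
avg_len = sum(len(q.split()) for _, q in log) / len(log) # mean terms per query
duration = log[-1][0] - log[0][0]                        # session duration in minutes

print(first_len, round(avg_len, 2), duration)  # 1 2.33 9.0
```

The pattern in this toy session mirrors the reported behavior: a short opening query that grows as the user refines it over the course of the session.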
In what aspects are medical search queries more difficult? It is reported that more than 30% of the query terms are not covered by a common English dictionary, a medical dictionary, or a professional medical ontology, compared to less than 15% of terms in Web searches. Furthermore, 2,020 acronyms appeared 55,402 times in the EMERSE query log, and 18.9% of the queries contained at least one acronym.
The low coverage of query terms by medical dictionaries not only implies the substantial difficulty of developing effective spell-check modules for EHR search, but it also suggests the need to seek beyond the use of medical ontologies to enhance medical information retrieval. One possible direction leads towards deeper corpus mining and natural language processing of electronic health records.
Moreover, the categorization of medical search queries differs substantially from that of Web searches. The well-established categorization of information needs in Web searches into navigational, informational, and transactional queries does not apply to medical search. This calls for a new categorization framework for medical search queries, and it is an interesting open question to what extent such a semantic categorization can enhance the performance of an EHR search engine.
All the findings from the analysis of real users suggest that medical search queries are in general much more sophisticated than Web search queries. This difficulty imposes considerable challenges for users to accurately formulate their search strategies. There is an urgent need to design effective mechanisms to assist users in formulating and improving their queries. These findings motivate many important directions for improving EHR search engines.
Conventional search engines for medical records do not rank matched results as Web search engines commonly do. This is largely attributable to the uniqueness of the information needs in medical search. Indeed, when users search medical records, the concept of a "hit" is different: there is usually no "best" or "most relevant" document to rank and display. Instead, the final "answer" depends heavily on the initial question and is almost always determined after the user views a collection of documents for a patient, not a single document.
The heavy dependence on manual review is largely due to the uncertainty and ambiguity inherent in medical records. Consider an example: a clinician is trying to identify patients diagnosed with asthma. The diagnosis of asthma for a young child can be difficult. When a child presents with difficulty breathing at an initial clinic visit, the clinical note may mention terms such as wheezing, coughing, reactive airway disease, and, perhaps, asthma. However, this does not mean that the patient actually has asthma, because many children develop wheezing during an upper respiratory viral infection. Recurrent episodes must be documented before one can conclude that a child truly has asthma, and one should also take into account the prescribed medications and the changes in the patient's condition in response to those medications. A single medical record mentioning asthma, regardless of how frequently the term appears in the document, therefore cannot support a confident diagnosis. As a result, in conventional EHR searches, high recall is usually more important than high precision, and the ranking of search results is not critical, since all the retrieved records will be manually reviewed by the users.
It was not until recently that designers of EHR search engines started to adopt relevance ranking in the retrieval architecture. On one hand, the volume of electronic records is growing rapidly, making it difficult for users to manually review all the documents retrieved. On the other hand, the recent development of natural language processing for EHRs has made it possible to handle negation, uncertainty, and ambiguity to a certain degree. Relevance ranking has become appealing in many scenarios of medical record search, especially in scenarios with a need for high precision, such as identifying patients for clinical trials.
To summarize, a common practice for enhancing relevance ranking in medical record search is to introduce concept-level relevance assessment and query expansion. A proof-of-concept prototype of the next generation of EHR search engine features concept-level relevance ranking and a query recommendation component. As a proof of concept, the prototype adopted straightforward query recommendation and relevance ranking algorithms; under the same architecture, a search engine could apply more sophisticated query suggestion and/or relevance ranking methods.
In general, the new EHR search engine advances EMERSE and other existing full-text search engines for medical records by assessing document relevance and recommending alternative query terms at the level of medical concepts instead of at the level of individual words. A document is retrieved because one of the medical concepts implied by its terms matches the concepts in the query. Relevant documents are ranked based on how well the concepts in the documents match the concepts in the query, through classical retrieval methods extended to the concept level.
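The concept-level matching described above can be sketched as follows. This is a minimal illustration under strong simplifying assumptions: the `TERM_TO_CONCEPT` table is a toy stand-in for a medical ontology, matching is naive substring lookup (a real system would use proper term extraction), and scoring is plain concept overlap rather than an extended classical retrieval model:

```python
# Sketch of concept-level retrieval: both query and documents are mapped
# to sets of concept identifiers, and documents are ranked by how many
# query concepts they share.
TERM_TO_CONCEPT = {
    "heart attack": "C_MI", "myocardial infarction": "C_MI",
    "aspirin": "C_ASA", "asa": "C_ASA",
}

def concepts(text):
    """Naive substring-based mapping of a text to concept identifiers."""
    t = text.lower()
    return {cid for term, cid in TERM_TO_CONCEPT.items() if term in t}

def rank(query, docs):
    """Rank documents by concept overlap with the query; drop non-matches."""
    q = concepts(query)
    scored = [(len(q & concepts(d)), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True) if s > 0]

docs = ["Patient had a myocardial infarction, started on ASA.",
        "Routine follow-up, no acute findings."]
print(rank("heart attack aspirin", docs))
```

Note that the first document is retrieved even though it shares no surface words with the query: "heart attack" and "myocardial infarction" map to the same concept, which is precisely the advantage of concept-level matching over word-level matching.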
In general, concept-level relevance assessment and automatic query expansion play an important role in the relevance ranking of medical record searches. In this way, the challenges introduced by the sophistication of the medical domain and of the information needs are alleviated to a certain degree. However, quite a few barriers remain that automatic query expansion or recommendation cannot solve. In a recent paper summarizing the failure analysis of the TREC medical records track, the authors listed a few common causes of retrieval errors, including irrelevant chart references to medical terms; variation in the uses and spellings of terminology; ambiguity between different conditions with similar names; ambiguity among past, present, and future conditions or procedures; and imperfect treatment of negation and uncertainty.
In the long run, most of these barriers may be overcome with more advanced natural language processing techniques for electronic health records. An alternative approach to helping users formulate effective queries, however, may go beyond machine intelligence and require the utilization of social intelligence. The following section describes our exploration of collaborative search, which allows users to disseminate search knowledge in a social setting.
Collaborative search is an alternative to methods based on machine intelligence. With such an approach, users of medical record search engines are able to preserve, disseminate, and promote search knowledge that leads to satisfactory search results. A promising future direction for medical information retrieval is incentive-centered design that better engages users with collaborative search features.
An intelligent search engine does not solve all problems of information retrieval for medical records; its performance depends highly on the quality of search queries that users are able to construct. Average users often do not have adequate knowledge to construct effective and inclusive search queries, given that users usually revise their queries multiple times through a search session and frequently adopt system-generated or socially disseminated suggestions.
One way to address this issue that has drawn increasing attention is the concept of social information seeking or collaborative search. Such a process enables users to collaboratively refine the quality of search queries as well as the quality of information resources.
To quantify and compare the effectiveness of publicly shared bundles and privately shared bundles in the diffusion of search knowledge, an analysis was conducted on several creator-consumer networks, in which nodes represent EMERSE users and a directed edge connects the creator of a bundle to a consumer of that bundle.
Overall, a creator-consumer network established through collaborative search presents a high average degree, a high clustering coefficient, and a small average shortest path compared to random networks of the same size. This indicates that the collaborative search feature has successfully facilitated the diffusion of search knowledge, forming a small-world network of search users. Between privately shared bundles and publicly shared bundles, the latter seems to be slightly more effective, with the combination of the two types of bundles significantly more effective than either individual type.
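The network statistics used in this comparison (average degree and average clustering coefficient) can be computed with a short pure-Python sketch. The four-user edge list below is a fabricated toy graph, treated as undirected for the clustering computation:

```python
# Sketch of small-world statistics on a toy creator-consumer graph.
# Nodes are users; an edge means one user consumed a bundle the other created.
from itertools import combinations

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = {}
for u, v in edges:                      # build an undirected adjacency map
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def clustering(node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

avg_clustering = sum(clustering(n) for n in adj) / len(adj)
print(avg_degree, round(avg_clustering, 2))  # 2.0 0.58
```

High clustering combined with short average shortest paths, relative to random graphs of the same size, is the standard signature of a small-world network, which is what the creator-consumer networks exhibit.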
Comparing the search-knowledge diffusion networks with a hypothetical social network of users, constructed by connecting users who had ever typed in the same queries, revealed a large potential gain from a better collaborative search mechanism. The hypothetical network featured far fewer singletons, a much higher average degree, and a much shorter average shortest path length. The findings of this study suggest that although the collaborative search feature effectively improved search-knowledge diffusion in medical record search, its potential was far from fully realized.