
Technology

Massive volumes of information are available in textual form, for instance on the Internet or on intranets. Such a large and rapidly shifting body of knowledge cannot be exploited efficiently by human readers alone.

In most standard search technologies, the text and meaning representations of both the query and the document collection are limited to an unordered bag of keywords.

For instance, a query such as "When was Mercury discovered?" is treated as equivalent to "mercury discovered". Standard information retrieval will return the documents most similar to that query. If you are searching the web, there is a good chance that the date will be found near those keywords, along with the names of the first observers.
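To make the "bag of keywords" reduction concrete, here is a minimal Java sketch. It lowercases the text, splits it on non-letter characters and drops a small stopword list. The class name `BagOfWords` and the stopword list are illustrative assumptions, not part of any Linguit API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the "bag of keywords" view used by standard search:
// word order, case, punctuation and function words are all discarded.
class BagOfWords {

    // Hypothetical, deliberately tiny stopword list, for illustration only.
    private static final Set<String> STOPWORDS = new HashSet<String>(
            Arrays.asList("when", "was", "is", "the", "a", "an", "of", "who", "what"));

    /** Reduces a query or a text to an unordered set of lowercase keywords. */
    static Set<String> bag(String text) {
        // TreeSet keeps the keywords sorted, which only makes printing stable.
        Set<String> keywords = new TreeSet<String>();
        // Split on any run of non-letter characters (this also drops the '?').
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                keywords.add(token);
            }
        }
        return keywords;
    }
}
```

Both `BagOfWords.bag("When was Mercury discovered?")` and `BagOfWords.bag("mercury discovered")` yield the same set, `[discovered, mercury]`. The word "when", which signals that a date is expected, has been thrown away, and that is exactly the limitation described here.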

While this approach can be sufficient for some search tasks, it is not always satisfactory:

Either way, you will need to rephrase your query and check documents to make sure there is good evidence supporting the answer you identified (you, not the search engine!). While this may be acceptable for personal searches, it is extremely time-consuming for batch searches, such as market watch or open-source intelligence.

The issue applies to web search, of course, but it is a general search problem that also affects corporate intranets, log analysis, large technical manuals and data banks, to name a few.

Linguit's approach

We have a triptych approach to search:

In our view, answering is a much more complex process than similarity matching. We see three essential steps in the answering process, built around intelligent information extraction, user modeling, and rendering/visualization.

What we do

Our core technology is a platform dedicated to information retrieval and text mining. We do not offer a generic search engine for an undefined search task. Instead, we focus on task-oriented search problems. In other words, we offer a generic architecture that can perform different kinds of search, but each component is dedicated to a specific, well-defined task. These components can be used as standalone tools (for instance for named entity recognition) or as part of a higher-level task, such as query analysis or text indexing.
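The idea of task-specific components that run either standalone or inside a higher-level task can be sketched as a simple composition pattern. The `Component` interface, the two toy components and `Pipeline` below are hypothetical names chosen for illustration; they do not describe Linguit's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: each component performs one well-defined task
// on a shared representation (here simply a list of tokens).
interface Component {
    List<String> process(List<String> tokens);
}

// Toy component 1: lowercases every token.
class Lowercaser implements Component {
    public List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Toy component 2: naive plural stripping (illustrative only, not a real stemmer).
class NaiveStemmer implements Component {
    public List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            boolean strip = t.endsWith("s") && t.length() > 3;
            out.add(strip ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }
}

// A pipeline is itself a Component, so composite tasks can be nested further.
class Pipeline implements Component {
    private final List<Component> steps;

    Pipeline(Component... steps) {
        this.steps = Arrays.asList(steps);
    }

    public List<String> process(List<String> tokens) {
        List<String> current = tokens;
        for (Component step : steps) {
            current = step.process(current); // output of one step feeds the next
        }
        return current;
    }
}
```

Here `new NaiveStemmer()` can be used on its own, while `new Pipeline(new Lowercaser(), new NaiveStemmer())` turns `["Planets", "Orbit"]` into `["planet", "orbit"]`. Because `Pipeline` implements `Component` too, a composite task can be reused as a single step in a larger one.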

The generic aspect of the platform lies in its ability to handle different types of knowledge sources (for instance relational databases, text documents, or semi-structured streams such as RSS feeds) and different types of front-ends, from graphical user interfaces to custom in-house applications or web services.

Linguit platform

The input (query) and output (search results) of the platform are designed as an XML transaction. Customizing search parameters on the one hand, and rendering search results on the other, requires only minimal development using XSLT. This offers an extremely versatile approach to user interfacing, since both the query interface and the rendering of the search results are completely separated from the actual search process. It also makes it possible to offer different views of the search results, in terms of layout and structure, at minimal cost.
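The separation between search and rendering can be illustrated with the standard JAXP transformation API (`javax.xml.transform`) that ships with the JDK. The same XML result document can be turned into a different view simply by swapping the stylesheet. The `<results>`/`<hit>` schema below is an assumption for illustration, not the platform's actual output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultRenderer {

    // Hypothetical search-result document produced by the search process.
    static final String RESULTS_XML =
          "<results query='mercury discovered'>"
        + "<hit score='0.92'><title>Mercury (planet)</title></hit>"
        + "<hit score='0.75'><title>History of astronomy</title></hit>"
        + "</results>";

    // One possible "view": an HTML list of titles with their scores.
    static final String HTML_LIST_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/results'>"
        + "<ul><xsl:for-each select='hit'>"
        + "<li><xsl:value-of select='title'/> (<xsl:value-of select='@score'/>)</li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document and returns the serialized output. */
    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers can use render() without try/catch.
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }
}
```

Rendering the same `RESULTS_XML` with a second stylesheet (a table, an RSS feed, a compact summary) requires no change to the code that produces the results.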

The platform is also designed to support meta-search. It can be configured to access multiple information sources from a single front-end, and it provides components to aggregate search results from these different sources.
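One simple way to aggregate results from sources whose scores live on different scales (say, a relational database and a full-text engine) is to min-max normalize each source's scores to [0, 1] and keep the best score per document, a CombMAX-style fusion. This is a generic technique swapped in for illustration. The `Hit` and `MetaSearchAggregator` names are hypothetical, and the platform's own aggregation strategy may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A single search hit from one source; field names are illustrative.
class Hit {
    final String id;
    final double score;

    Hit(String id, double score) {
        this.id = id;
        this.score = score;
    }
}

class MetaSearchAggregator {

    /** Rescales one source's scores to [0, 1] so that different sources become comparable. */
    static List<Hit> normalize(List<Hit> hits) {
        if (hits.isEmpty()) return hits;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (Hit h : hits) {
            min = Math.min(min, h.score);
            max = Math.max(max, h.score);
        }
        List<Hit> out = new ArrayList<Hit>();
        for (Hit h : hits) {
            // If all scores are equal, treat every hit as fully relevant.
            double s = (max == min) ? 1.0 : (h.score - min) / (max - min);
            out.add(new Hit(h.id, s));
        }
        return out;
    }

    /** Merges several result lists, keeping the best normalized score per document id. */
    static List<Hit> merge(List<List<Hit>> sources) {
        Map<String, Double> best = new HashMap<String, Double>();
        for (List<Hit> source : sources) {
            for (Hit h : normalize(source)) {
                Double previous = best.get(h.id);
                if (previous == null || h.score > previous) {
                    best.put(h.id, h.score);
                }
            }
        }
        List<Hit> merged = new ArrayList<Hit>();
        for (Map.Entry<String, Double> e : best.entrySet()) {
            merged.add(new Hit(e.getKey(), e.getValue()));
        }
        // Highest score first; ties broken by id so the order is deterministic.
        Collections.sort(merged, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                int byScore = Double.compare(b.score, a.score);
                return byScore != 0 ? byScore : a.id.compareTo(b.id);
            }
        });
        return merged;
    }
}
```

Suppose one source returns scores 80, 40 and 20 and another returns 0.9 and 0.3. Both top hits normalize to 1.0, so neither source dominates merely because its raw scores are larger. Summing normalized scores instead (CombSUM) would favour documents found by several sources.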

For each new search task, we focus our attention on customizing the components that perform natural language processing. In this area, we have the expertise to assess your language technology requirements. This assessment usually translates into a hybrid strategy: plugging in available linguistic resources, or acquiring new knowledge for your domain through machine learning techniques.

Linguit components in a nutshell

We provide the following components as part of our solutions, and offer corresponding expertise in consulting/training projects.

- Language identifier
- Sentence boundary detection
- Text tokenization
- Morphological processing (stemming, lemmatization)
- Spell checking and phonetic suggestions
- Grammatical part-of-speech (POS) tagging
- Fast, robust syntactic phrase parser
- Syntactic parser
- Dependency parser
- Semantic analysis component
- Mining of patterns from textual data
- Named entity recognition and classification
- Co-reference finder
- Toponym resolution (geo-tagger)
- Machine learning for natural language processing
- Question classification
- Indexer: crawls data to construct a searchable index for fast access
- Text-to-SQL parser
- Document retrieval: retrieves highly relevant documents from a collection
- Clustering
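As an illustration of one of these components, an indexer typically builds an inverted index that maps each term to the documents containing it. The Java sketch below is a textbook, in-memory version with naive tokenization and boolean AND queries. `InvertedIndex` is a hypothetical class name. This is not Linguit's indexer, which would also need to handle persistence, ranking and the linguistic processing listed above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal in-memory inverted index: term -> sorted set of document ids.
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    /** Tokenizes naively (lowercase, split on non-letters) and records each term's document. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (term.isEmpty()) continue;
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
    }

    /** Returns the ids of documents containing every query term (boolean AND). */
    Set<Integer> search(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
            if (docs == null) return new TreeSet<Integer>(); // an unknown term empties the AND
            if (result == null) {
                result = new TreeSet<Integer>(docs);          // copy so the index is not modified
            } else {
                result.retainAll(docs);                       // intersect posting lists
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }
}
```

As a usage example, index "Mercury was discovered in antiquity." (id 1), "Mercury is a chemical element." (id 2) and "Neptune was discovered in 1846." (id 3). Then `search("mercury", "discovered")` returns `[1]`.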

System Integration

The Linguit platform is developed in Java (version 1.5 or higher) and runs on any operating system with a Java Runtime Environment (Windows, Linux/UNIX, Macintosh/BSD).

Application Integration

Linguit applications are accessible through:

For an end-to-end application, we can provide full integration based on a Tomcat container, including Apache integration on UNIX servers.

User Interface Integration

The Linguit platform supports multiple user interfaces, from Java GUIs to web-based interfaces. The platform's input and output are XML-based, and we offer built-in XSLT transformations to simplify layout customization.

System Requirements

Actual requirements may vary depending on the components used and the type of text processing needed. As a default, we recommend:

Additional storage space may be required for: