Webhamer Weblog: Search & ICT-related blogging


links for 2009-12-23

Posted in LinkBlog by Staut on the December 23rd, 2009

links for 2009-12-21

Posted in LinkBlog by Staut on the December 21st, 2009

links for 2009-12-18

Posted in LinkBlog by Staut on the December 18th, 2009
  • Information – it’s the key to knowledge. According to industry analyst Gartner, “information access technology will locate and analyze more than 90% of data in more than 50% of Global 2000 enterprises by YE12”. To help organizations meet this challenge, Mindbreeze Enterprise Search lets them mature their information “ecosystem” with an easy-to-handle but impressively powerful enterprise search software solution.

links for 2009-12-16

Posted in LinkBlog by Staut on the December 16th, 2009

links for 2009-12-12

Posted in LinkBlog by Staut on the December 12th, 2009
  • Ashlee Vance’s insightful piece in Monday’s NYTimes on the implications of the wrangling between the EU and Larry Ellison over Sun and MySQL lit up a lot of conversation in open source circles. And with Open Source reaching something like a ten-year mark since Red Hat and Linux broke forth in a big way, it’s a good time to ask the question: is Open Source a business model, and if so, can it succeed? I think the answer lies in a more nuanced understanding of open source, from three perspectives: as a business model, as a development method, and as a social network.

  • Faceted search has become a critical feature for enhancing findability and the user search experience for all types of search applications. In this article, Solr creator Yonik Seeley gives an introduction to faceted search with Solr.

  • Semantic search has been the new black in the high fashion of content management and the industries around it. Nstein, a provider of Web CMS, DAM and text-mining technologies, just released a new product, Semantic Site Search, which they say is more flexible, intuitive and extensible than Google Search Appliance. The vendor humbly refers to it as the “new kind of site search.”

    (tags: emid sparks)
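
As a rough illustration of the idea behind the faceted search link above: a facet is a count of how many matching documents carry each value of a field, and selecting a facet value narrows the result set. The sketch below is plain Python over an in-memory list and does not use Solr or its API; the function names and the sample documents are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        # A field may hold a single value or a collection of values.
        values = value if isinstance(value, (list, tuple, set)) else [value]
        # Use a set so a document counts at most once per value.
        counts.update(set(values))
    return counts


def filter_by_facet(docs, field, wanted):
    """Keep only the documents whose `field` contains `wanted`."""
    kept = []
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, (list, tuple, set)) else [value]
        if wanted in values:
            kept.append(doc)
    return kept


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    docs = [
        {"title": "Intro to facets", "category": "search", "tags": ["solr", "facets"]},
        {"title": "NER basics", "category": "nlp", "tags": ["ner"]},
        {"title": "Tuning queries", "category": "search", "tags": ["solr"]},
    ]
    print(facet_counts(docs, "category"))
    print(facet_counts(filter_by_facet(docs, "category", "search"), "tags"))
```

A real search engine computes these counts inside the index rather than by scanning documents, so this only shows what the numbers next to each facet mean.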

links for 2009-12-09

Posted in LinkBlog by Staut on the December 9th, 2009
  • Solr 1.3 brings a powerful set of features that make it more attractive than ever. The rest of this article takes a look at new Solr features and how you can incorporate them into your applications. To demonstrate them, I'll build a simple application that combines an RSS feed with a rating of that feed. The ratings will be stored in a database, and the RSS feed will be taken from my Lucene blog's RSS feeds. Given this simple setup, I'll demonstrate the use of:

  • "Named entities" is the NLP jargon for proper nouns which represent people, places, organisations, and so on. This module provides a very simple way of extracting these from a text. If we run the extract_entities routine on a piece of news coverage of recent UK political events, we should expect to see it return a list of hash references looking like this:

  • CRFClassifier is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. The software provides a general (arbitrary order) implementation of linear-chain Conditional Random Field (CRF) sequence models, of the sort pioneered by Lafferty, McCallum, and Pereira (2001), coupled with well-engineered feature extractors for Named Entity Recognition. Included are a good 3-class (PERSON, ORGANIZATION, LOCATION) named entity recognizer for English (in versions with and without additional distributional similarity features) and another pair of models trained on the CoNLL 2003 English training data. The distributional similarity features improve performance, but the models require considerably more memory.

  • The HTML tags on a web page must be stripped away to get clean text for a PHP search engine, keyword extractor, or some other page analysis tool. PHP's standard strip_tags() function will do part of the job, but you need to strip out styles, scripts, embedded objects, and other unwanted page code first. This tip shows how.
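
The last link above describes the technique in PHP. As a rough sketch of the same idea in Python (not the linked tip's code), the snippet below drops the contents of script, style, and similar elements before collecting the remaining text. It uses only the standard library's html.parser; the class name, function name, and the set of skipped elements are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Elements whose contents are page code rather than readable text.
# This set is an illustrative assumption, not an exhaustive list.
SKIPPED = {"script", "style", "object", "noscript", "iframe"}


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside SKIPPED elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><head><style>p { color: red }</style>"
        "<script>var x = '<b>';</script></head>"
        "<body><p>Hello <b>world</b></p></body></html>"
    )
    print(html_to_text(page))
```

Text from adjacent inline elements is joined with a space, which suits keyword extraction but can split a word that the markup breaks mid-word.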

links for 2009-12-08

Posted in LinkBlog by Staut on the December 8th, 2009
