What PHP search engine would you use?
I'm facing a challenge with a customer running a site we built on a hosting environment that is quite restrictive. We used to use htdig to add search functionality, but the hoster doesn't allow execution of binaries.
I've taken a look at several search engine implementations, but so far, none match all requirements:
- Zend Framework's lucene search: requires PHP5, but we have to run it on PHP4
- phpdig: uses exec() to perform the search, but we are running in safe mode
- google: layout not flexible enough
- htdig: requires execution of binaries, which is not allowed
- perlfect: requires execution of perl, which is not allowed
Maybe isearch is an option but I cannot find a lot of info about it.
The requirements are simple, but restrictive:
- Should be usable/integratable in a PHP4 based site
- Should run with safe mode on
- Layout should be customisable
- Should not require execution of perl or binaries, at least not for the search part (uploading an index that is created on a separate machine is somewhat acceptable although not preferred)
- Should be spider based, so no database query based search engine
Free Software/Open Source/Free as in Free Beer would be nice, but is no strict requirement.
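To make the "index created on a separate machine" option concrete, here is a minimal sketch of the split between an offline indexing step and a read-only search step. It is written in Python purely for illustration and is not taken from any of the engines listed above; the `pages` dictionary, the function names, and the JSON format are all assumptions. On the hosted side the same lookup would have to be done in plain PHP4 against whatever serialized format is chosen.

```python
import json
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the sorted list of URLs whose text contains it.

    `pages` is a hypothetical {url: plain_text} dict that a spider
    running on a separate machine would have collected.
    """
    index = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, []).append(url)
    return {word: sorted(urls) for word, urls in index.items()}

def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Offline step: build the index and serialize it for upload.
pages = {
    "/products.html": "Our products include search tools",
    "/contact.html": "Contact our support team",
}
serialized = json.dumps(build_index(pages))

# Hosted step: load the uploaded index and answer a query.
index = json.loads(serialized)
print(search(index, "our search"))
```

The point of the split is that only the last two lines need to run on the restricted host: they read a data file and do dictionary lookups, so no binaries, no exec() and no spider are involved at search time.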
Does anybody have a suggestion? What are you guys using?
Tags: google, htdig, perlfect, PHP, phpdig, search, zend framework



March 8th, 2006 at 3:15 pm
Does the limitation on binaries extend to extensions? I’m only asking because if you can sneak in an extension that you can load dynamically, you could use Xapian (which powers BeebleX), which fulfills all the other requirements. I’m not sure if that’s going to fly with safe_mode on, though.
March 8th, 2006 at 3:43 pm
You could have a look at:
http://www.phpdig.net/index.php
I currently don’t remember if it strictly requires a shell script to be run, but I think it does. However, as it’s written in PHP, it should be fairly easy to take that shell script and hack it a bit so that it can be invoked via HTTP request.
But, reading your requirements I can’t help but think: Really man, if *those* are your limitations and you really *need* a search engine… for heaven’s sake pay the $0.02 extra per month and get a real hosting service!
March 8th, 2006 at 4:23 pm
I know this isn’t always possible, but it seems like it would be easier and better in the long run to find more accommodating hosting.
March 8th, 2006 at 4:25 pm
Actually, the company I work for provides better hosting, but in this case, that’s not an option. Choice of hosting is not always a matter of price, unfortunately.
March 8th, 2006 at 4:27 pm
Maybe there are some ideas you can get from Dokuwiki, which contains its own search engine:
http://dev.splitbrain.org/view/darcs/dokuwiki/inc/search.php
http://dev.splitbrain.org/view/darcs/dokuwiki/lib/exe/indexer.php
This is Dokuwiki-specific, but perhaps some of the ideas can be reused, in particular the approach to indexing using a web bug. I tried to explain that a little here: http://www.sitepoint.com/blogs/2005/11/03/web-bugs-for-job-scheduling-hack-or-solution/
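As a rough illustration of the web bug idea (a sketch of the general technique, not Dokuwiki’s actual code): every page embeds a tiny image whose URL points at a script, and each time a visitor’s browser fetches it, the script indexes at most one pending page. The class and method names below are hypothetical, and the fetch-and-index step is reduced to a stand-in.

```python
import time

class IndexQueue:
    """Hypothetical queue of pages waiting to be (re)indexed."""

    def __init__(self, urls, min_interval=0.0):
        self.pending = list(urls)
        self.indexed = []
        self.min_interval = min_interval  # minimum seconds between indexing steps
        self.last_run = None

    def handle_bug_request(self, now=None):
        """Called whenever a visitor's browser fetches the web bug.

        Indexes at most one pending page so that each request stays short,
        and returns the URL it processed (or None if it did nothing).
        """
        now = time.time() if now is None else now
        if not self.pending:
            return None
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return None
        url = self.pending.pop(0)
        self.indexed.append(url)  # stand-in for fetching and indexing the page
        self.last_run = now
        return url
```

Because the work is triggered by ordinary HTTP requests and done one small step at a time, it needs neither cron nor a long-running process, which is what makes the approach attractive on a restricted host.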
March 8th, 2006 at 4:51 pm
You could port ZF Lucene to PHP4. I don’t think it’d be much of a struggle, though you would have to make sure the current limitations of the ZF implementation are OK.
March 8th, 2006 at 5:16 pm
Take a look at the new http://www.php.net search:
http://www.php.net/results.php?q=demo&l=en&p=all
It uses Yahoo!’s web services API.
March 8th, 2006 at 6:50 pm
I use ZOOM, which comes in several flavors, e.g. PHP. It works like a dream and the layout is flexible. Have a look: http://www.wrensoft.com/
the best, in my experience
March 8th, 2006 at 9:28 pm
Have you tried the google search webservice? http://www.google.com/apis/
March 9th, 2006 at 12:14 am
I second the recommendation for Yahoo! Search API
http://developer.yahoo.net/search/
http://pear.php.net/manual/en/package.webservices.services-yahoo.php
March 9th, 2006 at 1:37 am
sphider
http://www.cs.ioc.ee/~ando/sphider/
March 9th, 2006 at 3:23 pm
# Zend Framework’s: well, do you want to entirely reprogram your application with a full framework?
# google: well, do you seriously want to wait until most of your content is indexed?
The CMS SPIP (http://www.spip.net) offers an integrated search engine that works even with safe_mode and shared web servers. The indexing process is spread over a few HTTP requests.
March 9th, 2006 at 3:28 pm
I ran into this situation not too long ago and decided Sphider was right for me: http://cs.ioc.ee/~ando/sphider/
It doesn’t take a lot to run and should fit quite nicely into your requirements. You can also add in things like PDF, DOC, PPT indexing, etc… if at any point your hosting provider allows you to run a few open source apps on their server.
~Matt Simpson
March 9th, 2006 at 5:22 pm
Here’s an interesting standards compliant solution for smaller sites: http://www.gr0w.com/amos/growsearch/. Indexing is done on the fly, so it’s not a good solution for larger sites.
March 9th, 2006 at 5:32 pm
I don’t know what your budget is over there, but you could try Google mini
http://www.google.com/enterprise/mini/
Anyway if you want to host the search engine by yourself, _and_ you want the search engine to use spidering, but you can’t run executables, how are you planning to initiate a spider application? That said, using PHP directly for live searching without spidering sounds icky to me (although I don’t know about how much data we’re talking about)
March 9th, 2006 at 6:35 pm
http://fenec.noplay.net
March 9th, 2006 at 11:26 pm
isearch (http://www.isearchthenet.com/isearch/) is used for the DreamHost Knowledge Base Mirror (http://blog.dreamhosters.com/kbase/). Works pretty well.
March 22nd, 2006 at 10:21 pm
You can initiate it using cron on a Linux server… if you have that option, of course.
March 24th, 2006 at 7:51 am
Ivo Jansch of Achievo asked himself the same question I did: how do you implement a search engine in PHP for your site?
Classically, search engines come in two parts:
An indexing component, which lets you add…
November 14th, 2006 at 1:02 am
Hello!
Why don’t you change hosting like Markus said? I usually promote DreamHost, but they also have some restrictions (you can’t use fopen, just cURL), so you better check before signing up. The first year will be almost free with the code we affiliates can give. Mine is MAXIMUMPROMO , and it’s a $97 discount, the maximum affiliates can give. I won’t earn a dime from your direct referral.
More information is here: http://dreamhost97.wordpress.com/