Technology at Geneseo Community School District 228
I spent considerable time this week adding a search engine to the existing www.dist228.org website. Some would approach this by simply creating a sitemap, submitting it to Google, and then using Google's API to embed a search box. I would argue that this approach is generic and not desirable. First, think about every major modern web browser: Firefox, IE, Opera, Safari, and SeaMonkey all have a Google search built into the browser itself, so anyone who wants to search Google can simply use that feature. Second, I find it somewhat annoying when you use the search box on a specific site and end up seeing a bunch of Google results, including results not connected to the website and advertisements at the top of the list.
Okay, so instead of using Google or another existing major search engine, take a look at the open source community. After looking at a few choices I decided to use Sphider 1.3.5, which is released under the GNU General Public License. The project is written primarily in PHP (like the district site) and is a series of scripts that you are free to change, extend, and modify to your heart's desire (see the advantage of the GPL here).
To make the search work nicely within the website, I removed the Search_form.html file and instead wrote a new <form> tag that calls the search.php file, embedding the search bar directly into the website toolbar itself. Since the form was embedded in a table, I used CSS positioning to shrink the bar and fit it uniformly into the toolbar. This took me some time, as I was not fully versed in relative positioning with CSS, but after this experience I can see that it is quite useful.
To use Sphider without its built-in Search_form.html file, I just blanked the file out and called search.php directly. Since in our case I did not want an advanced search option or a category option, I stripped the form request down to simply this (I did leave the Suggest function active):
To maintain a uniform look across the site, I edited the Search_results.html file so that its format matched our site. I did this by switching the CSS stylesheet from Sphider's to the district's main one, combining the two, and changing all of Sphider's relative paths to full paths. The most time-consuming struggle was adjusting the Search_results.html file so that it retained the custom search form, kept the site's appearance, and was coded correctly across all browsers (I had trouble validating the code and getting it to display in IE).
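The relative-to-full path change described above can be sketched in a few lines. This is only an illustration in Python (Sphider itself is PHP, and I made the change by hand); the base URL and the file names in the snippet are hypothetical, chosen just to show how a relative path resolves against a fixed location.

```python
import re
from urllib.parse import urljoin

# Hypothetical location of the template files; the real path may differ.
BASE = "http://www.dist228.org/sphider/templates/standard/"

# Matches href="..." and src="..." attributes written with double quotes.
ATTR = re.compile(r'(href|src)="([^"]*)"')


def make_paths_absolute(html, base=BASE):
    """Rewrite relative href/src values in a template as full URLs."""
    def replace(match):
        name, value = match.group(1), match.group(2)
        # urljoin leaves values that are already full URLs unchanged.
        return '{}="{}"'.format(name, urljoin(base, value))
    return ATTR.sub(replace, html)


# Hypothetical template fragment with two relative paths.
snippet = '<link rel="stylesheet" href="search.css"><img src="../images/logo.gif">'
print(make_paths_absolute(snippet))
```

Full paths matter here because the results template ends up being served from a different place than the stylesheet and images it refers to, so a path relative to the template directory no longer resolves once the page is wrapped in the main site's layout.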
To add more functionality to the search engine, I installed the pdftotext shell program on the web server and set Sphider to convert all PDFs to text, then index and sort the results when reindexing. The result is that the custom search engine can now easily parse and link to PDF files. I hope to soon add the catdoc shell program and do the same with all Word documents. Take, for example, a search for Dress Code.
You can see that the results include references to these search terms inside PDFs. With the Board Policies posted as PDFs, I thought that as I refine this feature it could become an easy way to get directly to a specific board policy.
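For anyone curious what the PDF step amounts to, here is a minimal sketch in Python of calling pdftotext and checking the extracted text for a phrase. It is not how Sphider does it internally (that is PHP); the function names and the file name in the usage note are my own for illustration, and it assumes pdftotext is on the server's PATH.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Build the command line; "-" tells pdftotext to write to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path):
    """Return the text of a PDF, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext reports an error
    )
    return result.stdout


def mentions(text, phrase):
    """Case-insensitive check that a phrase occurs in the extracted text."""
    return phrase.lower() in text.lower()
```

With a hypothetical file named policy.pdf, `extract_text("policy.pdf")` would return its text, and `mentions(text, "dress code")` would say whether that policy should show up for the search above.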
The next step in this process is to clean up the website, since with the search engine it is now easy to find old, outdated material that has been orphaned on the server. Along the way I noticed that much of the code on the site does not pass W3C validation, and it would be nice to get the site fully validated (our site works fine in all major browsers, but for future stability it would be nice to earn the validation stamp of approval).
Another next step is to create meta headings on the major pages of the website and adjust the search algorithm to get better results. I currently have this set to the following:
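As a general illustration of what adjusting the algorithm means (these are not the district's actual settings), here is a minimal Python sketch of weighted ranking: a match in the title counts for more than a match in the body. The field names, weights, and sample pages are all hypothetical.

```python
# Hypothetical weights: how much one match in each part of a page is worth.
WEIGHTS = {"title": 20, "path": 10, "keywords": 5, "body": 1}


def score(page, term, weights=WEIGHTS):
    """Add up weighted counts of the term in each field of the page."""
    term = term.lower()
    total = 0
    for field, weight in weights.items():
        text = page.get(field, "").lower()
        total += weight * text.count(term)
    return total


# Two made-up pages to rank.
pages = [
    {
        "title": "Student Handbook",
        "path": "/handbook.pdf",
        "keywords": "",
        "body": "The dress code applies to all students. Dress code violations are reported.",
    },
    {
        "title": "Dress Code",
        "path": "/policies/dress-code.html",
        "keywords": "dress code, uniform",
        "body": "Dress code policy.",
    },
]

ranked = sorted(pages, key=lambda p: score(p, "dress code"), reverse=True)
for page in ranked:
    print(score(page, "dress code"), page["title"])
```

This is also why meta keywords on the major pages matter: under a scheme like this, a page that names its topic in its title or keywords rises above one that only mentions it in passing.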
Additionally, I set the index options to allow the search to include external sites referenced directly from our site. This improved the results, since the search can now show relevant information from teacher websites and from educational links specifically referenced by our site.
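The idea of indexing only those external pages that our own site links to can be sketched as a simple rule. This is a Python illustration under my own assumptions, not Sphider's actual logic: the function name and the sample paths are hypothetical, and the rule shown is "index anything on our host, plus any page linked directly from our host."

```python
from urllib.parse import urljoin, urlparse

SITE_HOST = "www.dist228.org"


def should_index(link, found_on, site_host=SITE_HOST):
    """Decide whether a link discovered on the page found_on gets indexed."""
    target = urlparse(urljoin(found_on, link))
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, and similar links
    if target.hostname == site_host:
        return True  # pages on our own site are always indexed
    # External pages count only when our own site links to them directly.
    return urlparse(found_on).hostname == site_host


# Hypothetical examples:
print(should_index("http://gcsdblogs.org/", "http://www.dist228.org/staff.html"))
print(should_index("http://example.com/", "http://gcsdblogs.org/"))
```

Stopping one hop away from the district site keeps the index from wandering off across the whole web while still picking up the teacher pages and educational links we chose to reference.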
For example, when searching for me (Roodhouse), you can see that this website itself appears in the results (gcsdblogs.org is on a different web server than www.dist228.org).
After the server is cleaned of old material and re-indexed, these settings may be tweaked to get the desired results. Overall, though, the scripts and functionality of this code are perfect, providing quick, accurate results with full flexibility to adjust to any future needs or specific searching ability. This project took me much longer than I expected, but that was really because I needed to learn more about PHP and CSS stylesheets.