As mentioned, MySQL is one of the prerequisites for our approach, so our first step is to get a MySQL database up and running. To connect to MySQL, we can use any of the free UI-based tools, e.g. SQuirreL, HeidiSQL or DbVisualizer, or the MySQL admin console. Once connected, let us run the following SQL, which will create a table called SEARCH_ENGINE.
Listing 1: An SQL statement which will create a table –
The above query creates the table in which the URL and the content of each page will be stored.
Creating the Form:
Now that the database is ready, let us make the form which visitors will use to perform their search. Let us call this file 'index.php'; it is a simple search form with a button. Here we will use GET instead of POST, so the search terms are visible in the address bar.
Listing 2: Our index.php file –
Our form is now complete and ready to be used. End users enter a query in it and can, at the same time, restrict the number of results shown on the page.
Processing the Query:
Let us create a new file, 'search.php', which is the page where the results of the search will be listed. This file is divided into the following sections -
· Let us connect to the database first:
Listing 3: DB connection –
· Form the query - Once we are connected to the DB, we form the query using the tokens that the end user has entered. This is shown below -
Listing 4: Construct the query along with the tokens users have entered –
· Our next job is to fetch the results from the database and present them to the user. If the search doesn't yield any results, we should show an appropriate message to the user, as shown below -
Listing 4: Fetch the result and present it to the user –
Now our Search engine is ready to be used.
The complete code, explained above in parts, is listed below -
Listing 5: The Complete Search.PHP file –
DOWNLOADING
Downloading the data is going to take some time, so we should be prepared for a long wait. We can write a very basic crawler in PHP simply by calling file_get_contents with a URL. Let us have a look at the following code -
Listing 7: The crawler –
The above code is essentially a single-threaded crawler. It simply loops over every URL in the file, downloads the content and saves it to disk. The only thing to note is that it stores the URL together with the content in one document, since we might need the URL for ranking purposes and it is also helpful to keep track of where the document came from. We should keep in mind that we may run into file system limits when storing lots of documents in one folder.
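One common way around the one-folder limit is to spread the files over subfolders derived from the hash that the crawler already computes. The following is only a sketch of that idea: the function name document_path and the two-level layout are assumptions, not part of the original crawler.

```php
<?php
// Hypothetical helper: place each document in a subfolder named after the
// first characters of its md5 hash, e.g. ./documents/ab/cd/abcd...
// Two levels of two hex characters give 256 * 256 possible folders.
function document_path(string $url, string $baseDir = './documents'): string
{
    $hash = md5($url);
    return $baseDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $hash;
}

$path = document_path('http://example.com/');
echo $path, PHP_EOL;
```

Before writing to such a path, the folder has to exist; mkdir(dirname($path), 0777, true) creates the missing levels in one call.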
THE DOCUMENT STORE
The document store is somewhat odd, because the things we are going to index are probably already stored somewhere else. The most obvious case is that the documents already live in some database.
THE INDEXER
The next step in our approach to build our search is to create the indexer. An indexer takes a document, breaks it apart and feeds it into the index, and also possibly to the document store depending upon our implementation.
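To make the "breaks it apart" step more concrete, here is a minimal sketch of how an indexer might split a document into terms. The function name tokenize_document and the tokenizing rules (strip markup, lowercase, split on anything that is not a letter or digit) are assumptions chosen for illustration; a real indexer would likely also handle stop words and non-ASCII text.

```php
<?php
// Hypothetical helper: break a document into lowercase terms and count
// how often each term occurs.
function tokenize_document(string $content): array
{
    // Put a space in front of every tag so that words in adjacent
    // elements are not glued together once the tags are stripped.
    $text = strtolower(strip_tags(str_replace('<', ' <', $content)));

    // Split on anything that is not a letter or digit.
    $terms = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return array_count_values($terms);
}

print_r(tokenize_document('<h1>Search</h1><p>Search engines index documents.</p>'));
```

The resulting term counts are what the indexer would then feed into the index, one entry per term.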
Searching requires a relatively simple approach. In fact we only require a single method as shown below -
Listing 9: The search interface –
Of course, the actual implementation is not that easy. It is rather more complex than it appears.
Listing 1: An SQL statement which will create a table –
CREATE TABLE SEARCH_ENGINE (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `pageurl` VARCHAR(255) NOT NULL,
    `pagecontent` TEXT NOT NULL,
    PRIMARY KEY (`id`)
);
Listing 2: Our index.php file –
<html>
<head>
<title>My search engine</title>
</head>
<body>
<form action='search.php' method='GET'>
<center>
<h1>My Search Engine</h1>
<input type='text' size='90' name='search'>
<br><br>
<input type='submit' name='submit' value='Search source code'>
<!-- The original listing had bare option tags; the enclosing select and its name are assumed. -->
<select name='limit'>
<option>10</option>
<option>20</option>
<option>50</option>
</select>
</center>
</form>
</body>
</html>
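On the receiving side, the script has to read these GET parameters before it can do anything with them. The following is a minimal sketch of that step: 'search' is the field name from the form, while the function name parse_search_request and the 'limit' field for the result count are assumptions made for this example.

```php
<?php
// Hypothetical helper: normalise the raw GET parameters sent by the form.
// 'search' comes from the form; 'limit' is an assumed field name for the
// number of results to show.
function parse_search_request(array $get): array
{
    $search = trim((string) ($get['search'] ?? ''));
    $limit = (int) ($get['limit'] ?? 10);

    // Only accept the page sizes the form offers; fall back to 10.
    if (!in_array($limit, [10, 20, 50], true)) {
        $limit = 10;
    }

    return ['search' => $search, 'limit' => $limit];
}

$request = parse_search_request(['search' => '  php crawler ', 'limit' => '20']);
echo $request['search'], ' / ', $request['limit'], PHP_EOL;
```

In a real page the function would be called with $_GET. Restricting the limit to known values keeps arbitrary numbers from the address bar out of the query.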
Listing 3: DB connection –
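// Note: the mysql_* functions were removed in PHP 7.0; on current PHP versions use mysqli or PDO instead.
mysql_connect("localhost", "USER_NAME", "PASSWORD") or die("Could not connect to MySQL");
mysql_select_db("DB_NAME") or die("Could not select the database");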
Listing 4: Construct the query along with the tokens users have entered –
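$search_exploded = explode(" ", $search);
$x = 0;
$construct = ""; // initialise once, before the loop, so earlier terms are kept
foreach ($search_exploded as $search_each) {
    $x++;
    // Escape the token before placing it in the SQL string.
    $search_each = mysql_real_escape_string($search_each);
    // The table from Listing 1 has no keywords column, so match on pagecontent.
    if ($x == 1)
        $construct .= "pagecontent LIKE '%$search_each%' ";
    else
        $construct .= "AND pagecontent LIKE '%$search_each%' ";
}
$construct = "SELECT * FROM SEARCH_ENGINE WHERE $construct";
$run = mysql_query($construct);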
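The same token-by-token construction can also be written with placeholders instead of string interpolation, which keeps user input out of the SQL text entirely. This is only a sketch under assumptions: the function name build_search_query is made up for this example, and it matches against the pagecontent column of the SEARCH_ENGINE table from Listing 1.

```php
<?php
// Hypothetical helper: turn a search string into a WHERE clause with one
// placeholder per token, plus the values to bind to those placeholders.
function build_search_query(string $search): array
{
    // Split on any whitespace and drop empty tokens.
    $tokens = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);

    $conditions = [];
    $params = [];
    foreach ($tokens as $token) {
        $conditions[] = 'pagecontent LIKE ?';
        // Escape the LIKE wildcards so that % and _ in the input match literally.
        $params[] = '%' . addcslashes($token, '%_\\') . '%';
    }

    if ($conditions === []) {
        return ['sql' => null, 'params' => []];
    }

    $sql = 'SELECT pageurl, pagecontent FROM SEARCH_ENGINE WHERE '
         . implode(' AND ', $conditions);

    return ['sql' => $sql, 'params' => $params];
}

$query = build_search_query('create website');
echo $query['sql'], PHP_EOL;
```

The returned SQL and parameter list are in the shape that a prepared statement in mysqli or PDO expects, so the values are bound by the driver rather than pasted into the query.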
Listing 4: Fetch the result and present it to the user –
$foundnum = mysql_num_rows($run);
$safe_search = htmlspecialchars($search, ENT_QUOTES); // never echo raw user input
if ($foundnum == 0)
    echo "Sorry, there are no matching results for <b>$safe_search</b>.<br><br>
        1. Try more general words. For example, if you want to search 'how to create a website', use general keywords like 'create' 'website'.<br>
        2. Try different words with similar meaning.<br>
        3. Please check your spelling.";
else {
    echo "$foundnum results found!<p>";
    while ($runrows = mysql_fetch_assoc($run)) {
        // Column names follow the table from Listing 1.
        $url = htmlspecialchars($runrows['pageurl'], ENT_QUOTES);
        $desc = htmlspecialchars(substr($runrows['pagecontent'], 0, 200), ENT_QUOTES);
        echo "<a href='$url'><b>$url</b></a><br>$desc<p>";
    }
}
Listing 5: The Complete Search.PHP file –
<?php
$button = isset($_GET['submit']) ? $_GET['submit'] : '';
$search = isset($_GET['search']) ? trim($_GET['search']) : '';
$safe_search = htmlspecialchars($search, ENT_QUOTES); // never echo raw user input

if (!$button)
    echo "You didn't submit a keyword";
else {
    if (strlen($search) <= 1)
        echo "Search term too short";
    else {
        echo "You searched for <b>$safe_search</b> <hr size='1'><br>";

        // Note: the mysql_* functions were removed in PHP 7.0; use mysqli or PDO on current versions.
        mysql_connect("localhost", "USERNAME", "PASSWORD");
        mysql_select_db("DBNAME");

        $search_exploded = explode(" ", $search);
        $x = 0;
        $construct = ""; // initialise once, before the loop, so earlier terms are kept
        foreach ($search_exploded as $search_each) {
            $x++;
            $search_each = mysql_real_escape_string($search_each);
            // The table from Listing 1 has no keywords column, so match on pagecontent.
            if ($x == 1)
                $construct .= "pagecontent LIKE '%$search_each%' ";
            else
                $construct .= "AND pagecontent LIKE '%$search_each%' ";
        }
        $construct = "SELECT * FROM SEARCH_ENGINE WHERE $construct";
        $run = mysql_query($construct);
        $foundnum = mysql_num_rows($run);

        if ($foundnum == 0)
            echo "Sorry, there are no matching results for <b>$safe_search</b>.<br><br>
                1. Try more general words. For example, if you want to search 'how to create a website', use general keywords like 'create' 'website'.<br>
                2. Try different words with similar meaning.<br>
                3. Please check your spelling.";
        else {
            echo "$foundnum results found!<p>";
            while ($runrows = mysql_fetch_assoc($run)) {
                // Column names follow the table from Listing 1.
                $url = htmlspecialchars($runrows['pageurl'], ENT_QUOTES);
                $desc = htmlspecialchars(substr($runrows['pagecontent'], 0, 200), ENT_QUOTES);
                echo "<a href='$url'><b>$url</b></a><br>$desc<p>";
            }
        }
    }
}
?>
Search Engine architecture
Before going into further details, let us talk about what our goals should be while developing a search engine. Listed below is a brief set of goals which we should focus on -
- The web crawler, indexer and document storage should be capable of handling a large volume of documents, maybe 1 million or even more.
- We should follow test-driven development, which helps to enforce good design and modular code.
- We should have the ability to support various strategies for things like the index, the document store, the search etc.
The search engine itself is made up of three parts -
- A crawler, which is used to pull in external documents.
- An index, which is the place where the documents are stored in an inverted index, and
- A document store to keep the documents.
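To illustrate what "inverted" means for the index, here is a toy sketch in which each term maps to the ids of the documents that contain it. The function name build_inverted_index, the sample documents and the simple splitting rule are all made up for this example and are not the implementation the article goes on to build.

```php
<?php
// Build a toy inverted index: each term maps to the ids of the documents
// that contain it. Documents are passed as id => text.
function build_inverted_index(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        // Lowercase the text and split on anything that is not a letter or digit.
        $terms = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

        // Record each document only once per term.
        foreach (array_unique($terms) as $term) {
            $index[$term][] = $id;
        }
    }
    return $index;
}

$index = build_inverted_index([
    1 => 'PHP search engine',
    2 => 'Search with MySQL',
]);
print_r($index['search']); // ids of the documents containing "search"
```

Looking up a term is then a single array access, instead of a scan over every document as with the LIKE queries used earlier.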
In order to crawl, we should come up with a list of URLs. There are a few generic ways to do this, as listed below -
- The most common is to seed the crawler with a list of pages that contain lots of links, then crawl them and harvest new links as we go down the list.
- Another approach is to download a list of URLs and then use that list.
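For the first approach, harvesting means pulling the links out of every page that has been downloaded. The following is a rough sketch using PHP's bundled DOM extension; the function name extract_links is an assumption, and relative links are simply skipped here rather than resolved against the page URL.

```php
<?php
// Hypothetical helper: collect the absolute http(s) links found in a page.
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so silence the parser warnings.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if (preg_match('#^https?://#i', $href)) {
            $links[$href] = true; // array keys remove duplicates
        }
    }
    return array_keys($links);
}

$html = '<p><a href="http://example.com/a">A</a> <a href="/relative">B</a></p>';
print_r(extract_links($html));
```

Each harvested link can be appended to the list of URLs still to be crawled, provided it has not been visited already.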
Listing 6: The parser –
$file_handle = fopen("Quantcast-Top-Million.txt", "r");
while (!feof($file_handle)) {
    $line = fgets($file_handle);
    if (preg_match('/^\d+/', $line)) { # the line starts with some digits, i.e. a rank
        $tmp = explode("\t", $line);
        $rank = trim($tmp[0]);
        $url = trim($tmp[1]);
        if ($url != 'Hidden profile') { # "Hidden profile" appears sometimes, just ignore it
            # The original listing is cut off after "echo $"; printing the url is assumed.
            echo $url . "\n";
        }
    }
}
fclose($file_handle);
Listing 7: The crawler –
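$file_handle = fopen("urllist.txt", "r");
while (!feof($file_handle)) {
    $url = trim(fgets($file_handle));
    if ($url == '') continue; # skip blank lines, e.g. the one at the end of the file
    $content = @file_get_contents($url);
    if ($content === false) continue; # the download failed, move on to the next url
    $document = array($url, $content); # keep the url together with the content
    $serialized = serialize($document);
    $fp = fopen('./documents/' . md5($url), 'w');
    fwrite($fp, $serialized);
    fclose($fp);
}
fclose($file_handle);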
THE INDEX
The reason I mentioned test-driven development earlier is that I prefer a bottom-up approach. The index which we are going to create should have a few very simple responsibilities, as listed below -
- It needs to store its contents to disk and retrieve them.
- It needs to be able to clear itself when we decide to regenerate things.
- It should validate the documents that it is storing.
Listing 8: The interface –
interface iindex {
    public function storeDocuments($name, array $documents);
    public function getDocuments($name);
    public function clearIndex();
    public function validateDocument(array $document);
}
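To show how the three responsibilities fit together, here is one possible implementation of this interface that keeps one serialized file per index entry. It is a sketch under assumptions: the class name FileIndex, the ".idx" file suffix and the document shape (an array of document id and term count) are invented for this example.

```php
<?php
// The interface from Listing 8, repeated so that the sketch is self-contained.
interface iindex
{
    public function storeDocuments($name, array $documents);
    public function getDocuments($name);
    public function clearIndex();
    public function validateDocument(array $document);
}

// Hypothetical implementation: one serialized file per index entry.
class FileIndex implements iindex
{
    private $dir;

    public function __construct($dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }

    // Store the documents for one entry; refuse the whole batch if any is invalid.
    public function storeDocuments($name, array $documents)
    {
        foreach ($documents as $document) {
            if (!$this->validateDocument($document)) {
                return false;
            }
        }
        return file_put_contents($this->path($name), serialize($documents)) !== false;
    }

    // Return the stored documents, or an empty array for an unknown entry.
    public function getDocuments($name)
    {
        $file = $this->path($name);
        return is_file($file) ? unserialize(file_get_contents($file)) : array();
    }

    // Remove every index file so that the index can be regenerated.
    public function clearIndex()
    {
        foreach (glob($this->dir . '/*.idx') ?: array() as $file) {
            unlink($file);
        }
    }

    // Assumed document shape: array(document id, term count), both integers.
    public function validateDocument(array $document)
    {
        return count($document) === 2
            && isset($document[0], $document[1])
            && is_int($document[0])
            && is_int($document[1]);
    }

    private function path($name)
    {
        return $this->dir . '/' . md5($name) . '.idx';
    }
}

$index = new FileIndex(sys_get_temp_dir() . '/search_index_demo');
$index->storeDocuments('php', array(array(1, 3), array(7, 1)));
print_r($index->getDocuments('php'));
$index->clearIndex();
```

Because callers only see the interface, this file-based strategy could later be swapped for a database-backed one without changing the indexer or the search code.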
INDEXING
Now that we have the ability to store and index some documents, let us go through the steps we need to have the indexing in place -
- The first thing to do is to set the time limit to unlimited, since the indexing process might take longer than expected.
- Our next step is to define the locations where the index and the documents will be stored, in order to avoid errors.
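These two preparation steps might look roughly as follows at the top of an indexing script. The constant names INDEX_DIR and DOCUMENT_DIR and the folder names are assumptions made for this sketch.

```php
<?php
// Indexing can run for a long time, so lift PHP's execution time limit.
set_time_limit(0);

// Hypothetical locations. Absolute paths avoid "file not found" errors
// when the script is started from a different working directory.
define('INDEX_DIR', __DIR__ . '/index');
define('DOCUMENT_DIR', __DIR__ . '/documents');

// Make sure both folders exist before the indexing starts.
foreach (array(INDEX_DIR, DOCUMENT_DIR) as $dir) {
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        exit("Could not create $dir\n");
    }
}
```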
Listing 9: The search interface –
interface isearch {
    public function dosearch($searchterms);
}
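As a small illustration of what could sit behind this single method, here is a sketch that answers a query from an in-memory inverted index. The class name SimpleSearch, the index layout (term mapped to a list of document ids) and the rule that all terms must match are assumptions for this example; ranking is left out entirely.

```php
<?php
// The interface from Listing 9, repeated so that the sketch is self-contained.
interface isearch
{
    public function dosearch($searchterms);
}

// Hypothetical implementation over an in-memory inverted index
// (term => list of document ids). All terms must match (AND search).
class SimpleSearch implements isearch
{
    private $index;

    public function __construct(array $index)
    {
        $this->index = $index;
    }

    public function dosearch($searchterms)
    {
        $terms = preg_split('/\s+/', strtolower(trim($searchterms)), -1, PREG_SPLIT_NO_EMPTY);

        $result = null;
        foreach ($terms as $term) {
            $ids = isset($this->index[$term]) ? $this->index[$term] : array();
            // Keep only the documents that also matched the previous terms.
            $result = ($result === null) ? $ids : array_intersect($result, $ids);
        }

        return $result === null ? array() : array_values($result);
    }
}

$search = new SimpleSearch(array(
    'php'    => array(1, 2),
    'search' => array(2, 3),
));
print_r($search->dosearch('php search')); // documents containing both terms
```

A fuller implementation would also have to load the postings from the index on disk and order the matching documents by relevance.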