EUROPEAN COURT RULING REDEFINES 'GOOGLING' PEOPLE


ALSO: The Google Search Engine


ALSO: How Internet Search Engines Work



READ FULL MEDIA REPORTS:

European court ruling redefines ‘Googling’ people



AMSTERDAM, MAY 19, 2014 (INQUIRER) Associated Press — A European court decision will require Google to sanitize its Internet search results to protect people who can demonstrate the information unfairly tarnishes their reputation.

The landmark ruling empowers the roughly 500 million people living in 28 European Union countries to prevent Google and other search engines from listing embarrassing or illegal episodes from their past. It will also change the role that Google and its rivals play in Europe, transforming them into caretakers of personal reputations.

Some key issues to consider:

What was the court’s ruling?

The European Court of Justice, the closest thing the European Union has to the Supreme Court in the United States, ruled that Google and other search engines must respond to user requests seeking to remove links to personal information.

Google and the other search engines, including Yahoo and Microsoft’s Bing, won’t necessarily have to omit all the links covered in an individual’s request, but they will have to make difficult decisions about what should remain within the reach of any Web surfer.

The Luxembourg-based court said an individual’s right to privacy has to be weighed against the public’s interest in accessing information.

How did this case come about?

The case began with a Spaniard seeking to have outdated information about himself removed from the Internet. His quest became a key test of the so-called “right to be forgotten” — to have unflattering information erased after a period of time.

Specifically, in 2010 Mario Costeja asked for the removal of links to a 1998 newspaper notice that his property was due to be auctioned because of an unpaid welfare debt.

A Spanish privacy agency agreed to his request, but Google protested, saying it should not have to censor links to information that was legal and publicly available.

A top Spanish court asked the European court for an interpretation of how European privacy law applies to search-engine results, and got a broader ruling than it had asked for.

How does this change things in Europe?

The immediate impact will be on 200 cases still pending in the Spanish courts, which will now be guided by the ruling in Europe’s highest court.

Similar cases in other European countries are likely to be affected, too. Even more European citizens are now expected to challenge results produced alongside their names.

Those complaints will create logistical headaches and ethical dilemmas for Google, which processes most of the search requests in Europe. Google said it was disappointed by the ruling and will need time to analyze its implications.

Will this change the way Google and the other search engines show personal information in the results displayed in the US?

Legal experts doubt it, although the search engines are still trying to figure out how they will draw the lines about what does and doesn’t belong in their results.

The most likely outcome is that search engines will have different rules for different countries. This isn’t unprecedented.

For instance, Google censors some information — such as in Germany where there are laws banning it from displaying links to websites promoting Nazi principles — while showing the results in other countries.

The First Amendment makes it unlikely that a US court would ever issue a ruling similar to the one made in Europe.

Is this a major blow to Google?

Google isn’t pleased, but the ruling probably won’t make the Mountain View, California, company any less powerful or prosperous.

That’s because the European ruling doesn’t touch the ads that Google shows alongside its search results to generate most of its revenue.

The decision isn’t likely to prompt people to defect from Google’s search engine to find information elsewhere either because all its major rivals also will have to limit the breadth of their results.

Investors took Tuesday’s news in stride, bidding up Google’s Class A stock $3.11 to close at $541.54.

FROM HOWSTUFFWORKS.COM

The Google Search Engine

Google's search engine is a powerful tool.

Without search engines like Google, it would be practically impossible to find the information you need when you browse the Web.

Like all search engines, Google uses a special algorithm to generate search results.

While Google shares general facts about its algorithm, the specifics are a company secret. This helps Google remain competitive with other search engines on the Web and reduces the chance of someone finding out how to abuse the system.

Google uses automated programs called spiders or crawlers, just like most search engines. Also like other search engines, Google has a large index of keywords and where those words can be found.

What sets Google apart is how it ranks search results, which in turn determines the order Google displays results on its search engine results page (SERP).

Google uses a trademarked algorithm called PageRank, which assigns each Web page a relevancy score.

A Web page's PageRank depends on a few factors:

The frequency and location of keywords within the Web page: If the keyword only appears once within the body of a page, it will receive a low score for that keyword.

How long the Web page has existed: People create new Web pages every day, and not all of them stick around for long. Google places more value on pages with an established history.

The number of other Web pages that link to the page in question: Google looks at how many Web pages link to a particular site to determine its relevance. Out of these three factors, the third is the most important. It's easier to understand it with an example.

Let's look at a search for the terms "Planet Earth."

As more Web pages link to Discovery's Planet Earth page, the Discovery page's rank increases. When Discovery's page ranks higher than other pages, it shows up at the top of the Google search results page.

Because Google looks at links to a Web page as a vote, it's not easy to cheat the system. The best way to make sure your Web page is high up on Google's search results is to provide great content so that people will link back to your page.

The more links your page gets, the higher its PageRank score will be.
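The link-voting idea can be made concrete with a small calculation. The sketch below is a toy version of the published PageRank formula (with the conventional 0.85 damping factor), written in Python; the page names are made-up examples, and the real system operates on a graph of billions of pages with many more refinements.

```python
# Toy PageRank: a page's score is a damped, weighted sum of the scores
# of the pages that link to it, iterated until the values settle.
links = {  # page -> pages that link TO it (hypothetical URLs)
    "discovery.com/planet-earth": ["blog-a.example", "blog-b.example", "news.example"],
    "blog-a.example": [],
    "blog-b.example": ["blog-a.example"],
    "news.example": ["discovery.com/planet-earth"],
}

damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

# A page "votes" by linking out, so count each page's outgoing links.
out_degree = {p: sum(p in inbound for inbound in links.values()) for p in pages}

for _ in range(20):  # a handful of iterations is plenty for a tiny graph
    rank = {
        page: (1 - damping) / len(pages)
        + damping * sum(rank[src] / max(out_degree[src], 1) for src in links[page])
        for page in pages
    }

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```

Run on this tiny graph, the heavily linked Planet Earth page comes out on top, which is exactly the behavior the Discovery example describes.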

If you attract the attention of sites with a high PageRank score, your score will grow faster.

Google initiated an experiment with its search engine in 2008.

For the first time, Google is allowing a group of beta testers to change the ranking order of search results. In this experiment, beta testers can promote or demote search results and tailor their search experience so that it's more personally relevant.

Google executives say there's no guarantee that the company will ever implement this feature into the search engine globally.

How Internet Search Engines Work by Curt Franklin



A top search engine will index hundreds of millions of pages and respond to tens of millions of queries per day.

The good news about the Internet and its most visible component, the World Wide Web, is that there are hundreds of millions of pages available, waiting to present information on an amazing variety of topics.

The bad news about the Internet is that there are hundreds of millions of pages available, most of them titled according to the whim of their author, almost all of them sitting on servers with cryptic names. When you need to know about a particular subject, how do you know which pages to read?

If you're like most people, you visit an Internet search engine.

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites.

There are differences in the ways various search engines work, but they all perform three basic tasks:

They search the Internet -- or select pieces of the Internet -- based on important words.

They keep an index of the words they find, and where they find them.

They allow users to look for words or combinations of words found in that index.

Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.

In this article, we'll tell you how these major tasks are performed, and how Internet search engines put the pieces together in order to let you find the information you need on the Web.
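As a minimal sketch of how those three tasks fit together, the Python snippet below stands in for crawling with a hardcoded set of pages (the URLs and text are invented for illustration), keeps an index of the words it finds and where it found them, and then answers a two-word query from that index.

```python
# Task 1 stand-in: a fixed set of "crawled" pages (hypothetical URLs and text).
pages = {
    "example.com/earth": "planet earth documentary series about nature",
    "example.com/mars": "the planet mars and its moons",
    "example.com/cooking": "recipes for earth oven cooking",
}

# Task 2: keep an index of the words found and where they were found.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# Task 3: let a user look for a combination of words in that index.
query = ["planet", "earth"]
matches = set.intersection(*(index.get(word, set()) for word in query))
print(matches)   # -> {'example.com/earth'}
```

The rest of this article fills in what a real engine adds on top of each of these steps.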

"Web Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

Web Crawling

When most people talk about Internet search engines, they really mean World Wide Web search engines. Before the Web became the most visible part of the Internet, there were already search engines in place to help people find information on the Net.

Programs with names like "gopher" and "Archie" kept indexes of files stored on servers connected to the Internet, and dramatically reduced the amount of time required to find programs and documents. In the early 1990s, getting serious value from the Internet meant knowing how to use gopher, Archie, Veronica and the rest.

Today, most Internet users limit their searches to the Web, so we'll limit this article to search engines that focus on the contents of Web pages.

Before a search engine can tell you where a file or document is, it must be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites.

When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web -- a large set of arachnid-centric names for tools is one of them.) In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.

How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site.

In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
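A very stripped-down version of that crawling loop might look like the sketch below, using only the Python standard library. It starts from a single seed URL (a placeholder here), records the words on each page, and follows every link it finds; a production spider would add politeness delays, robots.txt handling, deduplication and far more robust parsing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class PageParser(HTMLParser):
    """Collects the visible words and the outgoing links of one page."""
    def __init__(self):
        super().__init__()
        self.words, self.links = [], []
    def handle_data(self, data):
        self.words.extend(data.lower().split())
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    """Breadth-first crawl: index a page, then queue every link found on it."""
    frontier, seen, word_lists = deque([seed]), {seed}, {}
    while frontier and len(word_lists) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip pages that cannot be fetched
        parser = PageParser()
        parser.feed(html)
        word_lists[url] = parser.words
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return word_lists

# Placeholder seed; any heavily linked, heavily used page would do.
# crawl("https://example.com")
```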

Google began as an academic search engine. In the paper that describes how the system was built, Sergey Brin and Lawrence Page give an example of how quickly their spiders can work. They built their initial system to use multiple spiders, usually three at one time. Each spider could keep about 300 connections to Web pages open at a time.

At its peak performance, using four spiders, their system could crawl over 100 pages per second, generating around 600 kilobytes of data each second.

Keeping everything running quickly meant building a system to feed necessary information to the spiders. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.

When the Google spider looked at an HTML page, it took note of two things:

*The words within the page

*Where the words were found

Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.

These different approaches usually attempt to make the spider operate faster, allow users to search more efficiently, or both. For example, some spiders will keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the Web.

Other systems, such as AltaVista, go in the other direction, indexing every single word on a page, including "a," "an," "the" and other "insignificant" words.

The push to completeness in this approach is matched by other systems in the attention given to the unseen portion of the Web page, the meta tags. Learn more about meta tags on the next page.
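Whatever set of words a spider keeps, the underlying operation is the same: record each word together with where on the page it appeared. The Python sketch below shows one rough way to do that, skipping the articles as the early Google spider did; the tags it tracks and the sample page are illustrative assumptions, not any particular engine's scheme.

```python
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}   # words this sketch skips, as the Google spider did

class PositionalIndexer(HTMLParser):
    """Records every significant word along with the position it was found in."""
    def __init__(self):
        super().__init__()
        self.position = "body"          # default position for ordinary text
        self.entries = []               # list of (word, position) pairs
    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self.position = tag
    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2", "h3"):
            self.position = "body"
    def handle_data(self, data):
        for word in data.lower().split():
            if word not in ARTICLES:
                self.entries.append((word, self.position))

page = ("<html><head><title>Planet Earth</title></head>"
        "<body><h1>About the series</h1><p>A documentary about the planet</p></body></html>")

indexer = PositionalIndexer()
indexer.feed(page)
print(indexer.entries)
# [('planet', 'title'), ('earth', 'title'), ('about', 'h1'), ('series', 'h1'), ...]
```

A ranking formula can then give words found in the title or sub-headings more weight than the same words found in the body text.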

Meta Tags

Meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed. This can be helpful, especially in cases in which the words on the page might have double or triple meanings -- the meta tags can guide the search engine in choosing which of the several possible meanings for these words is correct.

There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics but have nothing to do with the actual contents of the page.

To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don't match the words on the page.
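One simple form of that safeguard is to keep a meta keyword only when it also appears in the visible text of the page. The check below is a hypothetical sketch; each engine draws this line in its own way.

```python
def trustworthy_keywords(meta_keywords, page_text):
    """Keep only the meta keywords that also appear in the visible page text."""
    words_on_page = set(page_text.lower().split())
    return [kw for kw in meta_keywords if kw.lower() in words_on_page]

meta = ["gardening", "tomatoes", "celebrities"]            # topics the page claims
body = "tips for gardening and growing tomatoes at home"   # what the page actually says
print(trustworthy_keywords(meta, body))                    # ['gardening', 'tomatoes']
```

The mismatched "celebrities" tag is dropped, so it cannot pull the page into unrelated searches.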

All of this assumes that the owner of a page actually wants it to be included in the results of a search engine's activities.

Many times, the page's owner doesn't want it showing up on a major search engine, or doesn't want the activity of a spider accessing the page.

Consider, for example, a game that builds new, active pages each time sections of the page are displayed or new links are followed. If a Web spider accesses one of these pages, and begins following all of the links for new pages, the game could mistake the activity for a high-speed human player and spin out of control.

To avoid situations like this, the robot exclusion protocol was developed. A site can implement it through a robots.txt file on its server or through a robots meta tag at the beginning of a Web page; either one tells a spider to leave the page alone -- to neither index the words on the page nor try to follow its links.
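For the meta-tag form of the protocol, a spider can check for a robots tag before doing anything else with the page, roughly as in the Python sketch below; a well-behaved crawler would also fetch and obey the site's robots.txt file, which this sketch leaves out.

```python
from html.parser import HTMLParser

class RobotsMetaCheck(HTMLParser):
    """Looks for <meta name="robots" ...> and records noindex/nofollow directives."""
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            directives = attrs.get("content", "").lower()
            self.noindex = self.noindex or "noindex" in directives
            self.nofollow = self.nofollow or "nofollow" in directives

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
check = RobotsMetaCheck()
check.feed(page)
if check.noindex:
    print("Do not index the words on this page")
if check.nofollow:
    print("Do not follow the links on this page")
```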

Building the Index

Once the spiders have completed the task of finding information on Web pages (and we should note that this is a task that is never actually completed -- the constantly changing nature of the Web means that the spiders are always crawling), the search engine must store the information in a way that makes it useful.

There are two key components involved in making the gathered data accessible to users:

*The information stored with the data

*The method by which the information is indexed

In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether the word was used once or many times or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the list of search results.

To make for more useful results, most search engines store more than just the word and URL. An engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page. Each commercial search engine has a different formula for assigning weight to the words in its index.

This is one of the reasons that a search for the same word on different search engines will produce different lists, with the pages presented in different orders.

Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space.

For example, the original Google paper describes using 2 bytes, of 8 bits each, to store information on weighting -- whether the word was capitalized, its font size, position, and other information to help in ranking the hit. Each factor might take up 2 or 3 bits within the 2-byte grouping (8 bits = 1 byte). As a result, a great deal of information can be stored in a very compact form. After the information is compacted, it's ready for indexing.
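Bit-packing of this kind is easy to show with ordinary integer operations. The field layout below (a capitalization flag, a small font-size value and a word position) is an illustrative guess at the sort of encoding the paper describes, not the actual Google format.

```python
# Pack three small ranking signals into one 16-bit value (2 bytes):
#   bit 0        capitalized flag  (1 bit)
#   bits 1-3     font size, 0-7    (3 bits)
#   bits 4-15    word position     (12 bits)
def pack_hit(capitalized, font_size, position):
    return (capitalized & 0x1) | ((font_size & 0x7) << 1) | ((position & 0xFFF) << 4)

def unpack_hit(value):
    return value & 0x1, (value >> 1) & 0x7, (value >> 4) & 0xFFF

packed = pack_hit(capitalized=1, font_size=3, position=57)
print(packed.to_bytes(2, "big"))   # the whole hit fits in exactly two bytes
print(unpack_hit(packed))          # (1, 3, 57)
```

Storing millions of hits this way, rather than as separate numbers or text, is what keeps the index compact enough to search quickly.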

An index has a single purpose: It allows information to be found as quickly as possible.

There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.

In English, there are some letters that begin many words, while others begin fewer. You'll find, for example, that the "M" section of the dictionary is much thicker than the "X" section.

This inequity means that finding a word beginning with a very "popular" letter could take much longer than finding a word that begins with a less popular one. Hashing evens out the difference, and reduces the average time it takes to find an entry.

It also separates the index from the actual entry. The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently.
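A bare-bones version of that arrangement is sketched below: each word is hashed into one of a fixed number of buckets, and the bucket stores only the word and an offset into a separate postings list that holds the actual data. (Python dictionaries already work this way internally; the explicit buckets, the simple hash function and the sample entries are just to make the mechanism visible.)

```python
NUM_BUCKETS = 8

def bucket_for(word):
    """Spread words evenly across the buckets, regardless of their first letter."""
    return sum(ord(ch) for ch in word) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]   # each entry: (word, offset into postings)
postings = []                                # the actual data, stored separately

def add_word(word, urls):
    postings.append(urls)
    buckets[bucket_for(word)].append((word, len(postings) - 1))

def lookup(word):
    for stored_word, offset in buckets[bucket_for(word)]:
        if stored_word == word:
            return postings[offset]
    return []

add_word("mars", ["example.com/mars"])
add_word("moon", ["example.com/mars", "example.com/astronomy"])
print(lookup("moon"))   # ['example.com/mars', 'example.com/astronomy']
```

Because a lookup only has to examine one small bucket, finding a word takes roughly the same time whether it begins with "M" or with "X".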

The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.


Chief News Editor: Sol Jose Vanzi
© Copyright, 2014 by PHILIPPINE HEADLINE NEWS ONLINE
All rights reserved

