SEOHits.com

This blog was really born out of necessity. There are too many phony search engine marketers out there, too many money-hungry SEOs and too many newcomers that are in this game just to cash out, leaving their clients without a product or decent service. Enough is enough, guys; let’s show some results.

Monday, May 12, 2008

What does Google know about spam?

Thursday, April 17, 2008

Best practices when moving your site
Wednesday, April 16, 2008 at 1:55 PM
Posted by Ríona MacNamara, Webmaster Tools Team

Planning on moving your site to a new domain? Lots of webmasters find this a scary process. How do you do it without hurting your site's performance in Google search results?

Your aim is to make the transition invisible and seamless to the user, and to make sure that Google knows that your new pages should get the same quality signals as the pages on your old site. When you're moving your site, pesky 404 (File Not Found) errors can harm the user experience and negatively impact your site's performance in Google search results.

Let's cover moving your site to a new domain (for instance, changing from www.example.com to www.example.org). This is different from moving to a new IP address; read this post for more information on that.

Here are the main points:

* Test the move process by moving the contents of one directory or subdomain first. Then use a 301 Redirect to permanently redirect those pages on your old site to your new site. This tells Google and other search engines that your site has permanently moved.

* Once this is complete, check to see that the pages on your new site are appearing in Google's search results. When you're satisfied that the move is working correctly, you can move your entire site. Don't do a blanket redirect directing all traffic from your old site to your new home page. This will avoid 404 errors, but it's not a good user experience. A page-to-page redirect (where each page on the old site gets redirected to the corresponding page on the new site) is more work, but gives your users a consistent and transparent experience. If there won't be a 1:1 match between pages on your old and new site, try to make sure that every page on your old site is at least redirected to a new page with similar content.

* If you're changing your domain because of site rebranding or redesign, you might want to think about doing this in two phases: first, move your site; and second, launch your redesign. This manages the amount of change your users see at any stage in the process, and can make the process seem smoother. Keeping the variables to a minimum also makes it easier to troubleshoot unexpected behavior.

* Check both external and internal links to pages on your site. Ideally, you should contact the webmaster of each site that links to yours and ask them to update the links to point to the page on your new domain. If this isn't practical, make sure that all pages with incoming links are redirected to your new site. You should also check internal links within your old site, and update them to point to your new domain. Once your content is in place on your new server, use a link checker like Xenu to make sure you don't have broken legacy links on your site. This is especially important if your original content included absolute links (like www.example.com/cooking/recipes/chocolatecake.html) instead of relative links (like .../recipes/chocolatecake.html).

* To prevent confusion, it's best to make sure you retain control of your old site domain for at least 180 days.

* Add your new site to your Webmaster Tools account, and verify your ownership of it. Then create and submit a Sitemap listing the URLs on your new site. This tells Google that your content is now available on your new site, and that we should go and crawl it.

* Finally, keep both your new and old site verified in Webmaster Tools, and review crawl errors regularly to make sure that the 301s from the old site are working properly, and that the new site isn't showing unwanted 404 errors.
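The page-to-page redirect described above can be sketched as a small mapping function. This is only an illustration, not anything Google prescribes: the host names come from the example in the post, while `OVERRIDES` and `redirect_target` are hypothetical names, and the override entry is an invented path. The idea is that each old URL maps to the same path on the new domain, with a hand-maintained table for pages that have no 1:1 match.

```python
from urllib.parse import urlsplit, urlunsplit

# Domains from the example in the post.
OLD_HOST = "www.example.com"
NEW_HOST = "www.example.org"

# Hypothetical overrides for old pages with no 1:1 match on the new site:
# each old path points at the new page with the most similar content.
OVERRIDES = {
    "/cooking/old-index.html": "/recipes/",
}


def redirect_target(old_url):
    """Return the new-site URL that old_url should 301 to, or None if it is not on the old host."""
    parts = urlsplit(old_url)
    if parts.netloc.lower() != OLD_HOST:
        return None  # not a URL on the old site; leave it alone
    path = parts.path or "/"
    # Keep the same path unless an override says otherwise.
    new_path = OVERRIDES.get(path, path)
    # Preserve the query string, drop any fragment.
    return urlunsplit((parts.scheme or "http", NEW_HOST, new_path, parts.query, ""))


print(redirect_target("http://www.example.com/cooking/recipes/chocolatecake.html"))
```

Your web server would still have to issue the actual 301 response for each mapping; a table like this is mainly useful for generating those rules and for checking afterwards that no old page falls through to a 404.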

We'll admit it, moving is never easy - but these steps should help ensure that none of your good web reputation falls off the truck in the process.

Posted by Ríona MacNamara

Monday, April 14, 2008

Hitwise: Google Hits New High; Microsoft & Yahoo New Lows



Last week, Hitwise released the latest statistics for search engine share in the United States for March 2008, showing Google at an all-time high while Microsoft and Yahoo hit all-time lows.
The four major search engines stack up as follows:

* Google: 67.3%
* Yahoo: 20.3%
* Microsoft: 6.7%
* Ask: 4.1%

The trend over time? Here's the past year's worth of data:

Google's previous high (according to stats I have going back to August 2006) was in April 2007, with a 65.3 percent share. Microsoft and Yahoo, in contrast, hit all-time lows for the same period.

Caveat Time!

As a reminder, my general rules when evaluating popularity stats:


* Avoid drawing conclusions based on month-to-month comparisons. Lots of things can cause one month's figures to be incomparable to another month. It's better to see the trend across multiple months in a row.

* Avoid drawing conclusions based on one ratings service's figures. Each service has a unique methodology used to create popularity estimates. This means that ratings will rarely be the same between services. However, a trend that you see reflected across two or more services may give you faith in trusting that trend.

* Consider Actual Number Of Searches: While share for a particular search engine might drop, the raw number of searches might still be going up (and thus they might be earning more money, despite a share drop). This is because the "pie" of searches keeps growing, so even a smaller slice of the pie might be more than a bigger slice in the past. See Nielsen NetRatings: August 2007 Search Share Puts Google On Top, Microsoft Holding Gains for a further explanation of this.
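To make the "growing pie" point concrete, here is a tiny worked example. The totals and shares below are invented purely for illustration and are not Hitwise or Nielsen figures; `searches_for_engine` is a hypothetical helper name.

```python
def searches_for_engine(total_searches, share_percent):
    """Raw searches handled by one engine, given the whole pie and its share of it."""
    return total_searches * share_percent / 100


# Hypothetical numbers: the pie grows while the engine's share shrinks.
last_year = searches_for_engine(total_searches=8_000_000_000, share_percent=22.0)
this_year = searches_for_engine(total_searches=10_000_000_000, share_percent=20.0)

print(f"last year: {last_year:,.0f} searches at a 22% share")
print(f"this year: {this_year:,.0f} searches at a 20% share")
print("raw searches grew despite the share drop:", this_year > last_year)
```

With these made-up inputs, a two-point share drop still corresponds to more searches in absolute terms, which is why share alone can mislead.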

Also specifically for Hitwise, you don't see AOL on the chart because Hitwise doesn't break it out. Instead, it is included in the "Other" figure. Hitwise generally undercounts AOL's share; other services tend to put it at 4 percent or higher.

Thursday, March 20, 2008

Guilty Until Proven Innocent

What If?
In the movie "The Lives of Others" (Das Leben Der Anderen), a question is asked, "What does an actor do if they can no longer act?"

The movie takes place before the fall of the Wall, in the socialist Deutsche Demokratische Republik, or DDR, a state with an all-powerful government that has complete and utter control over its inhabitants' lives. The state has complete autonomy to determine the success or failure of any person. There is no appeals process for those whom the state has deemed to have broken the rules, nor consistency in the passing of judgment. That element of whim is especially dangerous for those trying to live by the state's rule.

At first, such a state almost works, but in the end such suffocating power alienates those living by its rule. And, in a just world where enough outside pressure exists, the people can ultimately find a better life through an alternative system.

Google is such a system. It is the equivalent of a socialist state, and each day it ruins the lives of those dependent upon it. For the ruined, the blacklisted, there is no redemption or reprieve. I know. I am one of them, and I could live with that because I had options. I cannot now because often others do not.

In "The Lives of Others", Albert Jerske is a director blacklisted by the state, and his life never recovers. It wasn't just the impact of his ban on art that was the tragedy but the ruining of a good man. A central character, the writer Georg Dreyman, finally takes action, motivated by the impact the state's overly harsh decision to ban Jerske had on his friend's life. For Albert Jerske to be denied a livelihood didn't just impact Albert but all those whose lives he touched.



Google. Not Just the New Microsoft.
Much has been written about Google being the new Microsoft. The latter has its fair share of detractors, and for the savvy few, they can live their lives free of Microsoft's reach. There is a difference between being able to and having equivalent options easily within reach. You might be able to live without Microsoft, but it's not easy for the vast majority to do so. The same is true of Google. As a consumer, other options exist; whether they are equivalent is up in the air, but they are available for use without difficulty.

As a business, though, you do not have the same ability to live without Google. Whether you rely on organic search traffic or paid search traffic, if the Google Socialist State judges you unfit, your business will be ruined. There are workarounds, but all rely on deceiving the state that you no longer practice your trade.

When the State is young and its reach not as widespread, the ratio of bans to false positives is in alignment. When it must make judgments across an ever-increasing universe that has grown in not just scale but complexity and especially subtlety, the State makes mistakes. And that is OK, because everyone will make mistakes. Where it is not OK, and where we are today, is when the innocent have no voice.

It's a search engine, what's the big deal?
I used to be of this camp when I'd hear of people who had their publisher sites shut down or their advertiser account banned. Surely, the State had its reasons. They broke the rules. Then, it happened to me, and I had a taste of its potential impact.

As someone who has written extensively about the online space, I am fortunate to have both a broad and deep understanding of many aspects of the online advertising ecosystem. That knowledge doesn't necessarily translate into practical expertise, the same way that some of the best coaches will never play as well as the players they coach. Still, first-hand experience is critical in making for not just a better player but a better coach. For me, that means understanding what those I write about go through.

I started promoting an offer, having first built out a site to support it. It was a pretty unique offer, but the backend for it relied on a technology provider that a wide range of other companies also leveraged. The experience provided just what I hoped it would and a chance to put my money where my words were. Then, one day, about three months after I began, all traffic stopped. After checking the settings and ruling out the obvious, I wrote in to Google. The reply surprised me.

No AdWords for You.
Google suspended my activities because of "repeated violations." Given that I had never received a warning or error, I sat dumbfounded. I wrote in again. The reply let me know that it wasn't necessarily my account but one to which I was linked. (Given Google's ultimate goal of user experience which includes policies against double listing, if they feel a person has set up two different accounts to game the system, they treat those accounts as one.) That they do this in an automated fashion makes sense given the sheer number of accounts and the ease with which a person can set up an account. It's ripe for abuse.

There are countless stories of how accounts get linked, and many are cautionary tales at best, horror stories at worst for companies who might not appreciate the consequences. A classic example is as follows. A person who has access to the company's AdWords accounts has their own AdWords account. They are a good employee and don't work on their personal project at the office, but as a good employee they do work on your business while at home. By accessing both AdWords accounts on the same machine, Google decides both accounts are the same person despite their being different. Worst case, the employee breaks the rules with their personal account. The employer finds their campaigns stopped and can't get them back online.

With my account, linked because of the technology provider's other clients, I called Google support to see if any additional information would come to light. Dealing with Google is like a bad dream, like a perversion of justice. Want to know what it's like? Read John Grisham's non-fiction book, The Innocent Man. What stands out is the arrogance, lack of information, and unwillingness to help from Google employees who find themselves in a position of power and, more frustratingly, their almost unquestioning trust in their system's correctness in dispensing sentences. Without a doubt, you are presumed Guilty, but you will not be allowed to prove your innocence.

My suspension was not just frustrating, but it felt like a questioning of my character. You feel like crying out, "Don't you know who I am? People will vouch for me. I'm respected in my field." And on and on. You don't because you know it will fall on deaf ears. And, while your friends feel for you, they know how the State works and won't offer up their relationships in the State to help, lest they need to use their one get out of jail free card for themselves.

No Longer Silent.
The real problem with the suspension is that it's not a suspension. It's a ban, a blacklist. I am tainted. If I want to help someone else out, I can't, unless I do so from a machine that never logs into my Gmail account. That was my first mistake. I set up my AdWords account using the Gmail account that I use for my entire life.

I didn't write about my ban initially because I had other things to do. My time and effort were being spent on LeadsCon, which is looking fantastic. It's unfortunate that I cannot actually advertise my conference on Google because of my suspension, not unless I go to some extraordinary lengths to make it seem as though it's not actually me behind the conference. I know how to do it, but I don't want to live two lives. My childlike reaction has simply been to make sure Google is not welcome at my show. We need to understand how to use them, but that doesn't mean they represent the type of company that I want around the people I like and respect. If MSN were smart, they'd make sure to be at the show to tap into marketers who spend more than $2 billion and are big in search and display. Yahoo will have some people there, which I think is a smart decision and consistent with their aims of better understanding lead generation, especially after their acquisition of Blue Lithium.

My Albert Jerske
This post has certainly had a quasi-therapeutic effect, allowing me to finally share, or rather vent, about a personal frustration, but if it comes off only as that, then I've failed at expressing the main point: the danger of Google's policies, and raising awareness of a growing problem that impacts a growing number of legitimate talents each day. It's much like global warming; you know the problem exists, but until your life has an interruption due to it, you can do a pretty good job ignoring it and paying it lip service.

What really forced me into writing about the Google State was not my experience, which is several months old now, but one that happened to a dear friend, not just my friend but a friend to the industry, a remarkable person, who as it turned out also happened to spend several hundred thousand dollars monthly with Google. Almost insultingly, Google wouldn't assign him an account manager. Can you imagine spending seven figures yearly with a company and not having a person there to whom you could speak and who knew your business?

His particular problem started when Google sent him a note informing him of an infraction in early February. Then, on February 13th, a follow-up email came saying that they had done as requested. Google had said that his campaigns contained too many irrelevant keywords. Like many of the more sophisticated, he was a long-tail bidder, and long-tail is often audience based, especially, and as was the case for him, when spending money on content sites. It makes sense to advertise a high end watch on high end car words. There is an audience overlap. A person spending money on a $100k+ car is the same one who is likely to want a $5k Panerai. His weren't quite that far reached, but it's the same concept.

My Albert Jerske, my amazing talent of a human being who was blacklisted, didn't expect to receive a note sent at 5pm PST this past Friday saying that his accounts were being shut down and that any new campaigns or accounts he tries to set up would be declined. He's not on PST. Do you think Google works on the weekend? No. What type of behavior is that on their part to shut down his business, which isn't just him but a staff of people who rely on him for employment, at the end of day on a Friday?

You could blame him for not diversifying, not figuring out display or email or not scaling with other engines, but anyone who actually spends money on search, I mean truly spends money, knows the fallacy of those arguments. Google is the online advertising platform for an enormous group of companies, much like Microsoft is for personal computing. Not everyone can be a Toyota or Procter & Gamble (or wants to be); not every company has a business which doesn't really need search. For those that do, Google is the only real option.

Remembering Spider-Man
The Jewish people call it Yiddishe Kopf. It's a way of thinking and caring for your fellow human, a type of consideration and compassion. If you are a superhero, you could describe it as Spider-Man's Uncle Ben did, "With great power comes great responsibility." It's the Golden Rule, and that's just the problem. For Google, the Golden Rule is mathematical. They revere phi. For a just society, we revere a much different Golden Rule. Google could learn to get in touch with that other Golden Rule. Mathematics might help describe the universe, but it can't help the people of the world.

I can accept a mistake made against me, but I can't when something happens to someone as dear as family. I respect and appreciate what Google has done and built, what they provide. But, I don't like them, and I certainly don't trust them.

Actions speak louder than words. Google has always had the right things to say, but their words are empty in the face of their actions, a shield that weak people hide behind to feel righteous and better. It's time for the world to see that the emperor has no clothes and for those inside the State to change lest their Wall comes down and leaves all the bureaucrats without their layer of ill-earned, ill-deserved, and protective self-importance.

Wednesday, March 12, 2008

Open Letter To Google: Do The Right Thing, Divest Yourself Of Performics

At long last, Google owns DoubleClick. In doing so, the company has done something else that many people would have never believed possible. Become an SEO. That's right -- Google's in the SEO business now, selling services through DoubleClick's Performics to people who want to rank well on -- um -- Google. Conflict of interest? You bet. And worse from an image perspective, the purchase puts Google in the paid inclusion business, something it dissed as evil back in 2004, when it went public. Don't get me wrong, I have absolutely no problem with Performics as a company and have good friends that work there. But Google shouldn't own it. The Google announcement yesterday should have said that Performics was being quickly spun off. Larry, Sergey, Eric, Google! Please do the right thing and make this a priority. Below, more on why this should be done, plus the official Google stance, so far.

Performics is a long-time leader in the SEO and search marketing space. It provides paid search services (getting you listed on search engines like Google itself through ads), as well as "natural search" or search engine optimization work:

Using robust technology, the DoubleClick Performics' team scientifically optimizes existing client sites to create new, dynamic, crawler-friendly sites highlighting brand, nonbrand and long-tail keywords. Our experts methodically optimize copy and content for each page to boost page rankings. DoubleClick Performics understands the importance of creating an NSO culture at your company, and one that is not necessarily dependent on the bandwidth of your IT department. Our customized solutions combat existing issues and leverage Web 2.0 technology to boost rankings:

There's nothing wrong with SEO. Even Google tells you this. But on that same page, Google also says:

While Google doesn't have relationships with any SEOs...

Now it does -- it owns an SEO. And therein lies the problem. Even if Performics is kept completely separate from the Google search team, there's the impression that Performics might have some special "in" with Google's non-paid search results. After all, Google owns it! It's also not hard to imagine that despite all the best intentions, some new sales rep might pitch that Performics has some type of in. That's a bad thing. FYI, Performics already touts its relationship on the paid side, as do many other search marketing firms.

It just doesn't feel right. To me, it's the same thing as if the New York Times owned a PR company, where much of that company's main work focused on getting articles to show up in the New York Times. It's a conflict that will hurt Google's trust. It's a conflict that's going to cause many in the SEO community to constantly poke at Google. If you think the paid links debate over the past year has gotten rough or mean at times, I don't think you've seen anything yet as SEOs digest that Google is both competing with them and harboring such a conflict.

It's not like Google didn't see this coming. The issue was raised way back when the DoubleClick purchase plans were first announced. But the response then that it would be "business as usual" made me feel that the conflict simply was being lost in the main focus on the DoubleClick ad serving business. The suits doing the deal, if you will, to me simply didn't get that Google's core trust was being put at stake.

Let me bring it home to the suits using a financial document they should be familiar with. That's the Google IPO registration document of April 2004. In it, Google addresses the issue of paid inclusion several times.

Back then, paid inclusion was a big deal. Big deal. All the major search engines but Google practiced it. Paid inclusion is where someone can pay to get their pages listed in the "natural" or "free" results. Unlike paid placement (AdWords, Yahoo Search Marketing ads, Microsoft adCenter), you are not guaranteed that your listing will show up for any given term. Instead, it's like a lottery. You can buy your way into the main search index where you hope that your number will come up. Paid inclusion is not supposed to provide any ranking benefit, though with the remaining search engine that runs such a program -- Yahoo -- it is a way for you to cloak optimized content within the rules and assure that your content automatically is given a quality score rating boost.

Google was the lone hold-out against paid inclusion at the time and often used this as a marketing point to help promote itself. Not only was it used for marketing, but Google's cofounders strongly believed the practice was wrong. That's why in the letter from the founders that formed part of the IPO filing, they called it out several times. Below, key sections where this was done, with the parts about paid inclusion bolded:

DON’T BE EVIL

Don’t be evil. We believe strongly that in the long term, we will be better served—as shareholders and in all other ways—by a company that does good things for the world even if we forgo some short term gains. This is an important aspect of our culture and is broadly shared within the company.

Google users trust our systems to help them with important decisions: medical, financial and many others. Our search results are the best we know how to produce. They are unbiased and objective, and we do not accept payment for them or for inclusion or more frequent updating.

And:

We will do our best to provide the most relevant and useful search results possible, independent of financial incentives. Our search results will be objective and we will not accept payment for inclusion or ranking in them.

And:

Objectivity. We believe it is very important that the results users get from Google are produced with only their interests in mind. We do not accept money for search result ranking or inclusion. We do accept fees for advertising, but it does not influence how we generate our search results. The advertising is clearly marked and separated. This is similar to a newspaper, where the articles are independent of the advertising. Some of our competitors charge web sites for inclusion in their indices or for more frequent updating of pages. Inclusion and frequent updating in our index are open to all sites free of charge. We apply these principles to each of our products and services. We believe it is important for users to have access to the best available information and research, not just the information that someone pays for them to see.

And:

Froogle enables people to easily find products for sale online. By focusing entirely on product search, Froogle applies the power of our search technology to a very specific task—locating stores that sell the items users seek and pointing them directly to the web sites where they can shop. Froogle users can sort results by price, specify a desired price range and view product photos. Froogle accepts data feeds directly from merchants to ensure that product information is up-to-date and accurate. Most online merchants are also automatically included in Froogle’s index of shopping sites. Because we do not charge merchants for inclusion in Froogle, our users can browse product categories or conduct product searches with confidence that the results we provide are relevant and unbiased. As with many of our products, Froogle displays relevant advertising separately from search results.

Paid inclusion bad. Not just bad, but evil. It's listed in the second sentence of the Don't Be Evil section! And Performics? It's a big business for them -- and now Google. From the Performics site:

Using superior technology and experienced practitioners, DoubleClick Performics’ Paid Inclusion tool seamlessly integrates content into Yahoo!’s natural search results to improve site visibility. Our copywriters prepare listings that maximize rank and clickability for brand and nonbrand terms, eliminating reliance on a spider to index site pages.

And:

Online shoppers who visit a comparison shopping engine (CSE) spend 24 percent more than the average online consumer. DoubleClick Performics has superior relationships with the top CSEs, enabling advertisers to expand reach, improve relevancy and cost effectively connect with more consumers.

Working together, DoubleClick Performics will distribute your single feed to multiple sites, facilitating quick and simultaneous updates across properties. Today, we manage programs for more than 150 clients across 18 CSEs. Our vertical experts utilize advanced technology to enhance program development through a proven methodology: DoubleClick Performics is thinking forward about data feed distribution and optimization:

OK, so we're not talking paid inclusion on Google itself. We're talking selling inclusion into Google's competitors, both in general search and against Froogle/Google Product Search. But does that make it less evil? If Google was so against paid inclusion in 2004, should it in 2008 now be in the business of selling it to anyone?

The debate about paid inclusion has largely died down, and former Ask.com CEO Jim Lanzone got no real traction trying to revive it last year. So this part of the Performics purchase may not matter except for those who want to call Google hypocritical on that count.

But to me, the conflict of owning an SEO firm remains. Yes, I know that Microsoft owns one, gaining Avenue A/Razorfish as part of its AQuantive purchase. That never sat right with me, and I wish that Microsoft, like Google, would divest itself of it. At the same time, yes, stop selling branded SEO services through other companies. You own the pie; do you really need to sell the pie cutters too?

And the Google official stance? Here's what I received yesterday:

We intend to spend the next several months assessing all of DoubleClick's products and services including those offered by Performics. In the near term, we intend to operate Performics as a stand-alone business unit consistent with its past practices. Upon the completion of our integration planning with respect to Performics, we will be in a better position to announce our future plans for this business.

A purchase initially worth more than $3 billion, and Google still hasn't assessed that Performics poses such a huge conflict to own? Disappointing. Google should have announced a spin-off yesterday. I'd hope they'll rapidly move forward with doing that.

Tuesday, March 04, 2008

The Importance of Page Layout in SEO

If a search engine could understand the layout of a web page and identify the most important part of a web page, it could pay more attention to that section of the page when indexing content from the page.

It could give links found within that section of the page more weight than links found in other sections of the page, and it could give information within that area more weight when determining what the page is about.

We’ve seen the idea of breaking pages up into parts from a couple of the major commercial search engines:

* Microsoft VIPS and block level link analysis (pdf)
* Google - Document Segmentation based on Visual Gaps

A patent application from Yahoo explores how to approximate the layout of a web page, without actually displaying the page as a web page the way that a browser program does.

Not actually rendering a page like a browser might makes the process faster, which is important when a search engine has to look at lots and lots of web pages.

The patent filing also explores ways to identify what the most important section of a page might be from the approximated version of a layout. The patent filing is:

Techniques for approximating the visual layout of a web page and determining the portion of the page containing the significant content
Invented by Anandsudhakar Kesari
US Patent Application 20080033996
Published February 7, 2008
Filed August 3, 2006

Here’s the abstract from the patent application:

To approximate a visual layout of a web page without rendering the page, an object tree representing elements within the page is recursively traversed to determine bounds for the width of the elements, resulting in lower bounds induced for non-leaf nodes by elements within these nodes and upper bounds induced by ancestors and siblings of nodes.

For each element, the minimum required width (lower bound), the desired width were there no constraints, and the maximum available width (upper bound) based on constraints of parents are computed, and an approximate width is derived therefrom.

A positioning process positions each element within its corresponding parent container by advancing a cursor according to the elements’ approximate width and appropriate constraints.

The element that contains the most meaningful content is determined based on the amount of weighted content of elements and their position within the page.
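As a rough illustration of the idea in the abstract, here is a heavily simplified sketch. It is not the algorithm from the patent filing: the `Element` fields, the single-row wrapping rule, and the "closeness to page centre" weighting are all assumptions invented for this example. It only shows the general shape of the process: compute a lower bound from the children, clamp a desired width against the width the parent allows, advance a cursor to position each element, then score elements by content and position.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    text_len: int = 0        # amount of text content (assumed scoring input)
    min_width: int = 0       # narrowest the element's own content can be
    desired_width: int = 0   # width wanted with no constraints (0 = fill parent)
    children: list = field(default_factory=list)
    width: int = 0           # approximate width, filled in by layout()
    x: int = 0               # approximate left edge, filled in by layout()
    row: int = 0             # approximate row, filled in by layout()


def lower_bound(el):
    """Minimum required width: own minimum, or the widest child's minimum."""
    return max([el.min_width] + [lower_bound(c) for c in el.children])


def layout(el, available, x=0, row=0):
    """Recursively approximate widths and positions; returns the next free row."""
    want = el.desired_width or available
    # Clamp the desired width between the lower bound and what the parent allows.
    el.width = max(lower_bound(el), min(want, available))
    el.x, el.row = x, row
    cursor_x, cursor_row, next_row = x, row, row + 1
    for child in el.children:
        child_width = max(lower_bound(child), min(child.desired_width or el.width, el.width))
        if cursor_x > x and cursor_x + child_width > x + el.width:
            cursor_x, cursor_row = x, next_row  # does not fit: wrap to a new row
        next_row = max(next_row, layout(child, el.width, cursor_x, cursor_row))
        cursor_x += child.width  # advance the cursor past the placed child
    return next_row


def main_content(root, page_width):
    """Pick the leaf whose text, weighted by closeness to the page centre, scores highest."""
    best, best_score = None, -1.0
    stack = [root]
    while stack:
        el = stack.pop()
        stack.extend(el.children)
        if el.children:
            continue  # only leaves are scored in this sketch
        centre = el.x + el.width / 2
        closeness = 1 - abs(centre - page_width / 2) / (page_width / 2)
        score = el.text_len * max(closeness, 0.0)
        if score > best_score:
            best, best_score = el, score
    return best


# Hypothetical three-column page: navigation, article body, ads.
nav = Element("nav", text_len=200, min_width=100, desired_width=160)
article = Element("article", text_len=3000, min_width=300, desired_width=600)
ads = Element("ads", text_len=150, min_width=120, desired_width=160)
page = Element("page", children=[nav, article, ads])

layout(page, available=960)
print(main_content(page, 960).name)
```

Nothing here touches a rendering engine, which is the point the filing makes: widths and positions are estimated from the element tree alone, trading accuracy for speed.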

Information Extraction Systems and Data Structures

The ways that information might be presented on web sites can often be described as structured, semi-structured, or unstructured.

Structured means that the pages are generated using a common layout or template, and contain the same information fields from one page to another.

Semi-structured sites may use templates that have a number of variations to them. For example, one page may include information and fields that other pages don’t have, or some pages might show a wider range of information and values.

Some sites that may use a structured format might include job sites, or travel sites, or ecommerce product pages.

The majority of pages on one of those sites may display all the same information fields from one page to another, and if there isn’t information to fill a field, the field is shown anyway, but might show that there is no information for that field. An online bookstore might be set up that way, too.

A semi-structured format just might not display fields that are empty, or may show some new fields if there is unique information to show.

Information Extraction (IE) systems are used to gather and manipulate the unstructured and semi-structured information on the web and populate backend databases with structured records.

One of the challenges faced by an information extraction system is to quickly and accurately extract information from HTML pages.

So, how does an information extraction system find the good stuff on a page full of HTML code, and bypass the useless content?

It might look for some cues from the HTML, such as

(a) Style of the content, like color, emphasis, size, etc.;

(b) Geometric layout of the elements on the page, like the absolute and relative placement of elements; and,

(c) A visually significant region on the page which appears to contain the main content.

Looking at HTML to get cues about layout and which section might contain the main content of a page can be difficult, without using something like a browser to display a page the way that people actually see it.

But rendering a page that way is computationally expensive, and if a good approximation can be made without that kind of expense, then it may be ideal for information extraction purposes.

Identifying the Most Significant Element of a Page

A search engine doesn’t really want to pay too much attention to sections of a page that it might consider noisy, such as navigation bars, or banner or targeted ads when extracting information from a page.

It probably doesn’t want to focus upon a footer part of a page, with information like a copyright notice, or the header of a page which may contain a site logo repeated from one page to another on the site.

The most significant element of the web page would be estimated by this visual layout process, by trying to find the element that contains most of the meaningful content on the page.

That most significant element of the page would be based on the amount of weighted content of elements and the position of the elements within the page as approximated by the visual layout process.

Conclusion

The patent application goes into a lot of detail on a method to estimate the layout of a page, and to understand the positions of elements within a page, as well as identifying the most significant element of pages.

If you build web pages, and you want an idea of how a search engine might be looking at and weighing the content of your pages, you may want to spend some time with this patent filing.

Considering that Google and Microsoft have also developed methods to segment the contents of web pages, it’s not a bad idea to get a sense of how they all might be breaking pages down into parts.

Distribution of Clicks on Google’s SERPs
October 26th, 2006


What is the distribution of clicks on a search engine results page? What percentage of clicks does each search result get according to its rank? How much more of users’ attention does the first listing get compared to the second? And how often do users click a listing below the page fold? The way users interact with SERPs is one of the most frequently discussed topics in the SEO community and is also a very important field of study for search engine specialists. To answer these questions, researchers run eye-tracking experiments.

Eye-Tracking Studies

The objective of eye-tracking studies is to gain insight into how users browse the presented abstracts and select links to click. The results provide Internet marketers with information on clickthrough rates, allowing them to make better predictions of traffic changes as rankings are gained or lost. For search engine engineers, the results provide a basis for improving search engine interfaces and the metrics used to evaluate the relevancy of the presented search results.

To detect users’ interaction patterns, an eye-tracking experiment observes a number of indicators of ocular behavior using a CCD (charge-coupled device) camera similar to the device used to read bar codes. The indicators of ocular behavior include eye fixations, saccades, scan paths and pupil dilation. Eye fixations are defined as a stable gaze lasting for 200-300 milliseconds, representing visual attention to a specific area of a SERP. Pupil dilations, or changes in pupil diameter, are a measure of interest in a particular listing. This variable is especially important as it helps interpret implicit user feedback on the relevancy of the presented search results.

Cornell University Eye-Tracking Analysis of SE Users’ Behavior

One of the most recent eye-tracking studies was performed at Cornell University by Laura A. Granka, Thorsten Joachims and Geri Gay ([1]). They used a sample of undergraduate students instructed to perform searches in Google for 397 queries on topics covering movies, travel, music, politics, local and trivia. This study produced the following results.

Fig 1. Google SERP click and attention distribution ‘heat-map’

Study Results: Clicks and Attention Distribution

As you can see from the graph below (Fig 2) and the SERP ‘heat-map’ based on it (Fig 1), the first two listings capture over half of users’ attention, measured as eye-fixation time. While attention is shared almost equally between these two, the difference in the number of clicks is much more surprising: the first listing gets over four times as many! After the second listing, eye fixation drops sharply. Search results 6 to 10 receive roughly equal attention. Interestingly, the 7th listing gets less attention than the 8th that follows it; apparently this is the effect of the page fold. The 7th listing sits just below the screen edge and is often skipped as users scroll down to the bottom of the page (during the study the 7th listing was clicked only once). On the graph you can also see the 11th listing, from the second page of the search results. It gets only about 1 percent of clicks and user attention, 2.5 times less than the lowest-ranked result on page one.


Fig 2. Time spent viewing each result compared to the number of clicks. Source [1]

People often consider getting into Google’s ‘top ten’ to be the measure of SEO success. Evidently this is a rather rough approximation. The ‘top ten’ itself is a very diverse group, with the number of clicks increasing almost logarithmically as your rank improves. For instance, the first five positions get over 88% of the traffic, and the first three get 79%.
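
A quick bit of arithmetic shows how uneven the ‘top ten’ is. The sketch below uses only the two cumulative figures quoted above (79% and 88%); the helper names and the 1,000-click example are made up for illustration, and since the article says "over 88%", the results are rough bounds rather than exact values.

```javascript
// Share of clicks that falls between two cumulative percentages.
function shareBetween(cumulativeLow, cumulativeHigh) {
  return cumulativeHigh - cumulativeLow;
}

// Estimated clicks for a given share of a hypothetical click total.
function estimateClicks(totalClicks, sharePercent) {
  return Math.round((totalClicks * sharePercent) / 100);
}

const topThree = 79; // percent, from the study figures quoted in the text
const topFive = 88;  // percent ("over 88%" in the text, so treat as approximate)

const positionsFourAndFive = shareBetween(topThree, topFive); // roughly 9 points
const positionsSixToTen = shareBetween(topFive, 100);         // at most about 12 points

// Out of a hypothetical 1,000 clicks on a results page:
const clicksForFourAndFive = estimateClicks(1000, positionsFourAndFive);
const clicksForTopThree = estimateClicks(1000, topThree);
```

In other words, on these figures positions 4 and 5 together share only about a ninth of what the top three receive, so moving from fifth to third place matters far more than moving from tenth to eighth.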

SERP Browsing Patterns

Another important result of this study is the discovery of the browsing pattern: the way people read a SERP. To assess the performance of a search algorithm, it is vital to know how users evaluate the presented abstracts before clicking one of them. For example, if a user clicks the third listing, did they look at the abstracts above and below it? The following figure shows how many results above and below the selected listing are scanned on average.

Fig 3. Number of results scanned above and below the selected abstract. Source [1]

The effect of the page fold is clearly demonstrated here as well. While the first 5 listings are clicked after browsing through 1 to 2.68 listings above and below, the 7th listing is clicked only after the entire page is examined! The listings below the page fold (8-10) are clicked after the first four or five listings are scanned. You can also see that the number of listings scanned above the clicked result is much bigger than the number of listings below. This indicates that users browse the list from top to bottom.

To Sum Up

While the study deals only with the first page of the organic search results, it can be assumed that similar results can be produced for other pages and perhaps even for the list of the paid ads in the right sidebar.

In addition to academic research, a number of companies produce eye-tracking studies for commercial use. The most notable of them are Eyetools.com and Poynterextra (http://www.poynterextra.org/EYETRACK2004/index.htm).

References:

1. Laura A. Granka, Thorsten Joachims, Geri Gay. ‘Eye-tracking analysis of user behavior in WWW search’, SIGIR, 2004. Available at http://www.cs.cornell.edu/People/tj/publications/granka_etal_04a.pdf Retrieved on 26.10.06

Tuesday, February 19, 2008

SEO things I read today...about Flash :)

4 Solutions to SEO problems of using Flash

Many of us have seen a website built with Adobe® Flash®. It has attracted many web surfers ever since Macromedia first came out with Shockwave® and then created a more lightweight version for the web, Flash. Its main strength is its animation capabilities, along with a strong scripting language that no one else on the market seems to have matched. With the lack of competition and a superb product, it is supported by almost every browser and operating system, with many third-party add-on tools made by several companies. It's no surprise to see many sites using Flash these days.

But the problem is that Flash, just like images and videos, is not written in a plain text language; it is only embedded within the HTML code using tags. A browser plug-in has to be installed at least once for web browsers to display Flash websites properly. With current bandwidth standards this takes just a few minutes or even seconds.

Flash is a binary file format, not plain text, which makes it difficult, and for some search engines impossible, to consistently extract the content found within a Flash file.

Do SEO professionals hate Flash?


It depends on who you ask. I have heard many people in the SEO industry say they simply hate Flash, period, just because they claim these sites cannot be search engine optimized. The feeling can be mutual among some in the web design and development community who love Flash and AJAX and just hate SEO (watch related funny videos).

Flash and AJAX are two technologies that enhance the user experience on a website, and they are going to stay with us for a long time. The SEO community should therefore embrace them and learn the workarounds needed to run a successful SEO campaign on a site that uses AJAX or Flash. If your SEO analyst simply tells you not to use Flash, they might just not know what to do with it.

SEO Solutions in using Flash


Below I will mention 4 solutions for optimizing your website even if it is running Adobe Flash. Since Flash cannot be interpreted perfectly and consistently by search engines, you run into 2 main problems: [1] the important text content rendered in Flash cannot be read well by the search engines, and [2] navigational elements within Flash cannot be crawled by search engine spiders. With this in mind, here are four tips on how to implement SEO successfully on a Flash website.


1. The Non-Flash Site Version for Sites Completely Made in Flash
Content: If your website is made entirely in Flash and has no HTML elements other than the code that embeds the Flash file(s), making another website with the exact same look and feel (for branding purposes), but without all the bells and whistles of Flash, will make the content readable by search engines. Since this is essentially a separate website, you should expect the non-Flash website to generate the search engine traffic and then funnel users through its navigation into the Flash website.
Navigation: Since there is a separate website with different pages targeted at different keywords, search engine crawlers have the opportunity to see the links in the plain HTML website and follow them, so all pages get included in the search engine index. Flash, like AJAX, does not necessarily load a new HTML page with a new URL on each page view, so it is still best to create a unique URL for each "Flash view" to serve as the entry point from the non-Flash pages.

In the image above, users will normally enter the site through the Flash version. Once they are on the homepage, the URL no longer changes, although the "Flash views" that act as pages from the user's perspective do change.

On the other side of the story, search engines will crawl the plain HTML version of the site. This can generate traffic to the HTML pages, and each page gives the user the option to see the Flash version, which lives on its own unique URL and loads the same Flash file but goes straight to the appropriate Flash view.


Advantage: You can design your Flash as intricately as you want. There are no limitations, since the SEO'd pages are on the non-Flash website.
Disadvantage: You need to spend more time and resources making the website. Having a Flash and a non-Flash website means having two websites to maintain: more time, more resources, more money spent.
Quick Tip: Flash can read data from XML documents relatively easily. Server-side scripting languages such as PHP, JSP, ASP, etc. can also import data from XML documents. Creating a unified CMS should make life easier for this kind of setup.
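
One way to picture the "unique URL per Flash view" idea is a small lookup that turns each plain HTML page into the matching Flash entry URL. This is a rough sketch only: the page paths, the view names, the "view" query parameter and the example.com base URL are all hypothetical, and how the Flash movie actually reads the parameter is left out.

```javascript
// Hypothetical mapping from plain-HTML pages to Flash view names.
const flashViews = new Map([
  ["/products.html", "products"],
  ["/contact.html", "contact"],
]);

// Build the Flash entry URL for a given HTML page.
// Falls back to the Flash home view when the page has no mapped view.
function flashUrlFor(htmlPath, base = "http://www.example.com/flash/") {
  const view = flashViews.get(htmlPath);
  if (!view) return base;
  const url = new URL(base);
  url.searchParams.set("view", view);
  return url.toString();
}

// On the Flash page, work out which view to open from the current URL.
function viewFromUrl(href) {
  return new URL(href).searchParams.get("view") || "home";
}

const productsLink = flashUrlFor("/products.html");
const openedView = viewFromUrl(productsLink);
```

Each HTML page can then carry a "view the Flash version" link built this way, so a visitor arriving from a search result lands on the matching Flash view instead of the Flash homepage.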

2. Alternative Content and Navigation
Content: On the same page where Flash is displayed, content equivalent to what is found within the Flash file is also presented elsewhere on your screen real estate, outside of Flash, as plain text within the HTML code.
Navigation: If a navigational element is made in Flash, then, as with the content, adding an alternative navigation to the site will help get all other pages crawled and indexed by the search engines. A common implementation of this is having footer text links at the bottom of every page.
Advantage: Unlike the technique above, you do not need to make another website. Just make sure that whatever text content you have in your Flash file is also available elsewhere on the page, outside of Flash.
Disadvantage: Cannot be done on a full Flash website, but generally any website implementing SEO should not be a full Flash website.
Quick Tip: Flash is best used here for areas where you want to attract attention, such as your unique selling statement or current product or service promotions. Whatever content is found there should also be found elsewhere in plain text format for the search engines to read. The animation is mainly used to draw the user's attention (not to annoy) so that they notice the Flash and, hopefully, read and explore further.

3. sIFR for Flash Designed Text
Content: Text fonts used on websites are declared either in the non-standard HTML <font> tag or in font styles declared using CSS. Either way, these layout commands tell the web browser to load the font file available on the viewer's local computer. In the absence of these font files, the browser will load its default font, which is often Times New Roman. This restricted web designers to commonly installed fonts such as Arial, Helvetica, Verdana, Geneva, Courier, Times New Roman and a few others, which limited their creative freedom. So in order to use beautiful typefaces, you could either use an image or Flash, but search engines had trouble reading these. Scalable Inman Flash Replacement, or simply sIFR (many pronounce this as sifer), is an effective use of Flash text replacement. It is similar to how CSS image text replacement is done, but done with Flash. What makes this even better than CSS image text replacement is that sIFR can be used more effectively in a CMS, since the replaced text can easily be generated dynamically. Although an image with text written on it can also be generated dynamically with tools such as PHP's GD library, that is neither as easy nor as resource friendly as sIFR.
Navigation: The text on any anchor link is important in SEO, for this is what gives meaning to what the destination page is all about. The important keywords within the <a> tags are your targeted keywords. But if the link is applied to an image or a Flash file, you have to make sure the targeted keywords are still read by search engines. For images you use the alt attribute. Since sIFR is used to 'stylize' text and not for any other animated effect, sIFR is ideal for adding beautiful looking text links without sacrificing the crawlability of the links.
Advantage: Ability to create styled font faces that are completely viewable as plain text in the "eyes" of the search engines. sIFR is very lightweight and scalable. It is relatively easy to implement, and Google has expressed their acceptance of this method. sIFR also degrades gracefully if Flash is disabled or not installed in a web browser.
Trivia: Inman is the last name of Shaun Inman, who was the first to experiment with the JavaScript code used in sIFR. This was then modified and improved by Mike Davidson and Mark Wubben so that it can be used to replace HTML text elements.

4. Use SWFObject
Content: You have probably heard of this quite often if you have been keeping up to date with the latest SEO techniques, and those who have not fully caught up may find the technology confusing. To explain it, let's first look at how Flash is added to a webpage. Normally, Flash is embedded with HTML object and embed tags. Within these tags is the source of the Flash file, with the .swf file name extension, along with several parameters for how this .swf file is displayed, such as the height, width and more. SWFObject is a JavaScript function that detects whether Flash is available. As mentioned on Geoff Stearns' website:

SWFObject is a small Javascript file used for embedding Adobe Flash content. The script can detect the Flash plug-in in all major web browsers (on Mac and PC) and is designed to make embedding Flash movies as easy as possible. It is also very search engine friendly, degrades gracefully, can be used in valid HTML and XHTML 1.0 documents*, and is forward compatible, so it should work for years to come.


Since this is mainly a Flash detection script that replaces HTML blocks such as a typical div tag, plain text HTML content can be placed within the div tag. Only if Flash is enabled will it display the Flash over the div tag. A simple code implementation would look like this:

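In the <head> section, the SWFObject script is loaded.

<script type="text/javascript" src="swfobject.js"></script>

Also in the <head> or optionally in the <body> you have the SWFObject call stating the .swf file to load. Because so.write() looks up its target element when it runs, in practice this call is placed after that element in the page.

<script type="text/javascript">
// Arguments: .swf file, id for the embedded movie, width, height, required Flash version, background color
var so = new SWFObject("flashfile.swf", "flashheader", "400", "200", "8", "#ffffff");
// Swap the content of the element with the ID "flashcontent" for the Flash movie
so.write("flashcontent");
</script>


This will then look for the element with the ID flashcontent and replace its content with the .swf file flashfile.swf. And anywhere within the webpage's content, you can have:

<div id="flashcontent">
Text placed within these tags is search engine friendly and can be read by search engines.
</div>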


Since this script mainly checks the browser's Flash capability and search engine spiders are not Flash enabled, Flash does not run for them. Aside from that, JavaScript is used to invoke Flash, and search engines are not JavaScript enabled either. But since the content of the Flash file is provided as plain text in the div tag that Flash replaces, search engines are able to read the content you gave for that Flash file.
Navigation: Navigational elements work exactly the same way as the content: place plain text HTML links within the div tag that Flash replaces.
Advantage: Can work with full Flash sites and with websites that have only portions in Flash. Full Flash sites can degrade normally into non-Flash sites without the visitor needing to choose which site version to view. Flash page URLs and non-Flash page URLs are unified into a single URL.
Disadvantage: It is very easy to implement shady to outright blackhat techniques, such as keyword stuffing behind the Flash, that can get you banned from the search engines. This is easily avoided by simply not putting any content in the plain text HTML code that is not visible in the Flash file. As long as you keep it clean, you are safe. In the words of Google's Adam Lasnik, in an interview by Eric Enge:

I haven't happened to catch any of the SWFObject based flash sites, so, I can't give a definite answer on that one, but the key thing here is that if the text that is essentially gracefully rendered outside of the flash for those who don't have it, is identical to what folks that do have flash capabilities in their browser are seeing, then generally there is not going to be a problem.

Trivia: SWFObject used to be called FlashObject. The name was changed due to legal/trademark reasons.


Do I block my Flash files?

Google has been showing advances in reading .swf files. Google can actually go into a .swf file and extract the text it finds within it, although it may not be a wise decision to let Google index your Flash files.



Above is a sample .swf file indexed by Google on a popular website. The title is totally meaningless and so is the description. And if someone does visit this Flash file, it won't lead the visitor to the rest of the site. Google is indeed doing well at reading Flash files, but in my opinion it is still not the right time to let Google index your Flash files: you cannot optimize them as well as you can HTML pages, and if visitors land on the Flash file, you have a smaller chance of getting them to visit the rest of the site. So far it is only Google that has this capability, and it is still nice to be search engine friendly to all major search engines. To solve this issue, you can simply place all .swf files in one folder and block them off in robots.txt.

User-agent: *
Disallow: /swf/



Adobe®, Flash® and Shockwave® are trademarks of Adobe Systems Incorporated.