Site built on React JS and Google not crawling it? Our solution to get SEO to work

April 16, 2018

Before we begin, here is a little something about us to give you some context.

Pickyourtrail helps travellers create, customize and book their vacations, all in just 10 minutes. Our machine learning algorithms ensure you get an itinerary suited to your taste while keeping the cost minimal. We let users edit cities, days and activities, and we rely heavily on ReactJS to enable such hyper-customization. This is where the challenge comes in.

A little background

The Pickyourtrail website offers travel solutions/packages and also has a destination guide for travellers. These two ‘products’ are built on different stacks.

Most stacks built on React/Angular use Node as their backend rather than Java. When we built the product, we moved the front end from plain JavaScript to React but kept Java on the backend, the primary reason being that Java was a far more established language than Node, which was still new. While Java has not let us down (and we see no major reason to move away from it from an architecture perspective), it did leave us with a challenge.

The Case

For all the pages on our site rendered with ReactJS, we had a problem with Google crawling them. For instance, the page https://pickyourtrail.com/guides/bali is powered by React, and the data for the application depends on our backend APIs.

Most articles we came across talked about Fetch & Render in Google Webmaster Tools. When we tried it on our page, it seemed to work, but with a catch: a Google search revealed that the content of the page was not indexed.

We did some research and found that most people solve this issue with methods like SSR (server-side rendering) or pre-rendering. In our case that was time-consuming and not really feasible, so we were left with no option but to come up with our own solution!

The Solution

We came across a lot of discussion about how robust Google is at crawling JavaScript-generated content. So we built a test application in React JS using the create-react-app boilerplate. When we submitted it to Google for indexing, some of the pages were indexed while others weren’t.

 

It was strange.

 

We started analyzing the pages that were indexed to understand the behaviour. 

[Screenshot: Google Search Console Fetch & Render comparison]

The GSC screenshot above shows the Googlebot view (left) and the visitor view (right). It clearly shows that Googlebot is unable to read/crawl the content rendered from the API response.

 

So the API call seemed to be causing the issue. Refer to the test below:

 

| Content Source | Time Taken | Crawled |
| --- | --- | --- |
| Within the React component (hardcoded) | > 5 secs | Y |
| Embedded JSON file within the project | > 5 secs | Y |
| AJAX API | > 5 secs | N |

And yes, the React pages we built fall under the last category: they depend on an AJAX API call for their data. The whole page took more than 5 seconds to render completely, because it had 20–25 images, a couple of embedded Google Maps, a carousel and so on.

To make sure this was the real issue, we removed all the images and the embedded Google Maps from the page. We also trimmed parts of the response payload to reduce the size of the DOM tree.

Then we tried submitting the page to Google again. As the earlier screenshot showed, Googlebot still wasn’t crawling the content that came from the API.

So, after removing the images, we also added a JavaScript timer to the header of the page and did another Fetch & Render in GSC.
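For reference, here is a minimal sketch of such a timer; the exact markup we used isn’t shown in this post, so treat the element id and structure as illustrative. The idea is that the rendered snapshot reveals how many seconds Googlebot waited before capturing the page.

```html
<!-- Illustrative timer: the snapshot shows the seconds elapsed
     at the moment Googlebot captured the rendered page -->
<div id="render-timer">0</div>
<script>
  var seconds = 0;
  setInterval(function () {
    seconds += 1;
    document.getElementById('render-timer').textContent = seconds;
  }, 1000);
</script>
```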

[Screenshot: Fetch & Render result with the on-page timer]

Voila!

For the first time, Googlebot could see what the user would see. And Googlebot had waited exactly 5 seconds, then crawled whatever was available in the DOM at that point.

So now we had a clear direction to head towards: make the entire page load in under 5 seconds at a reasonable internet speed.

And this is how the test worked:

 

| Content Source | Time Taken | Crawl Status |
| --- | --- | --- |
| Within the React component (hardcoded) | less than 5 secs | crawled |
| Within the React component (hardcoded) | more than 5 secs | crawled |
| Embedded JSON file within the project | less than 5 secs | crawled |
| Embedded JSON file within the project | more than 5 secs | crawled |
| AJAX API | less than 5 secs | crawled |
| AJAX API | more than 5 secs | not crawled |

And this is how we arrived at the golden ‘5 seconds’.

Getting the entire page, which depends on the API response, to load within 5 seconds was quite a challenge.

We used react-router for routing, and the page works as follows:

Step 1: The page loads.

Step 2: An API call is made based on the requested URL.

Step 3: Content is rendered based on the response received from the API.
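A minimal sketch of that flow, with the component and API route as illustrative assumptions rather than our actual code:

```jsx
import React from 'react';

// Hypothetical page component; the route parameter comes from react-router.
class GuidePage extends React.Component {
  state = { data: null };

  componentDidMount() {
    // Step 2: API call based on the requested URL
    fetch(`/api/guides/${this.props.match.params.slug}`)
      .then((res) => res.json())
      .then((data) => this.setState({ data }));
  }

  render() {
    // Step 3: content renders only after the response arrives;
    // until then the DOM is nearly empty, which is all Googlebot
    // sees if rendering takes longer than its wait time
    if (!this.state.data) return <div>Loading…</div>;
    return <article>{this.state.data.content}</article>;
  }
}

export default GuidePage;
```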

 

Conclusion: Three different ways to approach this problem

 

Approach 1: Progressively loading the content

The page had a single API call which returned a relatively large payload with the content for the entire page.

At first, we thought of breaking this into multiple API calls to reduce the payload size and progressively load the content, keeping the DOM tree minimal on first load. But we heard a big NO from our SEO team, because Googlebot and the user would see different content, which could be treated as cloaking.

Pros: Reduced page load time, better UX

Cons: Possibility of a negative SEO implication

 

Approach 2: Using a “Read more” button

What if we loaded only the main-fold content, put in a “Read more” call to action, and fetched the rest of the page’s content with another API call? (A rough sketch follows the pros and cons below.)

It solves the cloaking issue, as the user and Googlebot see the same content until the user clicks the “Read more” call to action.

But it leads to a very poor user experience, and we heard a big NO from the UX and business teams.

Pros: Reduced initial page load time, no risk of SEO penalization for cloaking

Cons: Leads to poor UX
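For illustration, here is a rough sketch of what Approach 2 could have looked like; the endpoints and names are hypothetical:

```jsx
import React from 'react';

// Hypothetical split: one call for the main fold, and a second,
// user-triggered call for the rest of the page.
class GuideWithReadMore extends React.Component {
  state = { fold: null, rest: null };

  componentDidMount() {
    fetch('/api/guides/bali/fold') // illustrative endpoint
      .then((res) => res.json())
      .then((fold) => this.setState({ fold }));
  }

  loadRest = () => {
    fetch('/api/guides/bali/rest') // fired only on user intent
      .then((res) => res.json())
      .then((rest) => this.setState({ rest }));
  };

  render() {
    const { fold, rest } = this.state;
    if (!fold) return null;
    return (
      <article>
        <section>{fold.content}</section>
        {rest ? (
          <section>{rest.content}</section>
        ) : (
          <button onClick={this.loadRest}>Read more</button>
        )}
      </article>
    );
  }
}
```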

 

Approach 3: Optimize everything, almost everything

So we went back to the safe and sound method of performance optimization. This is what worked for us.

  1. Slightly reduced the payload size of the API response.
  2. Compressed every single image on the page with the highest possible lossless compression ratio.
  3. Optimized our CSS: removed every unwanted line and minified it.
  4. Optimized our JavaScript and deferred some of it.
  5. Cached the API response (although this has no impact from a crawlability standpoint).
  6. Lazy-loaded all the images and progressively loaded the embedded Google Maps (see the sketch after this list).
  7. Enabled GZIP compression.
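As an example of item 6, here is a hedged sketch of lazy loading an image with IntersectionObserver; our actual implementation isn’t shown in this post, so the names, threshold and fallback behaviour are illustrative:

```jsx
import React from 'react';

// Renders a placeholder until the image scrolls near the viewport,
// then swaps in the real <img>. Assumes IntersectionObserver support
// (or a polyfill) in the target browsers.
class LazyImage extends React.Component {
  containerRef = React.createRef();
  state = { visible: false };

  componentDidMount() {
    const observer = new IntersectionObserver(
      ([entry]) => {
        if (entry.isIntersecting) {
          this.setState({ visible: true });
          observer.disconnect();
        }
      },
      { rootMargin: '200px' } // start loading a little before it's on screen
    );
    observer.observe(this.containerRef.current);
  }

  render() {
    const { src, alt } = this.props;
    return (
      <div ref={this.containerRef}>
        {this.state.visible ? <img src={src} alt={alt} /> : null}
      </div>
    );
  }
}
```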

Now the page was loading well under 5 seconds. The images, of course, still finished loading after that, but the initial load of the page could pass the ‘less than 5 seconds’ test.

We gathered some courage and submitted it to Google Webmaster Tools for a Fetch and Render.

And it got crawled in a couple of days!

How did we handle meta title and description?

This being a CSR-only (client-side rendering) project, we had to find a way to dynamically rewrite the meta title and description of the index.html page for every dynamic page.

The create-react-app documentation suggests replacing the meta title and description values at the server:

Generating Dynamic <meta> Tags on the Server

Since Create React App doesn’t support server rendering, you might be wondering how to make <meta> tags dynamic and reflect the current URL. To solve this, we recommend adding placeholders into the HTML, like this:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta property="og:title" content="__OG_TITLE__" />
    <meta property="og:description" content="__OG_DESCRIPTION__" />
```

Then, irrespective of the backend you use, you can read index.html into memory on the server and replace placeholders such as __OG_TITLE__ and __OG_DESCRIPTION__ with values corresponding to the URL. To ensure these values are safe to embed into HTML, sanitize and escape them.

If you use a Node server, you can even share the route matching logic between the client and the server. However, duplicating it also works fine in simple cases.
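For completeness, here is a minimal Node/Express sketch of what the docs describe; the route, lookup logic and escape helper are illustrative assumptions, not code we run:

```js
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();

// Read the built index.html (with its placeholders) into memory once.
const indexHtml = fs.readFileSync(
  path.resolve(__dirname, 'build', 'index.html'),
  'utf8'
);

// Naive HTML escaping so interpolated values are safe to embed.
const escapeHtml = (s) =>
  s.replace(/[&<>"']/g, (c) =>
    ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c])
  );

app.get('/guides/:slug', (req, res) => {
  // In a real app these values would come from a database or API.
  const title = `Travel guide: ${req.params.slug}`;
  const description = `Everything you need to know about ${req.params.slug}.`;
  res.send(
    indexHtml
      .replace('__OG_TITLE__', escapeHtml(title))
      .replace('__OG_DESCRIPTION__', escapeHtml(description))
  );
});

app.listen(3000);
```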

Given the limitations of our architecture, this approach was out of the question. So we looked for other options.

React Helmet

React Helmet is a plugin that allows us to manage the head portion of the HTML document from within the body.

The caveat of this approach is that it replaces the meta title and description only after that particular line of code executes inside the React component’s render method. So we had to make sure the Helmet element appears as close as possible to the top of the root element’s returned tree.
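A minimal sketch of how this looks in a component (the component and data names are illustrative):

```jsx
import React from 'react';
import { Helmet } from 'react-helmet';

// Helmet is placed first in the returned tree so the title and
// description are rewritten as early in the render as possible.
function GuideHead({ guide }) {
  return (
    <div>
      <Helmet>
        <title>{guide.title}</title>
        <meta name="description" content={guide.description} />
      </Helmet>
      {/* ...rest of the page... */}
    </div>
  );
}

export default GuideHead;
```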

With this, we were able to dynamically change the meta title and description of the index.html page for every dynamic page, and all these dynamically generated titles and descriptions got indexed by Google.

Finally, we cracked it!

