Database Consistency

In this post, we will look at what database consistency means, using the Google App Engine Datastore as an example.

Database Consistency when it is eventual

When one thread updates the data, the thread that displays the data can still show the previous values, because the Datastore is eventually consistent.

Examine this simple Web App from this repository:

The Datastore model consists of just three fields: email, message, and an automatically filled date and time:
from google.appengine.ext import ndb

class database1(ndb.Model):
    email = ndb.StringProperty()
    message = ndb.StringProperty()
    # auto_now=True sets this field to the current date and time on every put()
    last_touch_date_time = ndb.DateTimeProperty(auto_now=True)
The main.py looks like:
import logging
import os

import jinja2
import webapp2

from model import database1

jinja_env = jinja2.Environment(
    loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
    autoescape=True)


class MainHandler(webapp2.RequestHandler):
    def get(self):
        # Non-ancestor query, newest messages first
        message_query = database1.query().order(-database1.last_touch_date_time)
        template1 = jinja_env.get_template('templates/message.html')
        self.response.out.write(template1.render({"message_query": message_query}))


class saveMessage(webapp2.RequestHandler):
    def post(self):
        logging.info(str(self.request))
        obj_databasemodel = database1(email=self.request.get("email"),
                                      message=self.request.get("message"))
        obj_databasemodel.put()
        # Send the browser back to the page that submitted the form
        self.redirect(self.request.referer)


app = webapp2.WSGIApplication([
    ('/', MainHandler),
    ('/save_message', saveMessage),
], debug=True)

So we have two handlers here. MainHandler responds to GET requests by rendering message.html with the query results. In message.html, the form is declared as
<form action="/save_message" method="post">
so submitting it sends the request to the saveMessage handler, which creates obj_databasemodel and stores it with put(). However, when an entity is put into the Datastore, the call returns as soon as the write has been accepted, for speed reasons; the work that makes the new entity visible to queries is slow and is finished later, on another thread. The handler then redirects back to the page, and the page reloads before the put has actually finished, so the new message is not displayed and the page still shows the old ones.
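The timing problem described above can be imitated without App Engine at all. The following is only a toy sketch in plain Python 3: ToyEventualStore, its methods, and the sample data are hypothetical names invented for illustration, and the real Datastore is far more involved. It only shows how a query issued right after put() can miss the new entity.

```python
from collections import deque


class ToyEventualStore:
    """Toy stand-in for a datastore whose writes become visible later."""

    def __init__(self):
        self._visible = []       # what a non-ancestor query can see
        self._pending = deque()  # accepted writes that are not applied yet

    def put(self, entity):
        # The write is accepted and control returns immediately;
        # the entity is not visible to queries yet.
        self._pending.append(entity)

    def query(self):
        # A query only sees writes that have already been applied.
        return list(self._visible)

    def apply_pending(self):
        # Stands in for the background work that finishes each write.
        while self._pending:
            self._visible.append(self._pending.popleft())


store = ToyEventualStore()
store.put({"email": "a@example.com", "message": "hello"})
stale = store.query()   # the redirect reloads the page at this point
store.apply_pending()   # the write finishes a moment later
fresh = store.query()   # a later reload finally shows the message
```

Here stale is empty while fresh contains the new message, which mirrors what the reloaded page shows in the demo app.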



To clarify more:

When the query is not an ancestor query (like the query above), an updated entity value may not be immediately visible when the query runs. Replication uses the Paxos algorithm, which synchronously waits until a majority of the replicas have acknowledged the update (put) request. Each replica is then updated with the data from the request after a period of time. In many cases the update reaches all the replicas very quickly. However, several factors can, when compounded, increase the time needed to reach consistency. So if we deploy this app to production, we may see the updated data right away if we are lucky, but there is no guarantee. To make the effect visible, we can run the app on the development server, which emulates the eventually consistent behaviour of the Datastore. Seeing stale results there tells us that the query is not an ancestor query.
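To make the "majority of replicas" idea more concrete, here is a deliberately simplified sketch in plain Python 3. It is not Paxos and not how the Datastore is implemented; ToyReplicatedStore, catch_up, and the choice of three replicas are assumptions made purely for illustration. A write is delivered to a majority at once, and the remaining replica receives it later.

```python
class ToyReplicatedStore:
    """Toy model: a majority gets the write now, the rest catch up later."""

    def __init__(self, replica_count=3):
        self.replicas = [[] for _ in range(replica_count)]
        self.majority = replica_count // 2 + 1
        self._lagging = []  # (replica_index, entity) pairs still to deliver

    def put(self, entity):
        for index, replica in enumerate(self.replicas):
            if index < self.majority:
                # Delivered synchronously to a majority of replicas.
                replica.append(entity)
            else:
                # Queued for the replicas that are updated later.
                self._lagging.append((index, entity))

    def query(self, replica_index):
        # In this toy model a non-ancestor query may hit any replica.
        return list(self.replicas[replica_index])

    def catch_up(self):
        # Deliver the queued updates to the lagging replicas.
        for index, entity in self._lagging:
            self.replicas[index].append(entity)
        self._lagging = []


replicated = ToyReplicatedStore(replica_count=3)
replicated.put("first message")
seen_by_updated_replica = replicated.query(0)
seen_by_lagging_replica = replicated.query(2)
replicated.catch_up()
seen_after_catch_up = replicated.query(2)
```

Whether a reader sees the new message depends on which replica answers, which is the "if we are lucky" situation described above.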

Upgrading the code with Ajax to auto-refresh the page while keeping the database eventually consistent

Now let's look at the changed code:

We added an extra handler, showMessages, between MainHandler and saveMessage in main.py:
class MainHandler(webapp2.RequestHandler):
    def get(self):
        message_query = database1.query().order(-database1.last_touch_date_time)
        template1 = jinja_env.get_template('templates/message.html')
        self.response.out.write(template1.render({"message_query": message_query}))


class showMessages(webapp2.RequestHandler):
    def get(self):
        # Same query as MainHandler, but rendered into the fragment template
        message_query = database1.query().order(-database1.last_touch_date_time)
        template2 = jinja_env.get_template('templates/all_the_messages.html')
        self.response.out.write(template2.render({"message_query": message_query}))


class saveMessage(webapp2.RequestHandler):
    def post(self):
        logging.info(str(self.request))
        obj_databasemodel = database1(email=self.request.get("email"),
                                      message=self.request.get("message"))
        obj_databasemodel.put()
        self.redirect(self.request.referer)
Also notice the difference after the <h1>Messages</h1> section in message.html.

In the eventually consistent demo app (without Ajax), that part is:
{% for message_entities in message_query %}
  email: {{ message_entities.email }}
  <div class="message">
    message: {{ message_entities.message }}
  </div>
{% endfor %}
In the upgraded app with Ajax (the database is still eventually consistent), that part is:
<div id="messages">

</div>

<script>
function updateMsg() {
  $.ajax({
    url: "/all_the_messages",
    cache: false,
    success: function (frag) {
      // Replace the contents of #messages with the returned HTML fragment
      $("#messages").html(frag);
    }
  });
  // Poll again in 4000 ms
  setTimeout(updateMsg, 4000);
}
updateMsg();
</script>
So, in the previous code, we had just one file, message.html, to show everything. Now we have two files: message.html, which contains the text boxes where the user enters the email and message, and all_the_messages.html, which shows the messages. How are these messages displayed? The script requests /all_the_messages every 4000 ms and inserts the response into the page.

That is why main.py now has two separate handlers. MainHandler stays as before and renders message.html, which no longer contains the messages themselves. The extra handler, showMessages, renders all_the_messages.html, a separate file that was not present in the previous example. That file is only a fragment of HTML: there is no header and there are no html or body tags, nothing else. The URL /all_the_messages used in the Ajax call also has to be mapped to showMessages in the WSGIApplication route list. Finally, let's not forget to load the jQuery library with a script tag in message.html so that we can use Ajax to fetch all_the_messages.html.
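To illustrate what "a fragment of HTML" means here, the sketch below builds such a fragment with only the Python 3 standard library. It does not use the app's Jinja2 template; render_messages_fragment and the sample entry are hypothetical, and the markup is an assumption modelled on the message.html snippet shown earlier.

```python
from html import escape


def render_messages_fragment(messages):
    """Return only the repeated message markup, with no page wrapper."""
    parts = []
    for entry in messages:
        # Escape user input, as Jinja2 does when autoescape is enabled.
        parts.append(
            '<div class="message">email: {} message: {}</div>'.format(
                escape(entry["email"]), escape(entry["message"])
            )
        )
    return "\n".join(parts)


fragment = render_messages_fragment(
    [{"email": "a@example.com", "message": "<b>hi</b>"}]
)
```

The returned string can be dropped straight into the #messages div, because it carries no html, head, or body tags of its own.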

Database Consistency when it is strong

Now let's have a look at the changed code from this repository.

We added a parent key to all database1 entities. Then we added an ancestor query for the entities that have that parent key. Ancestor queries look for unfinished writes within the entity group and wait for them to complete before running the query.



So, the way to get strong consistency is to make the entities children of a parent key and use that parent key for an ancestor query. In that case, the query looks in the event log for writes that still need to be processed, waits until they are finished, and only then runs.
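The difference between the two kinds of query can be sketched with another toy model in plain Python 3. Again, this is not the Datastore's implementation: ToyEntityGroupStore, its methods, and the ("Entity", "root") parent are hypothetical names chosen for illustration. The point is only that the ancestor query finishes the pending writes of its entity group before it reads.

```python
from collections import deque


class ToyEntityGroupStore:
    """Toy model of entity groups with pending (unapplied) writes."""

    def __init__(self):
        self._applied = {}  # parent -> list of entities visible to queries
        self._pending = {}  # parent -> deque of writes not applied yet

    def put(self, parent, entity):
        # The write is recorded for its entity group but not applied yet.
        self._pending.setdefault(parent, deque()).append(entity)

    def query(self):
        # Non-ancestor query: sees only what has already been applied.
        return [e for group in self._applied.values() for e in group]

    def ancestor_query(self, parent):
        # Ancestor query: first finish the pending writes of this group,
        # then read, so the result includes every earlier put().
        pending = self._pending.get(parent, deque())
        applied = self._applied.setdefault(parent, [])
        while pending:
            applied.append(pending.popleft())
        return list(applied)


PARENT = ("Entity", "root")  # hypothetical shared parent
groups = ToyEntityGroupStore()
groups.put(PARENT, "hello")
global_view = groups.query()
ancestor_view = groups.ancestor_query(PARENT)
```

In this sketch global_view is still empty right after the put, while ancestor_view already contains the new message.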

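So the main handler should be as follows, where PARENT_KEY is the shared parent key (its definition is not shown here) that is also given as the parent of every entity when it is created: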
class MainHandler(webapp2.RequestHandler):
    def get(self):
        message_query = database1.query(ancestor=PARENT_KEY).order(-database1.last_touch_date_time)
        template1 = jinja_env.get_template('templates/message.html')
        self.response.out.write(template1.render({"message_query": message_query}))
If you had a million users, you wouldn't want to funnel every write through a single shared parent for all users: all entities under one parent form a single entity group, and writes to one entity group are throttled (on the order of one write per second). If you are worried about scaling to millions of people and strong consistency is required, there needs to be a more clever mechanism in place, with MANY entity groups using different parent keys.
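One possible way to get many entity groups, sketched here as an assumption rather than as the approach used in the repository, is to derive the parent from the user instead of sharing one parent. The plain Python 3 code below uses hypothetical helpers (parent_path_for, group_by_parent) and a ("User", email) pair, which in ndb would correspond to the arguments of ndb.Key.

```python
from collections import defaultdict


def parent_path_for(email):
    """Return a hypothetical (kind, id) pair naming one user's entity group."""
    return ("User", email.strip().lower())


def group_by_parent(messages):
    """Show which entity group each message would be written to."""
    grouped = defaultdict(list)
    for entry in messages:
        grouped[parent_path_for(entry["email"])].append(entry["message"])
    return dict(grouped)


sample = [
    {"email": "Ann@example.com", "message": "first"},
    {"email": "bob@example.com", "message": "second"},
    {"email": "ann@example.com ", "message": "third"},
]
by_group = group_by_parent(sample)
```

Writes from different users then land in different entity groups and do not compete with each other. The trade-off is that a strongly consistent ancestor query only covers one user's messages, so a page listing everyone's messages would fall back to an eventually consistent query.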


