A Puzzling Problem with ctx.parents()[i].description()

I’m using a JSON script to pass data from the server side (written in Python) to the client side (in JavaScript) so that I can format the data and show it on the pushlog onScroll. I’m trying to add functionality so that merge changesets show up properly for the newly loaded data, like in the image below:

Now, to do this I need to add extra data to the json-pushes script which means making changes server side. The functionality of the script is in pushes_worker() in the pushlog-feed.py file.

def pushes_worker(repo, startID=0, endID=None):
    #pdb.set_trace()
    stmt = 'SELECT id, user, date, rev, node from pushlog INNER JOIN changesets ON id = pushid WHERE id > ? %s ORDER BY id ASC, rev ASC'
 
    args = (startID,)
    if endID is not None:
        stmt = stmt % 'and id <= ?'
        args = (startID, endID)
    else:
        stmt = stmt % ""
    if os.path.basename(repo.path) != '.hg':
        repo.path = os.path.join(repo.path, '.hg')
    conn = sqlite.connect(os.path.join(repo.path, 'pushlog2.db'))
    pushes = {}
    for id, user, date, rev, node in conn.execute(stmt, args):
        ctx = repo.changectx(node)
        if id in pushes:
            pushes[id]['changesets'].append(node)
        else:
            pushes[id] = {'user': user,
                          'date': date,
                          'formattedDate': util.datestr(localdate(date)),
                          'changesets': [node],
                          'individualChangeset': hex(ctx.node()),
                          'author': ctx.user(),
                          'desc': ctx.description()
                          }
    return pushes

Now, a row is a merge row if len(ctx.parents()) is greater than 1. We can then get the required data for each parent of the merge (changeset, user, description) by calling ctx.parents()[0].user(), ctx.parents()[0].description() and so on.

Then I needed to decide which data structure to use so that I can get the changeset, user and description for each merge row. I thought of using a 3D array, a character-delimited string inserted into an array, or a dictionary within a dictionary. The problem is that I’m getting a weird server error when I call ctx.parents()[0].description().
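To make the options above concrete, here is a rough sketch of one of them: a list holding one small dictionary per parent. FakeCtx is a made-up stand-in for the change context so the example runs without Mercurial, the sample users and descriptions are invented, and it is written for Python 3, so treat it as an illustration and not as the pushlog code.

```python
from binascii import hexlify


class FakeCtx:
    """Hypothetical stand-in for a Mercurial change context (not the real class)."""

    def __init__(self, node, user, description, parents=()):
        self._node = node
        self._user = user
        self._description = description
        self._parents = list(parents)

    def node(self):
        return self._node

    def user(self):
        return self._user

    def description(self):
        return self._description

    def parents(self):
        return self._parents


def merge_parents(ctx):
    """Return one dict per parent for a merge changeset, or [] for a non-merge."""
    parents = ctx.parents()
    if len(parents) < 2:
        return []
    return [
        {
            "changeset": hexlify(p.node()).decode("ascii"),
            "user": p.user(),
            "desc": p.description(),
        }
        for p in parents
    ]


# Invented sample data: two ordinary changesets and a merge of the two.
p1 = FakeCtx(b"\x01" * 20, "alice@example.com", "first parent")
p2 = FakeCtx(b"\x02" * 20, "bob@example.com", "second parent")
merge = FakeCtx(b"\x03" * 20, "carol@example.com", "merge", parents=[p1, p2])

merge_rows = merge_parents(merge)
plain_rows = merge_parents(p1)
```

A list of dictionaries keeps changeset, user and description together for each parent and preserves the parent order, which a single dictionary keyed by user would not.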

The following is my code right now:

def pushes_worker(repo, startID=0, endID=None):
    #pdb.set_trace()
    stmt = 'SELECT id, user, date, rev, node from pushlog INNER JOIN changesets ON id = pushid WHERE id > ? %s ORDER BY id ASC, rev ASC'
 
    args = (startID,)
    if endID is not None:
        stmt = stmt % 'and id <= ?'
        args = (startID, endID)
    else:
        stmt = stmt % ""
    if os.path.basename(repo.path) != '.hg':
        repo.path = os.path.join(repo.path, '.hg')
    conn = sqlite.connect(os.path.join(repo.path, 'pushlog2.db'))
    pushes = {}
    mergeData = []
    for id, user, date, rev, node in conn.execute(stmt, args):
        ctx = repo.changectx(node)
        if len(ctx.parents()) > 1:
          #pdb.set_trace()
          for cs in ctx.parents():
            #uc = hex(cs.node()) + '||' + cs.user()
            #mergeData.append(hex(cs.node()) + '||' + cs.user() + '||' + cs.description())
            #mergeData.append(str(cs.description()))
            #mergeData[uc] = hex(cs.description())
            #mergeData = cs.description()
            #mergeData[cs.user()] = cs.description()
            mergeData = cs.user()
        if id in pushes:
            pushes[id]['changesets'].append(node)
        else:
            pushes[id] = {'user': user,
                          'date': date,
                          'formattedDate': util.datestr(localdate(date)),
                          'changesets': [node],
                          'individualChangeset': hex(ctx.node()),
                          'author': ctx.user(),
                          'desc': ctx.description(),
                          'isMerge': len(ctx.parents()),
                          'mergeData': mergeData
                          }
    return pushes

I’m looping through ctx.parents() and calling cs.user() or cs.description() on each parent to get the required data. Then I give the data (either in array or dictionary form) to the script via ‘mergeData’: mergeData.

Now, here’s the problem: if I do what I have commented out on line 22 of the listing above, mergeData.append(…), and then give that array to the script, I get the following error: http://pastebin.com/f1b4e6981

BUT…

if I package the data in a dictionary like on line 26 (mergeData[cs.user()] = cs.description()) I get no error and the script renders it with no problem. I’ve narrowed the problem down to cs.description(). If I call cs.description() and put the result in an array I get the error, but if I put the same cs.description() in a dictionary I get no error. This is very puzzling, and as you can probably tell from all the commented-out lines I’ve tried several different ways to pass cs.description(), but only the dictionary way works, which is not ideal for what I want to be able to do.

I’ve spent 8-9 hrs on this issue but I still keep hitting the same wall and I just don’t see why this is happening…

EDIT: The only way I have been able to pass this information to the client side is by putting the user and description in a dictionary and then putting the changeset into a separate array. This is not a preferred solution but it is the only way I have been able to make this work so far. So the following is how this would work…

mergeData = {}
mergeChangesets = []
ctx = repo.changectx(node)
if len(ctx.parents()) > 1:
  for cs in ctx.parents():
    mergeData[person(cs.user())] = cs.description()
    mergeChangesets.append(hex(cs.node()))
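One catch with keying the dictionary on the user is worth spelling out. The sketch below uses made-up changeset hashes, users and descriptions (nothing from the real pushlog) and plain Python 3 to show what happens when both parents of a merge were committed by the same person.

```python
# Made-up sample data: (changeset hash, user, description) for the two parents of one merge.
parents = [
    ("aaaaaaaaaaaa", "alice", "Fix the parser"),
    ("bbbbbbbbbbbb", "alice", "Update the tests"),
]

merge_data = {}
merge_changesets = []
for changeset, user, description in parents:
    # Same key twice: the second assignment replaces the first description.
    merge_data[user] = description
    merge_changesets.append(changeset)
```

Here merge_changesets ends up with two entries while merge_data has only one, so the client can no longer pair each changeset with its description.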

EDIT: Thanks to some great work by jorendorff I have managed to solve this problem. Let me explain. hg stores everything as 8-bit strings (it is Unicode ignorant). The JSON encoder has to turn those into JavaScript strings, which are 16-bit Unicode, and to do that it tries to decode the bytes as UTF-8. I was getting a UTF-8 error because some of the 8-bit strings coming out of hg are not valid UTF-8.

So to check where the error was happening I printed out the results of cs.description() to a file (http://pastebin.com/m249f0ef5). I showed it to jorendorff and he pointed out line 146, which read:

“Back out D\xe3o Gottwald’s patch from bug 380960 due to JS exception.”

The name should read Dão Gottwald. The ã was stored as the single byte 0xE3 (its Latin-1 encoding), which is not valid UTF-8 when it is followed by a plain “o”, and that is where the UTF-8 error was occurring. A great catch by jorendorff! 8 hrs or so of work and it all came down to one character. I think jorendorff put it best, “some people are UTF-8 impaired” and I couldn’t agree more (just kidding Dão, if you are reading).
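The failure is easy to reproduce outside of hg. This is only a sketch in Python 3, assuming the description arrived as Latin-1 bytes (which is what the 0xE3 byte suggests); the byte string is a shortened version of the quoted line and try_utf8 is a helper made up for the example.

```python
# Assumption: the commit message was written out as Latin-1, so the accented
# letter is the single byte 0xE3.
raw = b"Back out D\xe3o Gottwald's patch"


def try_utf8(data):
    """Return (text, None) if data is valid UTF-8, else (None, error message)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


text, error = try_utf8(raw)

# Latin-1 maps each byte to the code point with the same number, so 0xE3 becomes U+00E3.
latin1_text = raw.decode("latin-1")

# The same name encoded as real UTF-8 needs two bytes for that one character.
utf8_bytes = "D\u00e3o".encode("utf-8")
```

In UTF-8 the byte 0xE3 announces a three-byte sequence, so the decoder expects continuation bytes after it and gives up when it finds an ordinary “o” instead.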

The Fix

OK, so we now know why the error was happening. Now how can we fix it? Well, jorendorff recommended that I use clean():

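def clean(s):
    # hg hands back raw 8-bit strings, so try UTF-8 first
    try:
        return s.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 accepts every byte value, so this fallback cannot fail
        return s.decode('latin-1')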

Then, I can simply use the above function like so:

clean(cs.description())

and the problem is solved! The error is gone!
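For anyone trying the same thing on Python 3, where the raw data would be bytes, a minimal sketch of clean() feeding the standard json module could look like the following. The sample descriptions are made up, and json.dumps stands in for whatever encoder the pushlog script actually uses.

```python
import json


def clean(s):
    """Decode a byte string as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return s.decode("utf-8")
    except UnicodeDecodeError:
        return s.decode("latin-1")


# Made-up sample descriptions: one Latin-1 encoded, one valid UTF-8.
descriptions = [
    b"Back out D\xe3o Gottwald's patch",
    b"Patch by D\xc3\xa3o Gottwald",
]

merge_data = [clean(d) for d in descriptions]
payload = json.dumps({"mergeData": merge_data})
decoded = json.loads(payload)
```

The Latin-1 fallback never raises, which is what makes the error go away, but it is a guess: bytes that were really written in some other encoding will decode without complaint into the wrong characters.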

