Saturday, August 29, 2020

Build a FAQ-based NLU for chatbot

Chatbots are one type of AI assistant. A key component of any chatbot is NLU (Natural Language Understanding): understanding the user's input correctly is the foundation of giving a correct response.

The Challenge

The challenge is quite simple: given the 2 CSV files below, build an NLU model with the highest possible accuracy.
  • training set: 20 pairs of questions and answers, each mapped to a question ID
  • test set: 20 rephrased questions and their corresponding question IDs
The Research

After some research, we found the following options for building an NLU model; their pros and cons are listed below.

Google Dialogflow: create FAQ chatbot with Dialogflow (video)
  • pros: 
    • ease of use: its knowledge connector feature (launched August 2019, in beta status as of August 2020) extracts entities and creates intents automatically. There is no need to create entities and intents manually, which saves time and gets us straight to the model testing and improvement phase
  • cons: 
    • cloud based, unable to own the full IP rights
    • the model's accuracy depends on whether the input questions are distinct enough from each other. If two questions are too similar, the NLU model may return the wrong result, e.g. given Q1 "Do you allow pets?" and Q2 "Are kids allowed?", a user who asks "Are pets allowed?" may be matched to Q2
    • cannot train the model on individual questions
Amazon Lex: JSON format for importing and exporting
  • cons: 
    • cloud based, unable to own the full IP rights
    • difficult to use: requires manual definition of intents
Open-source NLU framework
  • pros:
    • open source, user owns full IP rights
  • cons:
    • needs to be built from scratch and takes more effort than cloud-based solutions
The Conclusion

Due to the time constraint of this challenge (only a few hours), we went ahead with the knowledge connector offered by Google's Dialogflow and built a model in the cloud. Dialogflow automatically creates a Service Account for accessing the Dialogflow Agent (the NLU model), but its access token expires every hour or so, so we refreshed the token manually. We then wrote a Python program that fed each test question to the Dialogflow Agent and compared its answer with the correct one. The model reached 70% accuracy on the test set.
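The evaluation loop described above can be sketched roughly as follows. This is a minimal sketch rather than the program we actually ran: the CSV column names (question, question_id) are assumptions, and predict is a hypothetical callable standing in for whatever code sends a question to the NLU model and returns the matched question ID.

```python
import csv
import io


def load_test_set(csv_file):
    """Read (question, expected_id) pairs; the column names are assumed."""
    reader = csv.DictReader(csv_file)
    return [(row["question"], row["question_id"]) for row in reader]


def accuracy(pairs, predict):
    """Fraction of questions whose predicted ID equals the expected ID."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for question, expected in pairs if predict(question) == expected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the test CSV file.
    sample = io.StringIO(
        "question,question_id\n"
        "Are pets allowed?,Q1\n"
        "Can I bring my kids?,Q2\n"
    )
    # Hypothetical predictor that always answers Q2; replace with the real NLU call.
    print(accuracy(load_test_set(sample), lambda question: "Q2"))
```

Keeping the predictor as a plain callable means the same loop can score any of the vendors listed above without changes.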

If you run into similar problems or would like to learn more details of this story, please contact buytition@gmail.com

Monday, November 25, 2019

Front-End Firefight: ng serve CORS and TD not taking full width of TABLE

Web page design and formatting is not easy: a seemingly simple effort of moving the search box and pagination controls from the top of the page to a fixed header took 2 days, or 8+ hours.

Motivation

As a tool enthusiast, I am addicted to useful and easy-to-use tools such as Google and Gmail. Because of this addiction, I spent 10+ hours putting out 2 fires over the weekend. The two fires are related, and both started from the motivation below.

I am therefore passionate about creating such a tool myself where possible. With that in mind, one feature I have long wanted for the Buytition Quote List View and Message List View is to move the search box and pagination controls into a fixed header section. Before this change, both controls sat at the top of the views, so once you scrolled down the list they disappeared until you went back to the top of the page. Both controls are very useful, and I found it inconvenient to scroll back to the top just to modify the search term or move to the next or previous page.

For a long time I was busy with other things and did not get a chance to act on this wish. Recently my TODO list finally cleared to the point that, when I woke up on Saturday morning, I felt it was a good time to fulfill this long-awaited wish.

Fires and Firefight

First of all, I am not a front-end professional but I do have experience modifying and improving a web application based on AngularJS. 

My previous front-end work on the AngularJS application left a legacy problem that I wanted to solve first, because I believed solving it would make not only this effort but also later front-end work easier. That problem was a CORS issue when using `ng serve`. I like the Angular CLI's ng serve command because it makes developing the application much easier and more convenient. However, for more than a year, the application served by ng serve at http://localhost:4200 was unable to access the API served on another port of the same server.

This fire had existed for more than a year. I knew it could be put out, but I did not have a good idea of how, and it was not urgent, so it stayed there that long. Once I decided it had to go, I focused on it and that focus paid off: after a few hours of research and trial, I solved the problem by running

ng serve --proxy-config proxy.conf.json

thanks to this online tutorial.

That made changing the AngularJS application easier than before. Then came the 2nd fire.

I do not want to elaborate much on the 2nd fire, except to acknowledge that, as the title indicates, it was caused by a TD not taking the full width of its TABLE. That in turn was caused by some incorrect Cascading Style Sheets (CSS) configuration whose real root cause I still do not know.

Lesson Learned

I want to thank Google Chrome for making it possible to solve this problem without being a CSS expert. I solved it by trying different CSS styles on the problematic TABLE element in Google Chrome until I saw a satisfactory appearance.


Saturday, November 16, 2019

Story of reducing EBS volume size

As my previous writing stated, a technologist's journey is filled with unexpected challenges at unexpected places for unexpected reasons.

Motivation

AWS offers Elastic Block Store (EBS) volumes as hard drives for EC2 instances. One of our EC2 instances used a 40GB EBS volume that cost us about $4 per month, but we were using only 5.5GB of that 40GB, so I had been thinking about reducing the volume size to save cost. However, unlike the convenience AWS gives you when expanding storage, reducing the size of an EBS volume is a totally different animal. $4 a month is not a big deal, but given the prospect of rapid growth in the future, I felt it was necessary to take on this challenge and solve it, despite the technical difficulty and the lack of available know-how.

Fire: Following Wrong Tutorials

As one of the tutorials warned, it turned out to be a gruesome 6+ hour effort, luckily with a happy ending. But before the happy ending, in keeping with the theme of this blog, this exploration started a fire as I followed these steps:

  • searched Google for "reduce EBS volume size"
  • followed the steps of the 2 tutorials at the top of the Google search results, but failed after about 4 hours of effort


Fire Drill: Find and Follow Correct Tutorials

After spending those 4 hours on the wrong path, I realized the reason for the failure: the EBS volume I was trying to shrink was the system boot drive, and the tutorials I had followed did not address that case. I therefore adjusted my Google search terms and did the following:

  • searched Google again, adding keywords related to boot or startup, and found another tutorial that looked better than the previous two
  • followed that tutorial and achieved the goal


Lesson Learned

When a technical issue has multiple online tutorials that each offer a different solution, Google's ranking of those tutorials in its search results is not a reliable measure of their quality or reliability.


Saturday, July 20, 2019

Problem with lxml.html.clean.Cleaner

If you are a Gmail user, have you ever wondered what Google does to the raw HTML of the endless variety of email messages so that they can all be displayed properly in your browser and on your mobile phone?

In case you don't know, the email messages you receive from various senders contain many kinds of HTML tags, including style and script. On a webmail site such as Gmail, the HTML of an email is displayed inside the HTML of the parent webmail page, so the script and style tags of the email would interfere with the parent page. Gmail therefore needs to process the raw HTML of each email message, and so does Buytition Web Mail.

So the requirement is: given an HTML page, strip out active tags such as script and style but leave the remaining parts intact. The best Python package in the open source space for this is lxml.html.clean.Cleaner, which does exactly what the requirement asks for. So we tried it as Solution 1 below, but a few months later we found a problem.

Solution 1: lxml.html.clean.Cleaner

Result: DID NOT WORK
This solution works in most cases, but in a few cases it makes damaging errors. One such case is when an entire table is wrapped in an a tag: Cleaner wrongly treats the a tag as unclosed and appends a closing a tag immediately after it, so the originally linked table is no longer linked in the cleaned HTML.

Solution 2: BeautifulSoup

Result: DID NOT WORK
This solution does not work because it does not retain the HTML tags in its output.

Finally, we chose a third solution that used none of these open source packages. The lesson learned from this fire drill is that even though many open source packages are available, their quality is usually unknown to developers, and finding out whether they fit your use case takes a lot of testing and effort.
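To illustrate the requirement (drop script and style, keep everything else), here is a minimal sketch built only on Python's standard-library html.parser. It is not the third solution we shipped, and the class and function names are made up for this example. It is also not a complete sanitizer: attributes such as onclick and javascript: URLs pass through untouched, and comments and doctype declarations are dropped by the parser's default handlers.

```python
from html import escape
from html.parser import HTMLParser

# Tags removed together with everything inside them.
BLOCKED_TAGS = {"script", "style"}


class ActiveTagStripper(HTMLParser):
    """Re-emit HTML while dropping script and style elements."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: text arrives decoded
        self.out = []
        self.blocked_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.blocked_depth += 1
        elif self.blocked_depth == 0:
            # Keep the start tag exactly as it appeared in the source.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag not in BLOCKED_TAGS and self.blocked_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            self.blocked_depth = max(0, self.blocked_depth - 1)
        elif self.blocked_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.blocked_depth == 0:
            # Re-escape, because the parser already decoded entities.
            self.out.append(escape(data, quote=False))


def strip_active_tags(html_text):
    parser = ActiveTagStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<a href="x"><table><tr><td>Hi</td></tr></table></a><script>alert(1)</script>'
    print(strip_active_tags(page))
```

Because the sketch re-emits tags in the order they were read and never tries to repair the tree, a table wrapped in an a tag stays wrapped.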


Sunday, July 14, 2019

Our Story with Flask, Google OAuth2 Library and Gunicorn

Earlier this year, we announced that buytition.com was integrated with Gmail. To achieve this integration, we had to roll out an API endpoint on our side to receive the call from Google after the user completes the authorization flow in a popup window.

Challenge: Flask Native Web Server vs Gunicorn

The API framework we chose was Flask because of its simplicity. I then had to choose one of 2 web servers to serve the Flask API:

  1. Flask's native web server
  2. a WSGI web server such as Gunicorn

It is widely said (for example, in this article) that choice 2 is better than choice 1, since choice 1 is meant for development and choice 2 is more of a real web server. I still went with choice 1 first because, when it comes to unfamiliar technical choices, one of my guiding principles is: never believe hearsay; go with the simplest and most straightforward solution first, unless there is verifiable evidence that the alternative is better.

Problem with Flask Native Web Server

A few months into using Flask's native web server, I did observe a huge disadvantage: the API process may die after the error IOError: [Errno 32] Broken pipe, which usually comes up when multiple API requests are made at the same time. This error is quite nasty because, first, I do not get a notice when it happens and, second, the Flask server needs to be restarted manually, which is time-consuming.

Fire: Problem with Gunicorn

Naturally, the hearsay I had heard a few months earlier was now proven to have some validity, so I felt comfortable switching to choice 2, Gunicorn. However, nothing is perfect and everything has its pros and cons: a few weeks later, a nasty error that blocked users from linking their Gmail accounts started to surface. The Google OAuth2 server-side process includes 5 steps of complex interactions among 3 parties: Google, the user and the application web server. The error happens at the last step, "Exchange authorization code for refresh and access tokens". The error is InsecureTransportError: (insecure_transport) OAuth 2 MUST utilize https. and it is raised at fetch_token in the following code:

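import flask
import google_auth_oauthlib.flow

# Rebuild the flow with the state saved at the start of the authorization flow.
state = flask.session['state']
flow = google_auth_oauthlib.flow.Flow.from_client_secrets_file(
    'client_secret.json',
    scopes=['https://www.googleapis.com/auth/youtube.force-ssl'],
    state=state)
flow.redirect_uri = flask.url_for('oauth2callback', _external=True)

# Exchange the authorization code in the callback URL for tokens.
authorization_response = flask.request.url
flow.fetch_token(authorization_response=authorization_response)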

What is strange about this error is that it happened only under the Gunicorn web server; the same piece of code worked fine under Flask's native web server. Now I felt I had gone full circle and was back at the initial challenge: Flask native web server vs Gunicorn. Both options have pros and cons, and now Gunicorn had a hard stopper. Should I go back to the Flask web server?

Fire Drill

Out of fear of the IOError, I decided to stick with Gunicorn and tackle the InsecureTransportError.

The first challenge I faced was getting visibility into the redirect URL that was passed to the fetch_token function. Knowing the content of this URL string is the key first step in diagnosing the problem, since the error indicates that the URL uses http rather than https. Strangely, for some unknown reason, logging calls such as print in this function did not print the string as they do in every other Flask API function, and the error could not be replicated locally either. So I used an unconventional solution: I logged the debug information into a DB table at the end of the API function call, and it worked.
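The idea of logging to a DB table can be sketched as follows. This is only an illustration under assumptions: it uses the standard-library sqlite3 module rather than our production database, and the table name debug_log and the helper log_debug are made up for the example.

```python
import sqlite3
from datetime import datetime, timezone


def log_debug(conn, label, value):
    """Append one debug record to a (hypothetical) debug_log table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS debug_log (logged_at TEXT, label TEXT, value TEXT)"
    )
    conn.execute(
        "INSERT INTO debug_log VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), label, value),
    )
    conn.commit()


if __name__ == "__main__":
    # In-memory database for the demo; a real service would use its own DB.
    conn = sqlite3.connect(":memory:")
    log_debug(conn, "authorization_response", "http://0.0.0.0:5000/oauth2callback?state=...")
    print(conn.execute("SELECT label, value FROM debug_log").fetchall())
```

Writing the record and committing right away means the value survives even if the request fails a moment later.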

After getting visibility into the URL string passed to fetch_token, I ran the Google OAuth2 process under both the Flask web server and Gunicorn and compared the URL values. To my surprise, the values were the same for both, something like http://0.0.0.0:5000/oauth2callback?state=... Yet, again for some unknown reason, Gunicorn ran into InsecureTransportError while the Flask web server did not. I did not want to explore why the Flask web server tolerates this but Gunicorn does not, so I just went ahead and did the following:

authorization_response = authorization_response.replace(
"http://0.0.0.0:5000", "https://buytition.com")

After making the above replacement, I tested both options and both of them worked fine.
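The string replacement above depends on the exact internal address. A slightly more general variant, shown here only as a sketch, parses the URL and swaps the scheme and host while keeping the path and query intact. The function name force_public_https is made up for this example, and the sample state and code values are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def force_public_https(url, public_host):
    """Return url with the scheme forced to https and the host replaced."""
    parts = urlsplit(url)
    return urlunsplit(("https", public_host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    internal = "http://0.0.0.0:5000/oauth2callback?state=abc&code=xyz"
    print(force_public_https(internal, "buytition.com"))
```

The result can then be passed to fetch_token as authorization_response in the same way as the replaced string.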


Friday, July 13, 2018

Hello World from replybot.io Engineers

How we spend our lives and energy defines who we are. Engineers are problem solvers and constant learners. When we wear the problem-solver hat, we are like firefighters and police detectives; when we change to the constant-learner hat, we are like Curious George, eager to learn new technology and explore uncharted areas. As we explore uncharted areas, we usually run into fires and problems, so we put the problem-solver hat back on and go back to fire-drill mode. The lives of engineers are journeys full of learning, exploring and putting out fires.

In fact, the only things I remember from 20 years ago, when I first started my career as a junior engineer, are the times I felt guilty for having screwed things up, and the intense days and nights that followed as I tried different solutions to put out the fires I had started through lack of experience.

Now, 20 years later, I still constantly run into similar situations and I still start fires while exploring uncharted areas. The only difference is that I am a much more experienced firefighter, so I always have my firefighter toolkit at my disposal. In the journey of bringing new features to market and implementing new products, different kinds of fires start now and then in different areas, and I keep entering fire-drill mode.

Usually the process goes like this: a problem that breaks things is discovered. Usually it is a roadblock to a work stream or has a significant impact on customers, and no easy solution can be found. Then I enter fire-drill mode, which usually lasts a few days or a few weeks. In this mode my nerves are tight, I feel the urge not to sleep, and I cannot stop thinking about the problem and its possible solutions. I feel like a detective facing a tough criminal case: the first thing I do after I wake up is turn on the computer and try the solutions I was thinking about in bed. After trying different solutions, I finally arrive at the right one and put out the fire.

Through a series of such fire-drill events, I have learned a lot: the tougher a fire is to put out, the more I learn from putting it out. This blog is intended to log the problem-solving journeys of myself and all ReplyBot engineers.