Text summarization web service with NLP machine learning
I wanted to create a web service that takes text and summarizes it using machine learning.
There are several API services on the market, but I wanted to see how easy (or hard) it is to create my own.
Step 1 – Use an existing NLP package
There’s no need to reinvent the wheel, so I used this NLP package: https://pypi.org/project/bert-extractive-summarizer/
I followed the instructions and got this error:
Neither PyTorch nor TensorFlow >= 2.0 have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
Step 2 – Install missing dependencies
The solution was easy: I just had to install the two missing packages:
pip3 install tensorflow
pip3 install torch
If you plan to use a GPU, you might want to install tensorflow-gpu instead.
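A quick way to confirm the fix worked is to check that at least one deep-learning backend is importable (the error above appears when neither PyTorch nor TensorFlow >= 2.0 is found). This is a small sanity-check sketch, not part of the package:

```python
import importlib

# bert-extractive-summarizer needs PyTorch or TensorFlow >= 2.0 under the hood.
available = []
for name in ("torch", "tensorflow"):
    try:
        importlib.import_module(name)
        available.append(name)
    except ImportError:
        pass

print("Available backends:", available or "none")
```

If the list comes back empty, the summarizer will fall back to the tokenizer-only mode described in the error message.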
Step 3 – Checking that the example works
Before moving forward, I tested their example to make sure everything works. I used the “large example” from the package’s documentation, and it worked.
Now we are ready to create the server.
Step 4 – Python HTTP server
I used this example for Python’s web server: https://pythonbasics.org/webserver/
Remember that Python’s built-in web server is not meant to be secure, so either put it behind a proper HTTPS-capable reverse proxy or use it only locally.
I tested their script, and it worked.
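The pattern that example uses boils down to a request handler class plus an HTTPServer instance. Here’s a minimal self-contained sketch of that pattern (with a one-off request against it, so it doesn’t block forever; the handler name and response text are my own):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to every GET with a fixed HTML body.
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(b"Hello, world!")

# Port 0 lets the OS pick a free port for this quick test.
server = HTTPServer(("localhost", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://localhost:{port}/") as resp:
    body = resp.read().decode("utf-8")
print(body)  # Hello, world!
server.shutdown()
```

In the real script, serve_forever() just runs in the foreground until you press Ctrl+C.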
Step 5 – Combining it all
I merged the two pieces of code and added a little glue. This server doesn’t use any of the summarizer’s extra features, but it should be easy to customize.
Here’s the code: (you can download it from: https://seo-explorer.io/code/open-source/machine-learning/python-nlp-summarizer-web-service)
from summarizer import Summarizer
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
import base64

hostName = ""  # empty string means: listen on all interfaces
serverPort = 80

# Load the BERT model once at startup; loading it per request would be slow.
model = Summarizer()

class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        text = parse_qs(urlparse(self.path).query).get('text', None)
        if text is None:
            self.send_response(400)
            self.end_headers()
            self.wfile.write(b"Missing 'text' query parameter")
            return
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        # The text arrives base64 encoded to avoid text-encoding problems.
        result = model(base64.b64decode(text[0]).decode('utf-8'), min_length=60)
        full = ''.join(result)
        self.wfile.write(bytes(full, 'utf-8'))

if __name__ == "__main__":
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))
    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass
    webServer.server_close()
    print("Server stopped.")
Usage:
Run it with:
python3 server.py
It will listen on port 80 on all interfaces. Send it base64-encoded text:
http://yourserver/?text=base64encodedtext
The server will send back the summarized reply. The base64 encoding avoids any problems with special characters and text encodings in the URL.
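A small client sketch showing how to build that request (the server name is a placeholder, and the sample text is my own):

```python
import base64
import urllib.parse
import urllib.request

text = "Machine learning lets computers learn patterns from data."

# Base64-encode the text, then URL-encode it into the `text` query parameter.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
url = "http://yourserver/?" + urllib.parse.urlencode({"text": encoded})
print(url)

# To actually call the server (assuming it is running):
# with urllib.request.urlopen(url) as resp:
#     print(resp.read().decode("utf-8"))
```

The server decodes the parameter with base64.b64decode and passes the result to the summarizer.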
Summary
No wonder the API services are cheap: it’s straightforward to create your own. One thing to consider for high volumes is using a GPU.