Text summarization web service with NLP machine learning
I wanted to create a web service that takes text and summarizes it using machine learning.
There are several API services on the market, but I wanted to see how easy (or hard) it is to create my own.
Step 1 – Use an existing NLP package
There’s no need to reinvent the wheel, so I used this NLP package: https://pypi.org/project/bert-extractive-summarizer/
I followed the instructions and got this error:
Neither PyTorch nor TensorFlow >= 2.0 have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
Step 2 – Install missing dependencies
The solution was easy: I just had to install the two missing packages:
pip3 install tensorflow
pip3 install torch
If you plan to use a GPU, you might want to install tensorflow-gpu instead.
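A quick way to confirm the fix worked is to check that at least one deep-learning backend is importable (the error above appears when neither PyTorch nor TensorFlow >= 2.0 is found). This is a small sanity-check sketch, not part of the package:

```python
import importlib

# bert-extractive-summarizer needs PyTorch or TensorFlow >= 2.0 under the hood.
available = []
for name in ("torch", "tensorflow"):
    try:
        importlib.import_module(name)
        available.append(name)
    except ImportError:
        pass

print("Available backends:", available or "none")
```

If the list comes back empty, the summarizer will fall back to the tokenizer-only mode described in the error message.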
Step 3 – Checking that the example works
Before moving forward, I tested their example to make sure everything works. I used the “large example” from the package’s documentation, and it worked.
Now we are ready to create the server.
Step 4 – Python HTTP server
I used this example for Python’s web server: https://pythonbasics.org/webserver/
Remember that Python’s built-in web server is not meant to be secure, so either put it behind a proper HTTPS-capable reverse proxy or use it only locally.
I tested their script, and it worked.
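The pattern that example uses boils down to a request handler class plus an HTTPServer instance. Here’s a minimal self-contained sketch of that pattern (with a one-off request against it, so it doesn’t block forever; the handler name and response text are my own):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to every GET with a fixed HTML body.
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(b"Hello, world!")

# Port 0 lets the OS pick a free port for this quick test.
server = HTTPServer(("localhost", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://localhost:{port}/") as resp:
    body = resp.read().decode("utf-8")
print(body)  # Hello, world!
server.shutdown()
```

In the real script, serve_forever() just runs in the foreground until you press Ctrl+C.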
Step 5 – Combining it all
I merged the two pieces of code and added a little glue. This server doesn’t use any of the summarizer’s extra features, but it should be easy to customize.
Here’s the code: (you can download it from: https://seo-explorer.io/code/open-source/machine-learning/python-nlp-summarizer-web-service)
from summarizer import Summarizer
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
import base64

hostName = ""  # empty string means: listen on all interfaces
serverPort = 80

# Load the BERT model once at startup; loading it per request would be slow.
model = Summarizer()

class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        text = parse_qs(urlparse(self.path).query).get('text', None)
        if text is None:
            self.send_response(400)
            self.end_headers()
            self.wfile.write(b"Missing 'text' query parameter")
            return
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        # The text arrives base64 encoded to avoid text-encoding problems.
        result = model(base64.b64decode(text[0]).decode('utf-8'), min_length=60)
        full = ''.join(result)
        self.wfile.write(bytes(full, 'utf-8'))

if __name__ == "__main__":
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))
    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass
    webServer.server_close()
    print("Server stopped.")
Usage:
Run it with:
python3 server.py
It will listen on port 80 on all interfaces. Send it base64-encoded text:
http://yourserver/?text=base64encodedtext
The server will send back the summarized reply. The base64 encoding avoids any problems with special characters and text encodings in the URL.
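A small client sketch showing how to build that request (the server name is a placeholder, and the sample text is my own):

```python
import base64
import urllib.parse
import urllib.request

text = "Machine learning lets computers learn patterns from data."

# Base64-encode the text, then URL-encode it into the `text` query parameter.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
url = "http://yourserver/?" + urllib.parse.urlencode({"text": encoded})
print(url)

# To actually call the server (assuming it is running):
# with urllib.request.urlopen(url) as resp:
#     print(resp.read().decode("utf-8"))
```

The server decodes the parameter with base64.b64decode and passes the result to the summarizer.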
Summary
No wonder the API services are cheap: it’s straightforward to create your own. One thing to consider for high volumes is using a GPU.