To not miss out on any new articles, consider subscribing.
spaCy is a free, open-source library for advanced natural language processing in Python. According to the official documentation site, spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning. spaCy currently supports English, German, Spanish, Portuguese, French, Italian and Dutch.
Normally to install spaCy, you’ll do
>> pip install spacy
This installs the spaCy library and all its dependencies using pip. However, if you’re using Windows 8, you might see this error:
error: Microsoft Visual C++ 14.0 is required. Get it with “Microsoft Visual C++ Build Tools”: http://landinghub.visualstudio.com/visual-cpp-build-tools
Even though Python is an interpreted language, some packages include extensions that have to be compiled at install time, so you may need to install a Windows C++ compiler. This is one of those cases.
I went ahead and installed Microsoft Build Tools via the link provided, and discovered that Microsoft Visual Studio 2017 was the version available for download. I first had to download .NET Framework 4.5.1 to be able to install Microsoft Visual Studio. After downloading that, the Visual Studio installation failed because my OS was not compatible. I needed Windows 8.1 or above.
I searched online and found that Microsoft Visual Studio 2015 works on Windows 8, so I downloaded it. It is supported on Windows 7 Service Pack 1, Windows Server 2008 R2 SP1, Windows Server 2012, Windows Server 2012 R2, and Windows 8 and above.
Trying to install spaCy once again yielded the same error because my interpreter wasn’t recognizing the newly installed build tools. I later found out that Python 3.7 (the version I was using) no longer works with Microsoft Visual Studio 2015.
I needed to downgrade to Python 3.5.
Which I did.
At this point, other packages and dependencies started failing because their versions were not compatible with Python 3.5. I tried downgrading the packages, but compatibility errors kept coming up.
My entire app was broken because I tried installing spaCy. I was frustrated.
But I needed it to get the work done.
Then I remembered Anaconda. When installing Anaconda, it is advisable to tick the checkbox that offers to add Anaconda to your PATH environment variable. This helps your IDE find the Anaconda Python interpreter easily.
I launched VS Code, waited a few minutes, then opened the Command Palette
(Ctrl + Shift + P)
and clicked on
Python: Select Interpreter
I chose the Conda Interpreter (3.6.5) and it loaded.
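If you want to double-check that your editor really picked up the Conda interpreter, a quick script like the one below can help. It is only a sketch: the function name interpreter_info is my own, and looking for a conda-meta folder inside sys.prefix is a common heuristic for spotting a Conda environment, not an official check.

```python
import os
import platform
import sys


def interpreter_info():
    """Collect basic facts about the Python interpreter running this script."""
    return {
        # Full path of the interpreter binary currently in use
        "executable": sys.executable,
        # Version string such as "3.6.5"
        "version": platform.python_version(),
        # Heuristic (an assumption): Conda environments usually contain
        # a "conda-meta" directory directly under sys.prefix
        "is_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


info = interpreter_info()
print("Interpreter:", info["executable"])
print("Version:", info["version"])
print("Looks like Conda:", info["is_conda"])
```

Run it from the integrated terminal. If the path printed does not point into your Anaconda folder, the interpreter selection probably did not take effect.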
I created and activated my virtual environment, and tried installing spaCy via conda in the terminal.
>> conda install -c conda-forge spacy
-c: means we’re downloading the library from a channel
conda-forge: is the channel name
spacy: is the package name
Installation of spaCy and all needed dependencies was successful.
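To confirm that the active environment can actually see the package, you could run something like this. It is a minimal sketch using only the standard library; is_installed is a helper name I made up for illustration, and it only checks that the package can be located, not that it imports without errors.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the interpreter can locate a top-level package by name."""
    # find_spec returns None when no module or package with that name is found
    return importlib.util.find_spec(package_name) is not None


for name in ("spacy", "en_core_web_lg"):
    status = "found" if is_installed(name) else "NOT found"
    print(name, "->", status)
```

At this stage you would expect spacy to be found and en_core_web_lg to be missing, since the model has not been downloaded yet.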
The final step before you can work with spaCy is to download a model for your language. spaCy calls its trained language packages models. spaCy v2.0 features new neural models for tagging, parsing and entity recognition. The models have been designed and implemented from scratch specifically for spaCy.
My text is in English, so I needed to download the English model. The English model comes in three sizes: small, medium and large (sm, md and lg respectively).
I downloaded the lg model with
>> pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz
The name en_core_web_lg breaks down as follows: en is the language (English), core means it is a general-purpose model, web means it was trained on web text, and lg means it is the large version. 2.0.0 is the version I downloaded.
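The naming convention can be sketched in a few lines of Python. Both helpers below are hypothetical and written only for illustration. The URL pattern is inferred from the single download link used in this article, so treat it as an assumption: other name and version combinations may not exist.

```python
# Size abbreviations as described in this article
SIZES = {"sm": "small", "md": "medium", "lg": "large"}


def describe_model(name):
    """Split a model name like 'en_core_web_lg' into its four parts."""
    language, model_type, genre, size = name.split("_")
    return {
        "language": language,          # e.g. "en" for English
        "type": model_type,            # e.g. "core" for general-purpose
        "genre": genre,                # e.g. "web" for web text
        "size": SIZES.get(size, size)  # fall back to the raw abbreviation
    }


def model_url(name, version):
    """Build a download URL following the pattern of the link in this article."""
    base = "https://github.com/explosion/spacy-models/releases/download"
    return f"{base}/{name}-{version}/{name}-{version}.tar.gz"


print(describe_model("en_core_web_lg"))
print(model_url("en_core_web_lg", "2.0.0"))
```

For en_core_web_lg and 2.0.0, model_url reproduces the link passed to pip above.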
After it was installed, I was finally able to import the library and the model, load the model and assign it to a variable in my program code.
import spacy
import en_core_web_lg
nlp = en_core_web_lg.load()
Now that the model has been assigned to a variable, we can run spaCy on text stored in a variable called text_doc:
parsed_text = nlp(text_doc)
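Once you have the parsed text, you can loop over it to get tokens and named entities. Here is a rough sketch of what that might look like. The helper names load_english_model and describe and the sample sentence are my own, not part of spaCy. The attributes used (text, pos_, ents, label_) are standard spaCy ones. The loader returns None instead of crashing if the model is not installed.

```python
def load_english_model():
    """Return the loaded large English model, or None if it is not installed."""
    try:
        import en_core_web_lg
    except ImportError:
        return None
    return en_core_web_lg.load()


def describe(doc):
    """Return (token, part-of-speech) pairs and (entity, label) pairs for a parsed doc."""
    tokens = [(token.text, token.pos_) for token in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


nlp = load_english_model()
if nlp is None:
    print("en_core_web_lg is not installed in this environment")
else:
    # Sample sentence chosen only for illustration
    text_doc = "Apple is looking at buying a startup in London."
    parsed_text = nlp(text_doc)
    tokens, entities = describe(parsed_text)
    print("Tokens:", tokens)
    print("Entities:", entities)
```

The exact tags and entities you get depend on the model version, so I have not listed an expected output here.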
I hope this article helps someone out there on Windows 8 who is having issues installing spaCy or any other Python library. You can use the Anaconda distribution for the installation, as it offers various channels and a very large number of packages.