likorn / estonian_verbs
Commit 9b866efc authored Nov 26, 2018 by Paktalin
corrected regex for occurences search
parent 291fad22
Showing 3 changed files with 2 additions and 2 deletions
__pycache__/util.cpython-36.pyc
not_found.txt
preprocessing.py
__pycache__/util.cpython-36.pyc
No preview for this file type
not_found.txt
0 → 100644
This diff is collapsed.
preprocessing.py
@@ -8,7 +8,7 @@ def extract_verbs_occurences_from_articles(verbs, articles):
     for i in tqdm(range(len(verbs))):
         # finish the pattern
-        pattern = '.*\W' + verbs[8][i] + '.*'
+        pattern = '^(.*\W)*' + verbs[8][i] + '(?!(mi|ja)).*$'
         occurences = list(set([sentence + '.' for sentence in articles.split('.') if re.match(pattern, sentence)]))
         verbs['occurences'][i] = filter_wrong_occurences(verbs.iloc[i], occurences)
     save_csv(verbs, "with_approximate_occurences.csv")
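To make the effect of this hunk concrete, here is a minimal sketch comparing the old and new patterns on a few sentences. The stem `luge` and the sample sentences are illustrative assumptions, not data from the repository (the commit reads the stem from `verbs[8][i]`). Raw strings are used so that `\W` reaches the regex engine unchanged.

```python
import re

def old_pattern(stem):
    # Pattern before the commit: needs a non-word character before the stem.
    return r'.*\W' + stem + r'.*'

def new_pattern(stem):
    # Pattern after the commit: the stem may also start the sentence,
    # and it must not be directly followed by "mi" or "ja".
    return r'^(.*\W)*' + stem + r'(?!(mi|ja)).*$'

stem = "luge"  # hypothetical stem, chosen only for illustration
sentences = [
    "ta luges raamatut",
    "lugesin raamatut eile",
    "tema lugemine on aeglane",
    "lugeja tuli",
]

# Map each sentence to (matched by old pattern, matched by new pattern).
results = {
    s: (bool(re.match(old_pattern(stem), s)), bool(re.match(new_pattern(stem), s)))
    for s in sentences
}
for s, (old, new) in results.items():
    print(f"old={old!s:5} new={new!s:5} {s}")
```

On these samples the old pattern misses a stem at the very start of a sentence, while the new one accepts it and rejects the stem when `mi` or `ja` follows directly. The lookahead is checked at each candidate position, so a sentence still matches if another occurrence of the stem passes it.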
@@ -20,7 +20,7 @@ def filter_wrong_occurences(verb, occurences):
     for occurence in occurences:
         found = False
         for form in all_forms:
-            pattern = '.*\W' + form + '\W.*'
+            pattern = '^(.*\W)*' + form + '(\W.*)*$'
             if re.match(pattern, occurence):
                 verified_occurences.append(occurence)
                 found = True
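The second hunk relaxes the whole-word check so that a form at the start or end of the string is accepted. The sketch below compares both patterns on sample strings; the form `loen` and the sentences are assumptions for illustration. `boundary_search` is a hypothetical alternative using `\b` and `re.search`, shown only for comparison and not part of the commit.

```python
import re

def old_pattern(form):
    # Before the commit: a non-word character is required on both sides.
    return r'.*\W' + form + r'\W.*'

def new_pattern(form):
    # After the commit: the form may also sit at the start or end of the string.
    return r'^(.*\W)*' + form + r'(\W.*)*$'

def boundary_search(form, text):
    # Hypothetical alternative (an assumption, not the author's code):
    # whole-word search using word boundaries.
    return re.search(r'\b' + re.escape(form) + r'\b', text) is not None

form = "loen"  # hypothetical verb form
occurrences = ["ma loen raamatut.", "loen raamatut.", "ma loengus olen."]

# Map each string to (old pattern, new pattern, \b alternative).
checks = {
    o: (
        bool(re.match(old_pattern(form), o)),
        bool(re.match(new_pattern(form), o)),
        boundary_search(form, o),
    )
    for o in occurrences
}
for o, flags in checks.items():
    print(flags, o)
```

On these samples the `\b` version agrees with the new pattern. It also avoids the nested quantifier `(.*\W)*`, which can backtrack heavily on long strings that do not match.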