Case #4: Color Image Processing

 

 

The Case

The input for this exercise is a set of images of pigs wearing green tags. The backgrounds contain various elements that pose challenges in processing, and no lighting control was apparently applied. The images appear to have been taken in whatever way was most convenient under the busy conditions of a pig farm or market. The input images are shown below.

 

The objective is to produce two images: one of the isolated tag and one of the outline of the swine.

 

Deductions and Assumptions

Shown below is the pseudocode followed in solving the problem:

for each image:
    find rectangular blobs from red channel
    identify rectangular blob with highest green value
    select skin samples around green rectangle
    select all pixels that are similar to the skin samples

The first step is to locate the green tag in each image. The intuitive approach is to look for green objects; however, in the RGB color space the images also showed green patches of grass and weeds. Exploring the HSV color space gave even worse results: the tags sometimes appeared red instead of green, and random patches of the swine skin and background appeared green. One solution that was explored was shape detection, since the tags are clearly polygons with four vertices. The problem then becomes how to produce a gradient strong enough to give OpenCV a clear edge to detect. To solve this, the red channel of each image was used for edge detection. The green tags always contain a much smaller red component than the swine skin, so the red channel consistently produces a strong edge around the tag.

This operation opened up a new problem. Some of the swine have rough skin texture due to hair. These fine textures affect the edge detection so strongly that, in some images, no rectangular polygon was detected. Bilateral filtering was used to reduce the effect of these textures on the edge detection. The trade-off is that each image takes considerably longer to process.

The next step is to identify which polygon is the green tag, since the output of the previous step may contain more than one polygon. The color of each polygon was checked, knowing that the tag is green and that the chance of another polygon being green is low once the previous step has removed unnecessary detail. Prior to checking, the green channel was enhanced by splitting the image into its RGB channels, normalizing the green channel, and recombining the channels. The polygon with the highest green value was selected and the rest were discarded. This polygon is the isolated green tag. The images of the isolated green tags are shown below.

 

With the tag identified, the next step is to identify the pixels that make up the swine. Backprojection was used for this. Backprojection computes the histogram of a sample image section and compares it with the image being tested. This produces a grayscale image in which the value of each pixel corresponds to the probability that the pixel belongs to the same region as the sample.

This approach requires a sample section of skin. Since every tag is known to be surrounded by skin pixels, the polygon containing the tag was dilated using a rectangular kernel, and the tag itself was removed from the dilated region. Four strips of skin, bounded by the extreme points of the tag polygon and its dilated blob, were then extracted. This was done to obtain the largest possible area of skin sample. These strips were used as the sample images for backprojection. The four resulting grayscale images were summed to produce one image containing the most probable skin pixels.
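To make the idea concrete, here is a pure-Python sketch of backprojection on a single channel, followed by the summing of several maps. It is an illustration under assumptions, not the code used here: the helper names back_project and sum_maps, the bin count, and the tiny input grids are hypothetical. In OpenCV the same operation is provided by cv2.calcBackProject.

```python
from collections import Counter


def back_project(image, sample, bins=8, max_value=256):
    """Map each pixel to the relative frequency of its histogram bin in `sample`."""
    width = max_value // bins
    hist = Counter(value // width for row in sample for value in row)
    peak = max(hist.values())
    # Scale so the most common bin in the sample maps to 255 (8-bit gray).
    return [[255 * hist.get(value // width, 0) // peak for value in row]
            for row in image]


def sum_maps(maps):
    """Add several backprojection maps pixel by pixel, saturating at 255."""
    return [[min(255, sum(values)) for values in zip(*rows)]
            for rows in zip(*maps)]


# Toy single-channel image: bright values stand in for skin, dark for background.
image = [[200, 205, 30],
         [210, 90, 35],
         [198, 202, 95]]
# Two toy skin strips; the experiment used four strips around the tag.
strips = [[[201, 204, 199]], [[207, 210, 96]]]
skin_map = sum_maps([back_project(image, strip) for strip in strips])
```

Pixels whose values fall in the same histogram bins as the strips receive high values in skin_map, while the rest stay at zero, so a low threshold is enough to turn the map into a binary mask.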

The grayscale image was thresholded at an intensity value of 10. This allowed easy detection of the middle part of the swine, where the tags are located, but most of the time failed to recognize the topmost part of the swine, where the pixels are much brighter than in the middle. The head was also not recognized well because its texture differs from that of the body. The legs, usually darker due to lighting and manure, were not detected well either. Some images showed misdetection of background pixels in patches that had color and texture similar to the skin.

To minimize these errors, only the largest continuous blob was selected and its contours were extracted. Among these contours, only the parent contour was used to redraw the blob, which filled in the undetected patches of skin inside it. Unwanted concavities in the blob were removed by applying a closing filter.

After performing all the steps mentioned, the resulting binary swine shapes did not resemble the original images and could only be interpreted well when superimposed on the originals. All of the outputs were irregularly shaped. The output images of the swine are shown below.

 

Closing the Case

The failure of the algorithm can be attributed to the conditions under which the images were captured. They were taken either on a farm or in livestock auction markets where there was no lighting control. The uneven light distribution caused an intensity gradient over the subjects, which is unfavorable for image processing. The subjects were not cleaned beforehand, so dirt and manure caused undetected patches. The backgrounds contained various elements, some quite similar in texture and color to the subjects, causing misdetected patches. Lastly, the subjects were not positioned in a way that allows easy recognition of their shape.

Given the type and quality of the input, the output of this method could be further improved by tweaking the values used for the backprojection and by determining skin color iteratively. In each further round, larger skin samples would be extracted from the skin detected in the previous iteration and used for another backprojection. This should result in a better output, with fewer misdetections and a larger detected skin area.

 

Case #3: Using the Tesseract OCR Engine

The Case

The Tesseract OCR engine is an open-source document reader. Its development was initiated by HP Research in 1984, and it performed well against its competitors at the time. After its source code was released to the public, further changes improved its performance. The task at hand is to assess its performance in extracting text from handwritten document images and from machine-printed document images with varying font style, text alignment and content.

Deductions and Assumptions

Tesseract was tested against documents with machine-printed text and handwritten text. The text for the machine-printed documents was taken from an unpublished paper entitled “Design and Economic Evaluation of Suitable Farm Mechanization Plan for Corn Production”. Three pages of the paper were used, with the font and alignment varied.

A quick way to assess the accuracy of Tesseract was devised for the task. The word population of the text file generated by the OCR was compared against the actual word population of the document image, and any deviation from the actual word population was counted as an error. While this method is prone to false positives, the code used was simple and fast to execute. Strict accuracy in determining the error is not needed, since the goal is only to compare the results among the configurations mentioned rather than to carry out a rigorous quantitative analysis.
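One way such a word-population check could be written is shown below. This is a sketch of the idea rather than the script actually used: the function name word_population_error is hypothetical, and it assumes that an error is any ground-truth word missing from the OCR output, expressed as a percentage of the ground-truth word count.

```python
from collections import Counter


def word_population_error(ocr_text, truth_text):
    """Percent of ground-truth words not matched in the OCR output (order ignored)."""
    ocr = Counter(ocr_text.split())
    truth = Counter(truth_text.split())
    # Counter subtraction keeps only words the OCR output lacks or has too few of.
    missing = sum((truth - ocr).values())
    return 100.0 * missing / sum(truth.values())


truth = "yellow corn is used in animal feeds"
print(word_population_error("yellow com is used in animal feeds", truth))
print(word_population_error("corn yellow is used in animal feeds", truth))
```

The first call counts one wrong word out of seven. The second call reports no error even though two words are swapped, which illustrates why a count-based comparison can pass incorrect output.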

Machine Printed Text

The results of the analysis of the document images are summarized in the table below.

                                    % Error
Font Type     Font Style          Font      Left Aligned   Justified   With Picture
Serif         Bell MT             1.92      4.07           1.63        –
Serif         Bodoni MT           1.28      0.81           2.17        –
Serif         Times New Roman     2.24      1.63           0.81        6.03
Sans Serif    Arial               1.28      1.63           1.63        9.05
Sans Serif    Berlin Sans FB      4.47      2.17           2.71        –
Sans Serif    Calibri             0.96      1.63           0.54        –
Decorative    Ravie               60.43     –              –           –
Script        Brush Script MT     100.00    –              –           –

 

Serif vs Sans Serif

The difference between serif and sans serif fonts is that the former have small extensions (serifs) at the ends and vertices of the characters. Three styles of each family were tested with Tesseract: Bell MT, Bodoni MT and Times New Roman for serif, and Arial, Berlin Sans FB and Calibri for sans serif.

The test document images are shown below. After processing these documents, Tesseract produced errors of 1.92%, 1.28% and 2.24% for Bell MT, Bodoni MT and Times New Roman, respectively, and 1.28%, 4.47% and 0.96% for Arial, Berlin Sans FB and Calibri. On average, both font families were processed with relatively high accuracy. While the sans serif family produced a higher average error, with Berlin Sans FB showing the highest error at 4.47%, it also produced the lowest at 0.96% for Calibri. The serif family showed a more consistent amount of error.
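The comparison between the two families can be checked with a few lines of arithmetic. The error values are the ones reported above; the helper name summarize is only for illustration.

```python
serif = {"Bell MT": 1.92, "Bodoni MT": 1.28, "Times New Roman": 2.24}
sans_serif = {"Arial": 1.28, "Berlin Sans FB": 4.47, "Calibri": 0.96}


def summarize(errors):
    """Return (mean error, spread between worst and best) for one font family."""
    values = list(errors.values())
    return sum(values) / len(values), max(values) - min(values)


for name, family in (("serif", serif), ("sans serif", sans_serif)):
    mean, spread = summarize(family)
    print(f"{name}: mean {mean:.2f}%, spread {spread:.2f}%")
```

The serif mean works out to about 1.81% with a spread of 0.96 percentage points, against about 2.24% and 3.51 points for sans serif, which matches the observation that the serif results are the more consistent of the two.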

A notable error for these documents is the failure of Tesseract to attach the last line of each paragraph to the paragraph body. Also, superscripts tend to be reduced to mere apostrophes and quotation marks. This may be caused by the OCR's inability to identify characters that do not completely cover the boundary lines of the text line. The input documents and results of the analysis are shown below.

Bell MT:

serif_bell_mt

I. INTRODUCTION
 Background of the Study
Corn (Zea mays), specifically yellow corn, is one of the most important crops of the
 country. It contributed 6.3% (Php100,629.40) to the total value of production in agriculture in
 the country in 2014. This value ranks 43" among the main types of crops grown in the
 Philippines ,which includes rice (28.5%), banana (8.1%) and coconut (6.5%) (Philippine Statistics
 Authority, 2015). This share is due to the fact that this crop is used in various industries. One
 of largest industry that uses yellow corn is the livestock industry. Yellow corn is one of the
 primary ingredient used in animal feeds. Animal feeds contain about 30% to 60% corn. Corn
 basically serves as energy source for animals and the percentage ofcorn in the animal feed is
 determined by the age of animal being fed. It is considered indispensable for poultry feed
 manufacturers due to its beta carotene content. This compound imparts the distinct yellowish
 color in chicken skin and egg yolk which is one important desirable characteristic that the
market considers.
Aside from being used as animal feed, yellow corn is also used for human consumption.
 Yellow corn is milled and marketed as yellow corn grits for direct human consumption. This
 serves as cheap alternative to rice and white corn grits in the Visayas. Yellow corn is also
 processed to produce other food items like starch, noodles, corn snacks, beer beverages,
 sweeteners and cooking oil (del Rosario, Borja, 8: Relieve, 2015). The wide variety of food
 products derived from corn implies the importance of the crop in the overall food industry of
 the country. This means that there is a need for a stable supply to meet the demand of the
 market. There will always be a demand for yellow corn. Investing in the venture of corn
farming, given the proper technology and environmental conditions, would always profit.

Bodoni MT:

serif_bodoni_mt

I. INTRODUCTION
 Background of the Study
Corn (Zea mays), specifically yellow corn, is one of the most important crops of the
 country. It contributed 6.3% (Php100,629.40) to the total value of production in agriculture
 in the country in 2014. This value ranks 43" among the main types of crops grown in the
 Philippines .which includes rice (23.5%), banana (8.1%) and coconut (6.5%) (Philippine
 Statistics Authority. 2015). This share is due to the fact that this crop is used in various
 industries. One of largest industry that uses yellow corn is the livestock industry. Yellow corn
 is one of the primary ingredient used in animal feeds. Animal feeds contain about 30% to
 60% corn. Corn basically serves as energy source for animals and the percentage of corn in
 the animal feed is determined by the age of animal being fed. It is considered indispensable
 for poultry feed manufacturers due to its beta carotene content. This compound imparts the
 distinct yellowish color in chicken skin and egg yolk which is one important desirable
characteristic that the market considers.
Aside from being used as animal feed, yellow corn is also used for human
 consumption. Yellow corn is milled and marketed as yellow corn grits for direct human
 consumption. This serves as cheap alternative to rice and white corn grits in the Visayas.
 Yellow corn is also processed to produce other food items like starch, noodles. corn snacks,
 beer beverages, sweeteners and cooking oil (del Rosario, Borja, & Relleve, 2015). The wide
 variety of food products derived from corn implies the importance of the crop in the overall
 food industry of the country. This means that there is a need for a stable supply to meet the
 demand of the market. There will always be a demand for yellow corn. Investing in the
 venture of corn farming, given the proper technology and environmental conditions, would
always profit.

Times New Roman:

serif_times_new_roman

I. INTRODUCTION
 Background of the Study
Corn (Zea mays), specifically yellow corn, is one of the most important crops of the
 country. It contributed 6.3% (Php100,629.40) to the total value of production in agriculture in the
 country in 2014. This value ranks 4'h among the main types of crops grown in the Philippines
 ,which includes rice (23.5%), banana (8.1%) and coconut (6.5%) (Philippine Statistics Authority,
 2015). This share is due to the fact that this crop is used in various industries. One of largest
 industry that uses yellow corn is the livestock industry. Yellow corn is one of the primary
 ingredient used in animal feeds Animal feeds contain about 30% to 60% com. Corn basically
 serves as energy source for animals and the percentage of corn in the animal feed is determined
 by the age of animal being fed. It is considered indispensable for poultry feed manufacturers due
 to its beta carotene content. This compound imparts the distinct yellowish color in chicken skin
and egg yolk which is one important desirable characteristic that the market considers.
Aside from being used as animal feed, yellow com is also used for human consumption.
 Yellow corn is milled and marketed as yellow corn grits for direct human consumption. This
 serves as cheap alternative to rice and white corn grits in the Visayas. Yellow corn is also
 processed to produce other food items like starch, noodles, corn snacks, beer beverages,
 sweeteners and cooking oil (del Rosario, Borja, & Relleve, 2015). The wide variety of food
 products derived from com implies the importance of the crop in the overall food industry of the
 country. This means that there is a need for a stable supply to meet the demand of the market.
 There will always be a demand for yellow com. Investing in the venture of corn farming, given
the proper technology and environmental conditions, would always profit.

Arial:

san_serif_arial

I. INTRODUCTION
 Background of the Study
Corn (Zea mays), specifically yellow corn, is one of the most important crops of
 the country. It contributed 6.3% (Php100,629.40) to the total value of production in
 agriculture in the country in 2014. This value ranks 4'“ among the main types of crops
 grown in the Philippines ,which includes rice (23.5%), banana (8.1%) and coconut
 (6.5%) (Philippine Statistics Authority, 2015). This share is due to the fact that this crop
 is used in various industries. One of largest industry that uses yellow corn is the
 livestock industry. Yellow corn is one of the primary ingredient used in animal feeds.
 Animal feeds contain about 30% to 60% com. Corn basically serves as energy source
 for animals and the percentage of corn in the animal feed is determined by the age of
 animal being fed. It is considered indispensable for poultry feed manufacturers due to its
 beta carotene content. This compound imparts the distinct yellowish color in chicken
 skin and egg yolk which is one important desirable characteristic that the market
 considers.
Aside from being used as animal feed, yellow corn is also used for human
 consumption. Yellow corn is milled and marketed as yellow corn grits for direct human
 consumption. This serves as cheap alternative to rice and white corn grits in the
 Visayas. Yellow corn is also processed to produce other food items like starch, noodles,
 corn snacks, beer beverages, sweeteners and cooking oil (del Rosario, Borja, &
 Relieve, 2015). The wide variety of food products derived from corn implies the
 importance of the crop in the overall food industry of the country. This means that there
 is a need for a stable supply to meet the demand of the market. There will always be a
 demand for yellow oorn. Investing in the venture of corn farming, given the proper
 technology and environmental conditions, would always profit.

Berlin Sans FB:

san_serif_berlin_san_fb

L INTRODUCTION
Wound of the filly
Corn (Zea mays). specifically yellow corn, is one of the most important crops of the
 country. It contributed 6.3% (Php100,629.40) to the total value of production in agriculture in
 the country in 2014. This value ranks 4‘“ among the main types of crops grown in the
 Philippines ,which includes rice (23.5%), banana (8.1%) and coconut (6.5%) (Philippine Statistic
 Authority, 2015). This share is due to the fact that this crop is used in various industries. One of
 largest industry that uses yellow corn is the livestock industry. Yellow corn is one of the primary
 ingredient used in animal feeds. Animal feeds contain about 30% to 60% com. Corn basically
 serves as energy source for animals and the percentage of corn in the animal feed is
 determined by the age of animal being fed. It is considered indispensable for poultry feed
 manufacturers due to its beta carotene content. This compound imparts the distinct yellowish
 color in chicken skin and egg yolk which is one important desirable characteristic that the
 market considers.
Aside from being used as animal feed, yellow corn is also used for human consumption.
 Yellow corn is milled and marketed as yellow corn grits for direct human consumption. This
 serves as cheap altemative to rice and white corn grits in the Visayas. Yellow corn is also
 processed to produce other food items like starch, noodles, com snacks, beer beverages.
 sweeteners and cooking oil (del Rosario, Borja, 8: Relieve, 2015). The wide variety of food
 products derived from corn implies the importance of the crop in the overall food industry of
 the country. This means that there is a need for a stable supply to med: the demand of the
 market. There will always be a demand for yellow com. Investing in the venture of corn
 farming, given the proper technology and environmental conditions, would always profit.

Calibri:

san_serif_calibri

I. INTRODUCTION
 Background of the Study
Corn (Zea mays), specifically yellow corn, is one of the most important crops of the
 country. It contributed 6.3% (Php100,629.40) to the total value of production in agriculture in
 the country in 2014. This value ranks 4'h among the main types of crops grown in the
 Philippines ,which includes rice (23.5%), banana (8.1%) and coconut (6.5%) (Philippine Statistics
 Authority, 2015). This share is due to the fact that this crop is used in various industries. One of
 largest industry that uses yellow corn is the livestock industry. Yellow corn is one of the primary
 ingredient used in animal feeds. Animal feeds contain about 30% to 60% com. Corn basically
 serves as energy source for animals and the percentage of corn in the animal feed is
 determined by the age of animal being fed. It is considered indispensable for poultry feed
 manufacturers due to its beta carotene content. This compound imparts the distinct yellowish
 color in chicken skin and egg yolk which is one important desirable characteristic that the
market considers.
Aside from being used as animal feed, yellow corn is also used for human consumption.
 Yellow corn is milled and marketed as yellow corn grits for direct human consumption. This
 serves as cheap alternative to rice and white corn grits in the Visayas. Yellow corn is also
 processed to produce other food items like starch, noodles, corn snacks, beer beverages,
 sweeteners and cooking oil (del Rosario, Borja, & Relieve, 2015). The wide variety of food
 products derived from corn implies the importance of the crop in the overall food industry of
 the country. This means that there is a need for a stable supply to meet the demand of the
 market. There will always be a demand for yellow corn. Investing in the venture of corn
farming, given the proper technology and environmental conditions, would always profit.

Left Aligned Vs Justified

The difference between left-aligned and justified text is the spacing between words. Left-aligned text has consistent word spacing, while justified text has word spacing that varies with the number and size of the words in a line. In theory there should not be much difference between the two under Tesseract, since Tesseract uses a fixed spacing to differentiate characters from each other. Varying the word spacing should therefore not affect the recognition of the words.

After analyzing three serif and three sans serif documents with left-aligned and justified paragraphs, a difference was found between the errors of left-aligned and justified documents, but no consistent pattern in the change of accuracy could be determined. Some font styles showed better results in the left-aligned version and others in the justified version. Bell MT had the worst accuracy among the left-aligned documents at 4.07% error, but produced a relatively better result for its justified version with an error of 1.63%. Bodoni MT had the best accuracy for left-aligned documents with 0.81% error, which degraded to 2.17% when the justified version was analyzed. The worst of the justified documents was the one in Berlin Sans FB at 2.71%; however, this is not much different from the 2.17% error of the left-aligned document in the same font. Calibri gave the best result for the justified documents (0.54%). Arial showed the same error for both alignments (1.63%). The images used and the results are shown below.

Bell MT:

JUSTIFIED:
I. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the QOO-ha farm has a rectangular shape with a total length onOOOm
 and total width of 1000m as shown in Figure 1. The office, warehouse, motor pool and drying
 facilities were placed in front of the farm to minimize the need ofinstalling utility lines within
 the farm and easy access to the office buildings. One sub—shed was placed 1km from the motor
 pool. This serves as temporary storage ofmachineries and equipment during operations to reduce
 the travel time needed by machines to reach the site of operation. The area allotted for the office,
warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES V1 with additional 0.25m on each side to give
 room for road cliches. A two-lane road bisects the farm and runs along the length of the fam
 This serves as the main road that are used by the machineries. Three equally spaced roads were
 set. These roads are two—lane roads and spaced at 500m from each other. They serve as secondary
 roads that are used to travel from the main road to the tertiary roads that lead to the field
 divisions. The tertiary roads are one-lane roads that run along the length of the farm. Four of
 these were placed in the farm. Two were placed in the opposite edges of the farm and another
 two were placed 500 km from both sides of the main road. This road network divided the farm
 into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix.
These dividers contain the levee and necessary canals for irrigation and drainage.
A total of256 subdivisions are present in the farm. Each subdivision measures 121.69m
 x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of 188.96
 ha of productive area. This means that only 8.02% of the total area is being utilized by office
 buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals and
levee.
LEFT ALIGNED:
I. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the QOO-ha farm has a rectangular shape with a total length of
 2000m and total width of 1000m as shown in Figure l. The office, warehouse, motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility lines
 within the farm and easy access to the office buildings. One sub-shed was placed 1km from the
 motor pool. This serves as temporary storage of machineries and equipment during operations
 to reduce the travel time needed by machines to reach the site of operation. The area allotted
 for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha,
respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.13m,
 respectively. The dimensions were based on PAES VI with additional 0.25m on each side to
 give room for road cliches. A two-lane road bisects the farm and runs along the length of the
 farm. This serves as the main road that are used by the machineries. Three equally spaced roads
 were set. These roads are two-lane roads and spaced at 500m from each other. They serve as
 secondary roads that are used to travel from the main road to the tertiary roads that lead to the
 field divisions. The tertiary roads are one-lane roads that run along the length of the farm.
 Four of these were placed in the farm. Two were placed in the opposite edges of the farm and
 another two were placed 500 km from both sides of the main road. This road network divided
 the farm into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an
8x2 matrix. These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m
 x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of
 183.96 ha of productive area. This means that only 8.02% of the total area is being utilized by
 office buildings, warehouse, motor pool. sheds, drying facility, road networks, irrigation canals
and levee.

Bodoni MT:

JUSTIFIED:
1. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the ZOO-ha farm has a rectangular shape with a total length of
 2000m and total width of 1000m as shown in Figure l. The office, warehouse, motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility lines
 within the farm and easy access to the office buildings. One sub-shed was placed 1km from the
 motor pool. This serves as temporary storage of machineries and equipment during operations
 to reduce the travel time needed by machines to reach the site of operation. The area allotted
 for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18
ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES VI with additional 0.25m on each side to
 give room for road diches. A two-lane road bisects the farm and runs along the length of the
 farm. This serves as the main road that are used by the machineries. Three equally spaced
 roads were set. These roads are two-lane roads and spaced at 500m from each other. They serve
 as secondary roads that are used to travel from the main road to the tertiary roads that lead
 to the field divisions. The tertiary roads are one-lane roads that run along the length of the
 farm. Four of these were placed in the farm. Two were placed in the opposite edges of the farm
 and another two were placed 500 km from both sides of the main road. This road network
 divided the farm into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers
 into an 8x2 matrix. These dividers contain the levee and necessary canals for irrigation and
drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m
 x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of
 133.96 ha of productive area. This means that only 8.02% of the total area is being utilized by
 office buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals
and levee.
LEFT ALIGNED:
I. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the ZOO-ha farm has a rectangular shape with a total length of
 2000111 and total width of 1000m as shown in Figure l. The office. warehouse, motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility
 lines within the farm and easy access to the office buildings. One sub-shed was placed 1km
 from the motor pool. This serves as temporary storage of machineries and equipment during
 operations to reduce the travel time needed by machines to reach the site of operation. The
 area allotted for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75
ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES VI with additional 0.25m on each side to
 give room for road dishes. A two-lane road bisects the farm and runs along the length of the
 farm. This serves as the main road that are used by the machineries. Three equally spaced
 roads were set. These roads are two-lane roads and spaced at 500m from each other. They
 serve as secondary roads that are used to travel from the main road to the tertiary roads that
 lead to the field divisions. The tertiary roads are one-lane roads that run along the length of
 the farm. Four of these were placed in the farm. Two were placed in the opposite edges of the
 farm and another two were placed 500 km from both sides of the main road. This road
 network divided the farm into a 4x4 matrix. Each of the cells are further subdivided by 0.5m
 dividers into an 8x2 matrix. These dividers contain the levee and necessary canals for
irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures
 121.69m x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a
 total of 183.96 ha of productive area. This means that only 8.02% of the total area is being
 utilized by office buildings, warehouse, motor pool. sheds, drying facility, road networks.
irrigation canals and levee.

Times New Roman:

 

JUSTIFIED:
I. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the ZOO-ha farm has a rectangular shape with a total length of 2000m
 and total width of 1000m as shown in Figure l. The office, warehouse, motor pool and drying
 facilities were placed in front of the farm to minimize the need of installing utility lines within the
 farm and easy access to the office buildings. One sub-shed was placed 1km from the motor pool,
 This serves as temporary storage of machineries and equipment during operations to reduce the
 travel time needed by machines to reach the site of operation. The area allotted for the office,
warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES V1 with additional 0.25m on each side to give
 room for road diches. A two-lane road bisects the farm and runs along the length of the farm. This
 serves as the main road that are used by the machineries. Three equally spaced roads were set.
 These roads are two-lane roads and spaced at 500m from each other. They serve as secondary
 roads that are used to travel from the main road to the tertiary roads that lead to the field divisions.
 The tertiary roads are one-lane roads that run along the length of the farm. Four of these were
 placed in the farm. Two were placed in the opposite edges of the farm and another two were placed
 500 km from both sides of the main road. This road network divided the farm into a 4x4 matrix.
 Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix. These dividers contain
the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m x
 61 .36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of 183.96
 ha of productive area. This means that only 8.02% of the total area is being utilized by office
 buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals and
levee.
LEFT ALIGNED:
1. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the 200-ha farm has a rectangular shape with a total length of 2000m
 and total width of 1000m as shown in Figure 1. The office, warehouse, motor pool and drying
 facilities were placed in front of the farm to minimize the need of installing utility lines within
 the farm and easy access to the office buildings. One sub-shed was placed 1km from the motor
 pool. This serves as temporary storage of machineries and equipment during operations to reduce
 the travel time needed by machines to reach the site of operation. The area allotted for the Mike,
warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES VI with additional 0.25m on each side to give
 room for road diches. A two-lane road bisects the farm and runs along the length of the farm.
 This serves as the main road that are used by the machineries. Three equally spaced roads were
 set. These roads are two-lane roads and spaced at 500m from each other. They serve as
 secondary roads that are used to travel from the main road to the tertiary roads that lead to the
 field divisions. The tertiary roads are one-lane roads that run along the length of the farm. Four
 of these were placed in the farm. Two were placed in the opposite edges of the farm and another
 two were placed 500 km from both sides of the main road. This road network divided the farm
 into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix.
These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m x
 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of 183.96
 ha of productive area. This means that only 8.02% of the total area is being utilized by office
 buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals and
levee.

Arial:

JUSTIFIED:
l. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the ZOO-ha farm has a rectangular shape with a total length
 of 2000m and total width of 1000m as shown in Figure 1. The office, warehouse, motor
 pool and drying facilities were placed in front of the farm to minimize the need of installing
 utility lines within the farm and easy access to the office buildings. One sub-shed was
 placed 1km from the motor pool. This serves as temporary storage of machineries and
 equipment during operations to reduce the travel time needed by machines to reach the
 site of operation. The area allotted for the office, warehouse, motor pool and drying
 facilities and sub—shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and
 4.5m, respectively. The dimensions were based on PAES Vi with additional 0.25m on
 each side to give room for road diches. A two—lane road bisects the farm and runs along
 the length of the farm. This serves as the main road that are used by the machineries.
 Three equally spaced roads were set. These roads are two-lane roads and spaced at
 500m from each other. They serve as secondary roads that are used to travel from the
 main road to the tertiary roads that lead to the field divisions. The tertiary roads are one-
 lane roads that run along the length of the farm. Four of these were placed in the farm.
 Two were placed in the opposite edges of the farm and another two were placed 500 km
 from both sides of the main road. This road network divided the farm into a 4x4 matrix,
 Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix. These
 dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures
 121.69m x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm
 has a total of 183.96 ha of productive area. This means that only 8.02% of the total area
 is being utilized by office buildings, warehouse, motor pool, sheds, drying facility, road
 networks, irrigation canals and levee.
LEFT ALIGNED:
I. RESULTS AND DISCUSSION
 Farm Layout
it was assumed that the 200-ha farm has a rectangular shape with a total length
 of 2000m and total width of 1000m as shown in Figure 1. The office, warehouse, motor
 pool and drying facilities were placed in front of the farm to minimize the need of
 installing utility lines within the farm and easy access to the office buildings. One sub-
 shed was placed 1km from the motor pool. This serves as temporary storage of
 machineries and equipment during operations to reduce the travel time needed by
 machines to reach the site of operation. The area allotted for the office, warehouse,
 motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and
 4.5m, respectively. The dimensions were based on PAES VI with additional 0.25m on
 each side to give room for road diches. A two-lane road bisects the farm and runs along
 the length of the farm. This serves as the main road that are used by the machineries.
 Three equally spaced roads were set. These roads are two-lane roads and spaced at
 500m from each other. They serve as secondary roads that are used to travel from the
 main road to the tertiary roads that lead to the field divisions. The tertiary roads are one-
 lane roads that run along the length of the farm. Four of these were placed in the farm.
 Two were placed in the opposite edges of the farm and another two were placed 500
 km from both sides of the main road. This road network divided the farm into a 4x4
 matrix. Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix.
 These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures
 121 .69m x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm
 has a total of 183.96 ha of productive area. This means that only 8.02% of the total area
 is being utilized by office buildings. warehouse, motor pool, sheds, drying facility, road
 networks. irrigation canals and levee.

Berlin San FB:

JUSTIFIED:
I. REflIIJ: AND DISCIISHON
 Farm Layout
It was assumed that the zoo-ha farm has a rectangular shape with a total length of
 2000m and total width of 1000m as shown in Figure 1. The office, warehouse, motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility lines
 within the farm and easy access to the office buildings. One sub-shed was placed lhm from the
 motor pool. This serves as temporary storage of machineries and equipment during operations
 to reduce the travel time needed by machines to reach the site of operation. The area allotted
 for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18
 ha, respectively.
The form contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES Vl with additional 0.25m on each side to give
 room for road diches. A two-lane road bisects the farm and runs along the length of the farm.
 This serves as the main road that are used by the machineries. Three equally spaced roads were
 set. These roads are two-lane roads and spaced at 500m from each other. They serve as
 secondary roads that are used to travel from the main road to the tertiary roads that lead to
 the field divisions. The tertiary roads are one-lane roads that run along the length of the farm.
 Four of these were placed in the farm. Two were placed in the opposite edges of the farm and
 another two were placed 500 km from both sides of the main road. This road network divided
 the form into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an m
 matrix. These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m x
 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of 183.96
 ha of productive area. This means that only 8.02% of the total area is being utilized by office
 buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals and
levee.
LEFT ALIGNED:
I. mun: All) DlstOION
 lam: Lao-II
It was assumed that the zoo-ha farm has a rectangular shape with a total length of
 2000m and total width of 1000m as shown in Figure 1. The office, warehouse. motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility lines
 within the farm and easy access to the office buildings. One sub-shed was placed 1km from the
 motor pool. This serves as temporary storage of machineries and equipment during operations
 to reduce the travel time needed by machines to reach the site of operation. The area allotted
 for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18
 ha. respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES Vl with additional 0.25m on each side to
 give room for road diches. A two-lane road bisects the farm and runs along the length of the
 farm. This serves as the main road that are used by the machineries. Three equally spaced
 roads were set. These roads are two-lane roads and spaced at 500m from each other. They
 serve as secondary roads that are used to travel from the main road to the tertiary roads that
 lead to the field divisions. The tertiary roads are one-lane roads that run along the length of
 the farm. Four of these were placed in the farm. Two were placed in the opposite edges of the
 farm and another two were placed 500 km from both sides of the main road. This road
 network divided the farm into a 4x4 matrix. Each of the cells are further subdivided by 0.5m
 dividers into an M matrix. These dividers contain the levee and necessary canals for irrigation
 and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m x
 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of
 183.96 ha of productive area. This means that only 8.02% of the total area is being utilized by
 office buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals
 and levee.

Calibri:

JUSTIFIED:
l. RESULTS AND DISCUSSION
 Farm Layout
it was assumed that the ZOO-ha farm has a rectangular shape with a total length onOOOm
 and total width of 1000m as shown in Figure 1. The office, warehouse, motor pool and drying
 facilities were placed in front of the farm to minimize the need of installing utility lines within the
 farm and easy access to the office buildings. One sub-shed was placed 1km from the motor pool.
 This serves as temporary storage of machineries and equipment during operations to reduce the
 travel time needed by machines to reach the site of operation. The area allotted for the office,
warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18 ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES Vl with additional 0.25m on each side to give
 room for road diches. A two-lane road bisects the farm and runs along the length of the farm.
 This serves as the main road that are used by the machineries. Three equally spaced roads were
 set. These roads are two-lane roads and spaced at 500m from each other. They serve as
 secondary roads that are used to travel from the main road to the tertiary roads that lead to the
 field divisions. The tertiary roads are one—lane roads that run along the length of the farm. Four
 of these were placed in the farm. Two were placed in the opposite edges of the farm and another
 two were placed 500 km from both sides of the main road. This road network divided the farm
 into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an 8x2 matrix.
These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m x
 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of 183.96
 ha of productive area. This means that only 8.02% of the total area is being utilized by office
 buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals and
levee.
LEFT ALIGNED:
l. RESULTS AND DISCUSSION
 Farm Layout
It was assumed that the ZOO-ha farm has a rectangular shape with a total length of
 2000m and total width of 1000m as shown in Figure 1. The office, warehouse, motor pool and
 drying facilities were placed in front of the farm to minimize the need of installing utility lines
 within the farm and easy access to the office buildings. One sub-shed was placed 1km from the
 motor pool. This serves as temporary storage of machineries and equipment during operations
 to reduce the travel time needed by machines to reach the site of operation. The area allotted
 for the office, warehouse, motor pool and drying facilities and sub-shed was 0.75 ha and 0.18
ha, respectively.
The farm contains two-lane and one-lane roads with widths equal to 6.5m and 4.5m,
 respectively. The dimensions were based on PAES VI with additional 0.25m on each side to give
 room for road diches. A two-lane road bisects the farm and runs along the length of the farm.
 This serves as the main road that are used by the machineries. Three equally spaced roads were
 set. These roads are two-lane roads and spaced at 500m from each other. They serve as
 secondary roads that are used to travel from the main road to the tertiary roads that lead to
 the field divisions. The tertiary roads are one-lane roads that run along the length of the farm.
 Four of these were placed in the farm. Two were placed in the opposite edges of the farm and
 another two were placed 500 km from both sides of the main road. This road network divided
 the farm into a 4x4 matrix. Each of the cells are further subdivided by 0.5m dividers into an 8x2
matrix. These dividers contain the levee and necessary canals for irrigation and drainage.
A total of 256 subdivisions are present in the farm. Each subdivision measures 121.69m
 x 61.36m. For each subdivision, a 4m width is allotted as headland. The farm has a total of
 183.96 ha of productive area. This means that only 8.02% of the total area is being utilized by
 office buildings, warehouse, motor pool, sheds, drying facility, road networks, irrigation canals
and levee.

A notable remaining error is Tesseract's failure to keep the last line of a paragraph with the body of that paragraph. It tends to split off the last line and start a new paragraph from it. This is most likely due to the lack of a full document layout analysis algorithm in the application; Tesseract relies on its own simple method of determining document layout. Moreover, the program has difficulty reading numbers and differentiating numbers from letters, for example reading "200" as "ZOO" and "1km" as "lhm".
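The error percentages quoted in this write-up compare the OCR output with the original text. The exact metric is not restated in this section, so the sketch below is only one plausible way to compute such a figure: a character-level edit distance divided by the length of the reference text. The function names `edit_distance` and `percent_error` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def percent_error(reference: str, hypothesis: str) -> float:
    """Edit distance expressed as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # "200" misread as "ZOO" costs three substitutions out of 11 characters.
    print(round(percent_error("200-ha farm", "ZOO-ha farm"), 2))
```

A metric defined this way can exceed 100% when the output contains many spurious characters, so it would need to be capped if a 0 to 100 scale is wanted.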

Decorative and Script

The Ravie and Brush Script MT fonts were applied to a document image and analyzed using Tesseract. The results showed that the OCR performs poorly on decorative and script fonts: the Ravie document gave an error of 60%, while the Brush Script MT document gave 100% error. This is most likely because Tesseract uses a polygonal approximation of blobs in character identification. The irregular shapes of the characters in Ravie make them difficult to extract; they are distorted and differ greatly from the typical serif and sans serif fonts that were tested initially. The same goes for Brush Script MT. The difference is that the characters in a script font are designed with more curves, lack the traditional shapes of machine-printed text, and more closely resemble handwritten characters. Tesseract is totally unreliable in reading script fonts. The images and resulting output are shown below.

Ravie:

SP_decorative_ravie

1. 238111568 Ann Discussion
 Farm Layout
Itmasassmnedthattheaoc-hafiarmhasareetanaulorshape mitha
 totallenathotaeoonzandtotalmidthot‘lmmasshomninfiam-el.the
 oflfice. warehouse. motor pool and drains {facilities were placed in Front o?
 the farm to minimise the need o? installing utility lines within the (farm and
 easy access to the oflh‘ce buildings. One sub-shed was placed 1km #rom the
 motorpool.thisserVesastempormstoraaeo9maehmeriesndeqfirment
 din-ins operations to reduce the travel time needed by machines to reach the
 site o? operation. the area allotted ¢or the ofl'ice. warehouse. motor pool and
 drying cacilities and sub-shed was 0.75 ha and 0.18 ha. respectively.
the tat-m contains two-lane and one-lane roads with widths equal to
 8.5m and 4.5111. respectiVels. the dimensions mere based on PAES V1 with
 additionalOflSmoneachsidetoaiveroomtorroaddiches.Atmo-laneroad
 bisects the (farm and runs along the length of the term. this servas as the
 mahroadthatareusedbsthemachh‘eries.threeequallsspaeedmadsmere
 set.theaeroadsaretmo-laneroadsandspacedat500mmeaehother.
 thesserveassecondarsroadsthatareusedtotravelfi‘romthemainroad
 tothe tertiarsroadsthat leadtothe field divisions. the tertiat’sroadsare
 one-lane roads that run along the length o? the farm. Pour o? these mere
 placedintheI-‘arm.tmo mereplacedintheoppositeedaesotthel’armahd
 mothertmomereplaeedSOOkInQrombothsidesotthemafilroaithisroad
 network divided the (farm into a 4x4 matrix. Each o? the cells are {further
 subdividedbsofimdifidersmtomaxzmatrimthesedifiderscontainthe
 levee andnecessarscanalstor irrigation and drainage.
Atotalotzsesubdi‘fisionsarepresenththetarm.tachsubdivision
 mares 121.8811: x 81.38114. For each subdivision. a 4m width is allotted as
 headland. the cat-mhasatotalotlszifiehaoifprodwtiwaruthismans
 that only 3.02% of the total area is being utflised by oflfice buildings.
 warehouse. motor pool. sheds. drains l’acflits. road networks. irrigation
 canals and levee.

Brush script MT:

SP_script_brush_script_mt

7. ms .470 9950155707!
 7“- 14,31
wwmmmm-aww¢mmmammagmwqu
 IWuMwyu-w‘l. 75:44“ W. mmmmmmwamquw
 Wmmogmmmmdcwwmmawww. Mad—cum
 Mluwdemw. flammuwmagmaimdmuwwmmhm
 mwmmlymamddamagm. flammaed/uaeagm. W. mp4
 MMWMM—Mw0.75hmalih,w‘
7kfamwmm-lauudw-Mwauflddaf4quth65uu445u. W. 7a
 WWWuPflssmudWafiaumIthmgumd‘da.Ara-(«um
 mmwwwmugwqu. flammau‘mwmmmawm,
 7&«qulb/Wuwmad. Wanaamm—bumadafiudasmufimadda. 714m
 MWWMwudwwwdcwmwdamwwwhfieWm. 7k
 WMwm-mmdammdawagdem,7m4mmwadaw.7am
 wawwmagdewmmmmmwummm4mmm,742m
 mwdekmaaamm.Sadagnfeedbmmwkasuwaauizzmm.
 Waudmumwemwmmaguwmm.
Amqwmwmafim. MWWIZL69n26L36a. 7,4544
 ahWfiWuWI 7kkaulaamqli3.96i¢afflamm. 74am“
wxwzgkmmawmgwm, mm, mm. M. m

Embedded Pictures

The document used to test document images with pictures contains two images embedded on the right-hand side of the document in a tight box layout, with captions below each image. Analyzing the documents gave relatively high percent errors for the two font styles used, Arial (9.05%) and Times New Roman (6.03%). The pictures degraded the accuracy of the program by introducing non-text pixels. Tesseract assumes that the input image is a binary image of a document, so the colored images made it difficult for the program to analyze the text near the pictures. The raw images and resulting extracted text are shown below.

Arial:

WP_san_serif_arial

Machine Selection and Determination of Coverage Area
To mechanize the operation 7 r ‘ ‘ "
 of the land consolidated corn farm,
 different machinery will be employed
 catering to the different farm
 practices associated with corn
 production. For the land preparation,
 trailing harrows and rotary tillers will
 be used coupled to a ninety
horsepower four-wheel tractor. A
 _ [figure I. Thc niner-hnrxeprm'cr rracmr M‘llh (he 20:24 (rm/mg harmuz
 large tractor was chosen Since corn
production requires a deeper cultivation as compared to other crops such as rice. A one
 passing of a 20x24 trailing harrow will be done to be followed by two passes of a 2.2m
rotary tiller across and along the headland.
After the preparing the
 land, crop establishment will
 immediately follow. To perform
 the operation, a four-row
 pneumatic corn planter will be
 used hitched to the existing four-
 wheel tractor. A pneumatic corn
 planter was chosen to minimize
 incurring expenses clue to
additional input requirements.
Figure .7 71w 4-row pneumqu cum planter whi/c being named an field.
Being a tool of precision, a
pneumatic corn planter can ensure that there would only be one seed per hill. Also, the
 use of the pneumatic corn planter will make the crop establishment faster and more
 efficient.

Times New Roman:

WP_serif_times_new_roman

Machine Selection and Determination of Coverage Area
To mechanize the operation of r A, . r ' ' ’
 the land consolidated corn farm,
 different machinery will be employed
 catering to the different farm practices
 associated with corn production For the
 land preparation, trailing harrows and
 rotary tillers will be used coupled to a
ninety horsepower four-wheel tractor. A
large tractor was chosen since corn
 . . _ , [Figure l. The rilner-holtreprm-er Imcmr With the 20x24 trailing harrow.
 production requires a deeper cultivation
 as compared to other crops such as rice. A one passing of a 20x24 trailing harrow will be done to
be followed by two passes of a 2.2m rotary tiller across and along the headland.
After the preparing the land,
 crop establishment will immediately
 follow. To perform the operation, a
 four-row pneumatic corn planter will
 be used hitched to the existing four-
 wheel tractor. A pneumatic corn
 planter was chosen to minimize
 incurring expenses due to additional
input requirements. Being a tool of .
precision, a pneumatic corn planter
Figure 2. The: J—mw pneummlc corn plan/Ur while [70mg levied on field.
can ensure that there would only be
 one seed per hill. Also, the use of the pneumatic corn planter will make the crop establishment
faster and more efficient.

Hand Printed Text

Three handwritten documents were used to determine the performance of the OCR in reading handwriting. The first one, shown below, was captured using a cellphone camera. Tesseract gave poor results, most likely due to the non-uniformity of the characters. The characters and words were not well spaced, and most of the characters within words were connected. Tesseract's chopping feature did not work well in this scenario.

HW_jake

Ho" "films btoks n91 beam/u ""1 “*3
~ W " flak, ra‘i’hcr \m, The How CM
kt 90M +0 a 41+}wa plum wémfl
"twig. I 5mg when “M alum brown
“0 one “li m5 ‘1‘):9. ‘
lWYH‘ 005W thx gmtfid.
50 \cusiwm)‘ l alonl+ WW: a. \floYJbUH‘AQ-M
.0” 155; 99m midtr 51am“; a NW5,
.l bu} w 1 Cam he a’ 39mm. , w.

The second handwritten document has well-spaced characters and words and ample spacing between lines. Tesseract was able to read some of the words and some parts of sentences but still had a lot of misdetections. It has difficulty reading 'l', which it sometimes turns into '/' or '1'.

HW_michelle

HcHo Jthere}. 1 am
M'\C\‘\e\\e. I love
reading books g
wrihnj various
compos'x’cion 5. 1
a\ so enjoy drawing.

The third document is in my own handwriting. I used a freehand lettering style from technical drawing (which I am not very good at due to lack of practice) to see if the OCR can read handwriting that closely resembles sans serif fonts. The results show that Tesseract still had difficulty reading the document. Most errors came from the letter 't', which was reduced to a mere apostrophe despite the letters being clear.

HW_mine

Earn an engineer, a programmer
and a cook. I have a Jrerriioie hand~
wriiing, so Iwro‘re inis paragraph
using a lefiering siyie used in
Jrechnical drawing.1 hope Tesserac’r
will read ii we“.

Closing the Case

The results of running Tesseract on documents with various font styles showed that the program can read and extract machine-printed text with relatively high accuracy. In terms of font family, sans serif beats serif by a very small margin. The difference in accuracy between the two is virtually non-existent, so both can be used with the program. In terms of alignment, justified documents showed lower errors. This can be attributed to the increased distance between words, which makes them easier to distinguish from each other than in the left-aligned documents, which feature uniformly spaced words. Comparing the individual fonts used in the documents, Calibri is the most recognizable. This is mainly due to the uniform thickness of its letters and good character spacing.

On the other hand, decorative and script fonts should not be used with Tesseract since the algorithm cannot read them well. This could be addressed by changing how the program identifies characters: Tesseract uses polygons to identify characters and does not apply thinning, which would remove the unneeded thick strokes of decorative fonts. Script fonts could be handled by adding new character forms to the program's character identification algorithm.

Incorporating pictures into the documents degrades the performance of the program, and text lines that fall outside the paragraph are not identified well. This is mainly due to the lack of layout analysis in the steps taken by Tesseract. If layout analysis were included, such errors could be reduced, along with the creation of new paragraphs from the last lines of paragraphs.

Tesseract showed poor performance in reading handwritten documents. Poorly written text suffered more than well-written text, but even with well-formed characters the OCR still had problems reading them.

References

Smith, R. (2007). An Overview of the Tesseract OCR Engine. Retrieved March 25, 2016, from Helsingin yliopisto: http://www.helsinki.fi/~mpsilfve/ocr_course/materials/tesseracticdar2007.pdf

CASE #2: Document Layout Analysis

The Case

The task in this exercise was to identify document elements, which include characters, words, lines, and paragraphs, in the three document images shown below. The first image appears to be a book cover with text and backgrounds of varying colors. Noticeable noise can easily be seen in the image due to its low resolution. The second is a journal article page with two columns of text, and the third is a scanned news article with fairly high resolution. All these document images contain colored pictures and have varying font sizes.

The objective is to generate 5 images per input document image with each mentioned element boxed.

Deductions and Assumptions

To solve the problem of identifying text elements in the given documents, pseudocode was written and used as a guide in developing the code for the exercise.

for each image in input folder:
    load colored image
    convert to grayscale
    perform otsu binarization and determine the threshold value
    if threshold value used <= 127:
        perform multi-level thresholding
    else:
        invert binarized image
    extract contours from binarized image
    determine 90th percentile contour height
    determine 90th percentile contour width
    kernel_height = 90th percentile contour height* 0.1 
    kernel_width = 90th percentile contour width* 0.4 
    for each contour(letter):
        draw containing boxes in colored image
    save image
    dilate letters in binary image horizontally using kernel_width
    for each contour(word):
        draw containing boxes in colored image
    save image
    dilate words in binary image horizontally using kernel_width
    for each contour(line):
        draw containing boxes in colored image
    save image
    dilate lines in binary image vertically using kernel_height
    for each contour(paragraph):
        draw containing boxes in colored image
    save image
    dilate paragraph blobs in binary image
    erode resulting single blob
    for each contour(paragraph blobs):
        draw containing boxes in colored image
    save image
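The kernel sizing step in the pseudocode can be illustrated with a small sketch. It assumes each contour has already been reduced to a bounding box `(x, y, w, h)` and uses a nearest-rank percentile; the helper names `percentile` and `kernel_sizes` are hypothetical, while the 0.4 and 0.1 factors come from the pseudocode above.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def kernel_sizes(boxes, width_factor=0.4, height_factor=0.1):
    """Derive dilation kernel sizes from contour bounding boxes.

    boxes: list of (x, y, w, h) tuples, one per contour.
    Returns (kernel_width, kernel_height), each at least 1 pixel.
    """
    widths = [w for _, _, w, _ in boxes]
    heights = [h for _, _, _, h in boxes]
    kernel_width = max(1, round(percentile(widths, 90) * width_factor))
    kernel_height = max(1, round(percentile(heights, 90) * height_factor))
    return kernel_width, kernel_height


if __name__ == "__main__":
    # Ten hypothetical character boxes with sizes from 10 to 100 pixels.
    boxes = [(0, 0, size, size) for size in range(10, 101, 10)]
    print(kernel_sizes(boxes))
```

Using the 90th percentile rather than the maximum keeps a few oversized blobs, such as picture fragments, from inflating the kernel.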

Binarization

The first step in analyzing the documents was to convert the images to grayscale and then into binary images. The journal article and news article had no issues in the binarization process, while the book cover had issues due to the multi-colored fonts and backgrounds present in the document. Otsu's algorithm for thresholding produced good binary images from the two articles but not from the book cover, as shown below. Only the black text was included in the binary image. The light-colored text was treated as background and left out. It should also be noted that the multiple background colors affect the result of thresholding.
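For reference, Otsu's method picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The following is a minimal pure-Python sketch of that idea, not the code used in the exercise; the function name `otsu_threshold` is hypothetical, and in OpenCV the same value is returned by `cv2.threshold` when the `cv2.THRESH_OTSU` flag is set.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of integers from 0 to 255. Pixels <= t form the dark
    class and pixels > t form the bright class.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    count_low, sum_low = 0, 0
    for level in range(256):
        count_low += histogram[level]
        sum_low += level * histogram[level]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so no split at this level
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Proportional to the between-class variance.
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


if __name__ == "__main__":
    # Hypothetical page: dark text (20) on a bright background (220).
    page = [20] * 50 + [220] * 50
    print(otsu_threshold(page))
```

With a single global threshold like this, light-colored text on a dark strip lands on the background side of the split, which matches the behavior seen on the book cover.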

Multilevel thresholding

To handle text and backgrounds with multiple levels of intensity, the book cover was subjected to a multilevel thresholding method. To single it out from the pool of images, the threshold value obtained from Otsu's method was used as the deciding factor. Documents with colored backgrounds tend to have a lower threshold value than documents with plain white backgrounds. All images with a relatively low threshold value from Otsu's method were considered eligible for multilevel thresholding, and a value of 127 was arbitrarily set as the limit. The grayscale image was thresholded at six intensity values (100, 165, 130, 187, 194 and 237), which produced the 6 binary images shown below.
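The decision rule and the repeated thresholding can be sketched as follows. The limit of 127 and the six levels are the values quoted above; the helper names and the polarity (pixels brighter than the level become white) are assumptions made for illustration.

```python
OTSU_LIMIT = 127                          # arbitrary limit quoted in the text
LEVELS = (100, 165, 130, 187, 194, 237)   # intensity values listed in the text


def needs_multilevel(otsu_value, limit=OTSU_LIMIT):
    """Images whose Otsu threshold is at or below the limit get multilevel thresholding."""
    return otsu_value <= limit


def threshold_at(gray, level):
    """Binary image (0 or 255): pixels brighter than `level` become white (assumed polarity)."""
    return [[255 if pixel > level else 0 for pixel in row] for row in gray]


def multilevel_threshold(gray, levels=LEVELS):
    """Return one binary image per threshold level, in the order given."""
    return [threshold_at(gray, level) for level in levels]


if __name__ == "__main__":
    gray = [[90, 150, 200, 240]]  # hypothetical one-row grayscale image
    if needs_multilevel(110):
        for binary in multilevel_threshold(gray):
            print(binary)
```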

 

Binary images

Some text was missing from some of the images, and some images included background strips that make the text they contain unreadable (images 2-5). To extract the needed text, image subtraction operations were performed. Image 3 was subtracted from image 2 to produce image 7, which contains the isolated website text. Image 5 was subtracted from image 4 to produce image 8, which contains only the title.

To combine all the needed text in one image, images 1, 6, 7 and 8 were combined using direct image addition. The resulting binary image contained all the needed text and was used for the rest of the procedures.
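The subtraction and addition steps can be sketched on small binary images as below. This is a minimal sketch assuming saturating arithmetic (results clipped to the 0..255 range); the function names and the sample rows are hypothetical.

```python
def subtract_images(a, b):
    """Pixel-wise a - b, clipped at 0 so the result stays a valid image."""
    return [[max(pa - pb, 0) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def add_images(*images):
    """Pixel-wise sum of any number of same-sized images, clipped at 255."""
    result = [row[:] for row in images[0]]
    for img in images[1:]:
        result = [[min(pa + pb, 255) for pa, pb in zip(row_a, row_b)]
                  for row_a, row_b in zip(result, img)]
    return result


# Made-up rows: one image holds text plus a background strip, the other
# holds only the strip, so the difference keeps just the text pixel.
text_and_strip = [[255, 255, 0, 255]]
strip_only = [[255, 0, 0, 255]]
isolated_text = subtract_images(text_and_strip, strip_only)

other_text = [[0, 0, 255, 0]]
combined = add_images(isolated_text, other_text)
```

Clipping matters for the addition: where two binary images overlap, an unclipped sum would exceed 255.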

final binarized book cover

Document Layout Analysis

With a proper binary image at hand, element identification can now proceed. Most of the content of each document is text, which means that most of the connected components, or blobs, in the binary image are text. The rest are either blobs from the pictures or pepper noise. It is important to remove these elements since they cause false positive detections of characters in the analysis; however, it is quite difficult to remove these unwanted blobs from the binary images. Morphological operations may be used to remove them, but these operations can greatly degrade the important character blobs. Pursuing the removal of the noise elements would likely cause false negative detections, so the noise elements were not cleared. Moreover, these elements tend to cluster near the original location of the pictures in the document, so they are less likely to affect the analysis of the document layout.

Characters

In the Document Layout Analysis proper, each blob was inspected and boxed. Following the deduction that most of the blobs in the binary image are character blobs, each blob was enclosed in a rectangle. The resulting image had non-text pixels boxed and, in places, multiple characters enclosed in one box. These are side effects of not applying morphological filters to the binary images: character blobs that remained connected were boxed together, and blob fragments from pictures were also boxed.

Words

Spacing between characters within a word is smaller than spacing between characters from different words, thus words can be determined by connecting relatively nearby characters. This was done by performing dilation along the line of text, that is, horizontal dilation using a dynamically determined horizontal kernel. The resulting connected blobs were boxed and are shown below. Some words remain fragmented due to non-uniform distances between letters. This happens when a paragraph is set to be “justified”. It can also be observed in the titles and other large-font characters, where letters are farther apart from each other.
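The effect of horizontal dilation on letter and word gaps can be sketched on a single row of pixels. This is a simplified one-dimensional illustration, not the code used here; the function names and the sample row are assumptions.

```python
def dilate_horizontal(row, width):
    """Binary dilation of a 1-D row with a flat, centered kernel.

    `row` holds 0/1 values and `width` is an odd kernel width in pixels.
    """
    half = width // 2
    n = len(row)
    return [1 if any(row[max(0, i - half):min(n, i + half + 1)]) else 0
            for i in range(n)]


def count_runs(row):
    """Count connected runs of 1s, i.e. separate blobs along the row."""
    return sum(1 for i, v in enumerate(row) if v and (i == 0 or not row[i - 1]))


# Made-up text line: three letters, a wide word gap, then two letters.
# Letters within a word are 1 pixel apart; the words are 5 pixels apart.
line = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
words = dilate_horizontal(line, width=3)

letters_before = count_runs(line)   # 5 separate letter blobs
words_after = count_runs(words)     # 2 word blobs
```

A kernel wide enough to bridge the letter gap but narrower than the word gap merges letters into words without merging the words themselves, which is why the kernel width has to track the font size.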

Lines

Lines were determined by applying horizontal dilation to the words. Dilation was limited so as not to combine lines from different columns. The resulting line blobs were enclosed in rectangles as shown below. Small boxes appear in the lines with ‘i’ and ‘j’ characters since the dot of each of these characters is separated from the rest of the character. Some lines also remain fragmented due to non-uniform distances between words, “justified” paragraphs and overly large character spacing.

Paragraphs

Paragraphs are generally groups of consecutive lines of text. To form them, lines were dilated vertically using a vertical kernel. Note that the program can only separate paragraphs if there is a considerable vertical distance between blocks of text. Paragraphs whose paragraph spacing equals the line spacing were merged into a single paragraph. The analysis result is shown below.

Paragraphs with Margins

Closing was applied to the paragraphs to connect them without excessively covering the entire page. As a result, nearby paragraphs and titles were connected together and an apparent margin around the document, containing all text and images, was drawn. The resulting documents are shown below.

Dynamic Kernel Size

It was mentioned that the text in each of the documents had varying font sizes. This complicates the succeeding steps in the analysis. These steps involve dilation of blobs, which requires the right kernel size in order to properly connect letter blobs into word blobs, word blobs into lines and line blobs into paragraphs. An improper kernel size may over-dilate or under-dilate a blob, causing multiple elements to be boxed together or failing to connect one element to another.

The kernel size should be based on the smallest character in the document. To determine it, the heights and widths of the blob contours were each listed in descending order, and the 90th percentile value of each list was determined. The 90th percentile was used since most of the characters fall in the lower percentile ranks. A factor was multiplied by the 90th percentile height and the 90th percentile width to estimate the thickness that must be added to the characters in order to connect nearby characters and lines. The factor used for the height was 0.1, while 0.4 was used for the width. Horizontal dilation used horizontal kernels sized from the 90th percentile width, and vertical dilation used vertical kernels sized from the 90th percentile height.
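One way to compute such kernel sizes is sketched below. It rests on several assumptions: the percentile is taken with the conventional nearest-rank rule on ascending values (the text lists values in descending order, so the exact convention used may differ), results are rounded and kept at 1 pixel or more, and the function names are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of `values` for an integer `pct` in 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100   # integer ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def dilation_kernels(boxes, pct=90, width_factor=0.4, height_factor=0.1):
    """Derive kernel sizes from blob bounding boxes given as (width, height).

    Returns ((kw, 1), (1, kh)): a horizontal and a vertical kernel,
    each expressed as (width, height) in pixels.
    """
    p_width = percentile_nearest_rank([w for w, _ in boxes], pct)
    p_height = percentile_nearest_rank([h for _, h in boxes], pct)
    kw = max(1, round(width_factor * p_width))
    kh = max(1, round(height_factor * p_height))
    return (kw, 1), (1, kh)


# Made-up blobs: nine body-text characters and one large title letter.
boxes = [(10, 20)] * 9 + [(40, 80)]
horizontal_kernel, vertical_kernel = dilation_kernels(boxes)
```

Using a percentile instead of the maximum keeps a few oversized blobs, such as title letters or picture fragments, from inflating the kernel.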

Closing the Case

In the course of writing the code for analyzing document layout, it was observed that the quality of the document greatly affects the method of analysis and the quality of the outcome. The book cover image had the worst resolution and was the most difficult to extract details from. Letters in the image were detected as connected to each other, and using erosion on the image decomposed the text into mere pixel noise. It would be possible to enlarge the image to bring the character blobs to workable sizes; however, computer memory needs to be large enough to hold the excessively large image and its derivatives. Unfortunately, the machine used during the development of this code did not have that much memory, so this solution was not explored. Excluding the results of the character analysis, the document layout analysis gave acceptable outcomes in identifying basic document layout.

Case #1: Processing SET Forms

The Case

The task for this exercise was to process a given set of scanned Student Evaluation for Teachers (SET) forms using image processing. Processing these forms required the identification of shaded and unshaded circles in the answer sheets. To make the task easier, the forms were already in gray scale (except for the blue boxes that censor some information) and had their histograms modified, and a list of centroids of the circles for each question was also provided. The main problem to solve is how to determine the condition of each circle in the forms. Three results are expected from this analysis: the identification of unshaded, shaded and crossed-out circles. A circle can be either shaded or unshaded, and a shaded circle can be either plainly shaded or crossed out. It should be noted that crossed-out circles are always shaded, since crossing out a circle is a means of changing one’s answer in the form. Crossing out an unshaded circle is pointless.

The challenge is how to make a program able to identify these three. It is clear that a method of quantifying the given conditions is necessary in order for the program to understand the difference among the three.

Deductions and Assumptions

With the problem clearly identified, a theory can now be generated. We can start by inspecting the nature of circles in each condition: unshaded, shaded and crossed-out. Unshaded circles are basically plain black rings, while shaded ones are black rings with even or uneven gray areas within. If these were binarized using a proper threshold and then inverted, shaded circles would be expected to become entirely white, while unshaded ones would become white rings on a black background. Some shaded circles would contain small specks (holes) inside due to uneven shading and noise. These can easily be filled by applying a morphological filter called closing. It should be noted that white rings (unshaded) contain fewer white pixels than white circles (shaded). The number of white pixels contained in each circle can therefore be used to differentiate unshaded from shaded circles: the lower the white pixel count, the more likely the circle is unshaded.

Both crossed-out circles and shaded circles contain a relatively high number of white pixels. This makes the pixel-count method invalid for differentiating the two; however, it can be observed that the shape of shaded circles differs from that of crossed-out circles. The difference is brought about by the appendages from the ‘x’ marks. The morphological closing kernel used to remove noise should be kept small to avoid the unnecessary removal of these appendages.

Looking back at the difference between the two conditions, shaded circles would appear as a solid circular white mass of pixels, while crossed-out circles would appear more distorted and less circular in shape. The measure that can be used for this case is circularity, a measure of the compactness of a shape. The more a shape resembles a circle, the nearer its circularity is to 1.0. It follows that shaded circles would have higher circularity than crossed-out circles.

Shown below is the pseudo code used for solving the problem.

load coordinates of circles
for each form:
    load original form image
    binarize form image
    invert pixel values of binary image
    use closing to remove noise
    
    for each given coordinate:
        select region around the coordinate in the binary image
        determine area covered by the circle
        determine the perimeter of the circle
        compute circularity = sqrt(4*Pi*area/perimeter^2)
        
        if area is less than a threshold area value, 
            then circle is not shaded
            draw a green box around circle in original form
        
        else if area is greater than a threshold area value,
            if circularity is greater than a threshold circularity value,
                then circle is shaded
                draw a red box around circle in original form
            
            else if circularity is less than a threshold circularity value,
                then circle is crossed out
                draw a blue box around circle in original form

Kernel size

There are four values needed to make the pseudo code work. Most of these values were obtained through trial and error: values were tried in the code and adjusted to give the best output possible within the given time frame. The first one is the kernel size for closing. As mentioned, a proper size was needed in order to remove noise and holes in the shaded circles while preventing excessive removal of appendages from crossed-out circles. A size of 5×5 was used and gave satisfactory results.
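To make the role of the closing kernel concrete, here is a small pure-Python sketch of binary closing (dilation followed by erosion) with a square kernel. It is an illustration only, not the implementation used here; the function names are hypothetical and pixels outside the image are simply ignored at the borders.

```python
def _morph(img, k, combine):
    """Apply `combine` (any or all) over each k x k neighborhood of a 0/1 image."""
    h, w, r = len(img), len(img[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(1 if combine(window) else 0)
        out.append(row)
    return out


def dilate(img, k):
    return _morph(img, k, any)


def erode(img, k):
    return _morph(img, k, all)


def close_holes(img, k=5):
    """Morphological closing: fills holes and gaps narrower than the kernel."""
    return erode(dilate(img, k), k)


# Made-up 13x13 patch: a 5x5 shaded blob with one unshaded pixel in the middle.
patch = [[1 if 4 <= y <= 8 and 4 <= x <= 8 else 0 for x in range(13)]
         for y in range(13)]
patch[6][6] = 0
closed = close_holes(patch, k=5)
white_pixels = sum(map(sum, closed))
```

The hole is filled while the blob keeps its original extent. A larger kernel would also bridge wider gaps, for example between a circle and the strokes of an ‘x’, which is the trade-off described above.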

Region of Interest size

Next is the size of the region of interest. This value was used to select each circle from the image using the coordinates in the csv file. After computing the distances between the coordinates, a region of 56×30 pixels was derived and used in the program. This did not produce reliable values for area since it included unwanted pixels from the text and from neighboring crossed-out circles, so a smaller region of 40×30 pixels was used. This produced sufficiently good results for area measurement.
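The effect of the region size on the area measurement can be sketched as follows. This is a minimal sketch assuming each coordinate is the circle’s center given as (x, y) and that area is measured as the count of white pixels in the region; the function name and the test image are made up.

```python
def white_pixel_count(binary, cx, cy, roi_w=40, roi_h=30):
    """Count nonzero pixels in a roi_w x roi_h window centered on (cx, cy)."""
    height, width = len(binary), len(binary[0])
    x0, x1 = max(cx - roi_w // 2, 0), min(cx + roi_w // 2, width)
    y0, y1 = max(cy - roi_h // 2, 0), min(cy + roi_h // 2, height)
    return sum(1 for row in binary[y0:y1] for v in row[x0:x1] if v)


# Made-up 100x60 binary form region: a 10x10 shaded mark centered near
# (50, 30) and part of a neighboring mark starting 25 pixels to the right.
form = [[0] * 100 for _ in range(60)]
for y in range(25, 35):
    for x in range(45, 55):
        form[y][x] = 255      # the circle being measured
    for x in range(75, 80):
        form[y][x] = 255      # stray pixels from a neighbor

narrow = white_pixel_count(form, 50, 30, roi_w=40, roi_h=30)
wide = white_pixel_count(form, 50, 30, roi_w=56, roi_h=30)
```

The wider window picks up pixels from the neighbor and inflates the count, which is the kind of contamination that motivated the smaller region.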

Threshold values for Area and Circularity

The most important values needed to identify the condition of a circle in this problem are the threshold values for area and circularity. The threshold for area was set to 150. It was observed that the unshaded circles in one of the provided form images tend not to have pixel counts greater than around 150, so this value was used as the deciding factor for classifying shaded and unshaded circles. The value produced satisfactory results.

Identifying the threshold for circularity, on the other hand, proved to be a more difficult task. Only a few forms contained crossed-out circles, and their occurrence in these forms was very low, with only one to three instances per form. This made sampling difficult, so a preliminary value of 0.5 was assumed. After testing this value, it was observed that some crossed-out circles were not detected. The circularity values of these circles were checked and found to be above the preliminary threshold, so the threshold was raised to 0.75 in the code. This value produced acceptable results.
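Putting the two thresholds together, the decision logic of the pseudo code can be sketched as below. The circularity formula follows the pseudo code above (including the square root); the function names, the handling of values exactly at a threshold, and the sample measurements are assumptions for illustration.

```python
import math


def circularity(area, perimeter):
    """Compactness measure from the pseudo code: 1.0 for a perfect circle."""
    return math.sqrt(4 * math.pi * area / perimeter ** 2)


def classify_circle(area, perimeter, area_threshold=150, circ_threshold=0.75):
    """Classify one circle as 'unshaded', 'shaded' or 'crossed-out'."""
    if area < area_threshold:
        return "unshaded"
    # Assumption: values exactly at a threshold fall into the later branch.
    if circularity(area, perimeter) > circ_threshold:
        return "shaded"
    return "crossed-out"


# Made-up measurements in pixels.
ring = classify_circle(area=100, perimeter=60)
disc = classify_circle(area=math.pi * 10 ** 2, perimeter=2 * math.pi * 10)
crossed = classify_circle(area=400, perimeter=150)
```

The ‘x’ strokes add a lot of perimeter for little extra area, so a crossed-out circle scores well below a clean disc even though both pass the area test.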

The images below show sample input and output images for the created program.

Closing the Case

A program for reading shaded answers from SET forms was successfully developed. It should be noted, however, that although the code is flexible enough to accommodate other similar images, it cannot reliably process forms with sizes other than 1247×1757 pixels, since it depends on the absolute coordinates provided in the csv file. Furthermore, given the lack of statistical analysis and of data behind the threshold values used in classifying the circles, it is likely to misdetect circles in images other than the given set.